Big Data Day LA is on Aug 11, 2018 from 8 am – 8 pm at University of Southern California

Register / 2018 Registration

Sponsors / 2018 Sponsors

Email Subash D’Souza at sawjd@yahoo.com if you are interested in being a sponsor for our 2018 event.

University of Southern California (USC)
Agilisium, BMC
Accenture, Arcadia Data, Couchbase, DataStax, Hortonworks, MariaDB, Neo4j, PureStorage, Qlik, Streamlio, Syncsort
WANdisco
BCG
Databricks, Dremio, GumGum, Hulu, MarkLogic, MemSQL, Mesosphere, Netflix, Qubole, Socialgist, Tibco, Verizon Digital Media Services, WB
Archangel Technology Consultants, LLC; GA; ISSA; LAAC; Level; LOPSA; Los Angeles Big Data Users Group; Meeting Application; Niagara; OpenSUSE; Phase Two; TenOneTen Ventures

Keynote Speakers / 2018

Subash D’Souza

Organizer at Big Data Day LA
Subash D'Souza is a Data Evangelist. He is the founder and organizer of Big Data Day LA, a data conference based in sunny Southern California. He also organizes the Los Angeles Big Data Users Group and the Los Angeles Apache Spark Users Group. Subash's passions lie in building scalable and performant systems.

Abbass Sharif, PhD.

Academic Director, Master of Science in Business Analytics Program at the University of Southern California
Abbass is a professor of data science at the USC Marshall School of Business and director of the MS in Business Analytics program. Professor Sharif specializes in statistical computing and data visualization; he has developed and published new multivariate visualization techniques for functional data, and he is currently developing visualization techniques to study brain activity data collected via near-infrared spectroscopy (NIRS). Professor Sharif teaches statistics courses ranging from introductory statistics and data analysis for decision-making to advanced modern statistical learning techniques, statistical computing, and data visualization. (Host)

Gwen Shapira

Principal Data Architect at Confluent
Gwen is a principal data architect at Confluent, helping customers achieve success with their Apache Kafka implementations. She has 15 years of experience working with code and customers to build scalable data architectures, integrating microservices with relational and big data technologies. She currently specializes in building real-time, reliable data processing pipelines using Apache Kafka. Gwen is a co-author of “Kafka: The Definitive Guide” and “Hadoop Application Architectures,” and a frequent presenter at industry conferences. Gwen is also a committer on the Apache Kafka and Apache Sqoop projects. When Gwen isn't coding or building data pipelines, you can find her pedaling her bike, exploring the roads and trails of California and beyond.

Justin Herz

Executive Vice President Digital Product, Platform, & Strategy at Warner Bros. Entertainment Inc.
Justin Herz was named Executive Vice President, Digital Product, Platform, and Strategy in January 2017. As head of Warner Bros. Digital, Herz oversees emerging technology, new platforms, and direct-to-consumer technology, working across all divisions of the studio to identify and spearhead technology-related innovations intersecting with new digital products, production, distribution, and marketing. Digital’s portfolio includes digital innovation, technology development, standard-setting for the studio, direct-to-consumer platforms, and Warner’s consumer intelligence/data activities, designed to generate value by creating and enhancing consumer engagement through direct interaction with Warner Bros. products, programs, and partners.

Ken Weiner

CTO at GumGum
Ken Weiner leads the engineering and product teams behind the industry’s top computer vision platform for marketers and rights holders. As an active member of the ad tech community, Weiner is a guest columnist for VentureBeat and Forbes, a frequent speaker at conferences and a member of industry groups such as IAB’s OpenRTB Working Group and various LA Ad Tech Meetups. Weiner was LA Business Journal’s 2015 CTO of the Year, and in 2003 he was included in the InfoWorld 100 for his work on uPortal, an open-source web portal framework for the higher education community. Before coming to GumGum, Weiner was the Director of Engineering at the lead-generation platform LowerMyBills.com, acquired by Experian in 2005. He has also worked as an open-source software consultant, a project manager and an instructor at Interactive Business Solutions, acquired by Unicon Inc. in 2003. Weiner received a B.S. from UCLA and currently lives in Los Angeles. Away from the office, he enjoys mountain biking, hiking, California native plants, jazz music and playing the saxophone.

Megan Risdal

Data Scientist & Datasets Product Lead at Kaggle/ Google
Megan Risdal is a Product Lead and Data Scientist at Kaggle, the world's largest global data science and machine learning community, where she focuses on helping the world learn from data using collaborative tools and open datasets. Prior to her current role, she worked in marketing and market research and has authored dozens of blog posts, including interviews with top data scientists from Airbnb and DeepMind. She has master's degrees in linguistics from the University of California, Los Angeles and North Carolina State University.

Sari Ladin-Sienne

Chief Data Officer at City of Los Angeles
Sari Ladin-Sienne executes Mayor Garcetti’s vision for a more responsive and data-driven Los Angeles. As Chief Data Officer, Sari improves access to meaningful information for residents and city employees. She leverages her policy knowledge and technical skillset to form new partnerships that tap into the value of data as a critical asset in smarter decision-making. Sari believes knowledge sharing is at the core of the smart cities movement. She previously worked as a Harvard Ash Fellow for the Mayor’s Data Team, where she led the team’s digital strategy and elevated the city’s best practices to the national conversation through her articles for Harvard’s Data Smart Cities and Government Technology. She has provided data analysis, budgeting, and policy recommendations to a range of public and nonprofit clients, including the Consumer Financial Protection Bureau and the Nonprofit Finance Fund. Sari holds a Master's Degree in Public Policy from the University of California, Berkeley.

Sean McClure, Ph.D.

Technical Senior Principal, Senior Manager - Artificial Intelligence at Accenture
Sean McClure works to ensure stakeholders know how to leverage machine learning to build products that matter. A highly experienced Data Scientist, he has worked for a number of companies throughout Silicon Valley. Sean's work spans from hands-on contributions to enterprise-scale AI platforms to directing entire data science teams. He is called into projects that involve machine learning and AI to provide thought leadership and direction, as well as to run hands-on workshops for Accenture clients at various stages of analytics maturity. (Sponsored)

Tim Eusterman

Senior Director, Solutions Marketing, Digital Business Automation Business Unit at BMC Software
Tim leads the development and execution of market strategy, positioning/messaging, demand and content marketing, and sales enablement. Tim’s career spans over 25 years of B2B high tech marketing in the supply chain and enterprise mobile computing industries. Tim’s prior experience includes senior marketing and sales leadership positions with Honeywell Scanning and Mobility, Intermec Corporation, Vocollect Inc., and Zebra Technologies, Inc. Tim holds an MBA from the University of Oregon Lundquist School of Business and a B.S. in Political Science from Oregon State University. He is co-inventor and patent holder for the number one selling ‘bi-optic’ scanner scale used at point-of-sale check-outs around the world. (Sponsored)  

Speakers

Here are the tracks for Big Data Day LA 2018:

— Data – Big Data, Databases/RDBMS, Hadoop/Spark/Kafka, NoSQL/NewSQL and other technologies that support databases, and data storage
— AI / ML / Data Science – Artificial Intelligence, Machine Learning, Data Science, Data Analytics, Data Analysis methods and applications
— Emerging Tech – Blockchain, IoT, Serverless, AR/VR and other new and emerging technology
— Infrastructure & Security – DevSecOps, Security, Data Governance and Privacy
— Visualizations / UI / Use Cases – Data Visualization, User Experience / User Interface developments with respect to data, and real world applications

Help us match the room sizes for our talks with attendee demand by giving us your feedback! Complete the quick survey here https://goo.gl/forms/TGwd1PTlb8LKF5UZ2 to rank the top talks (by track) you are most interested in.

Aaron Williams

VP of Global Community at MapD

Alyssa Columbus

Datanaut at NASA

Amanda K Moran

Technical Evangelist at DataStax

Amandeep Khurana

CEO & Co-Founder at Okera

Andrew Barkett

VP Engineering at REX - Real Estate Exchange

Brian Gold

Founding Member, FlashBlade at PureStorage

Charmee Patel

Product Innovation - Data & Analytics at Syntasa

Chia-Yui Lee

Data Science at Tibco

Chris Calvert

Co-Founder & VP Product Strategy at Respond Software

Chris Stephens

Data Engineering Manager at Netflix

Ebraheem Fontaine

Director Data Science at Edmunds

Gary Nakanelua

Managing Director at Blueprint Technologies

Grant Kushida

Head of Engineering at Conversion Logic

Harry Brisson

Director, Media Lab at Nielsen

Indrasis Mondal

Director, Big Data Engineering & Data Products at Hulu

Jordan Morrow

Global Head of Data Literacy at Qlik

Jörg Schad

Technical Lead Community Projects at Mesosphere

Jules Damji

Spark Developer & Community Advocate at Databricks

Justine Cocchi

Technical Evangelist at Microsoft

Karthik Ramasamy

Co-Founder & Chief Product Officer at Streamlio

Kevin Nelson

Architect Advocate at Google Cloud

Luis Bitencourt-Emilio

Principal Director of Engineering at Reddit

Mahesh Bellie

Head of Marketing at Agilisium

Mariana Danilovic

Managing Director at Hollywood Portfolio

Marie Smith

CIO at Data 360

Mark Quinsland

Field Engineer at Neo4j

Michael Malgeri

Principal Technologist at MarkLogic

Miguel Angel Campo-Rembado

SVP Data Science & Analytics at Fox

Nader Fathi

CEO at Kiana Analytics

Pat Alwell

Solutions Engineer at Hortonworks

Russell Ladson

CEO at Drop Software Inc.

Ryan Measel

Co-Founder & CTO at Fantasmo.io

Sathwik Shirsat

Big Data Engineer at Malwarebytes

Seth Stodder

Partner at Holland & Knight LLP

Shane Johnson

Senior Director of Product Marketing at MariaDB

Shant Hovsepian

Co-Founder & CTO at Arcadia Data

Shilpa Balan

Assistant Professor at California State University-Los Angeles

Sooraj Akkammadam

ETL Architect at Core Digital Media

Sujay Kulkarni

Manager, Data Engineering - Data & AI at Malwarebytes

Sushree Mishra

Senior Sales Engineer at Syncsort

Tomer Shiran

Co-Founder & CEO at Dremio

/ Data

Applying Probabilistic Algorithms

by Grant Kushida, Head of Engineering, Conversion Logic

We have seen dramatic improvements in job runtimes and associated costs by applying probabilistic algorithms where appropriate. With big-data jobs running at scale, computing exact answers is often overkill; instead, we can frequently answer the question "accurately enough" with a reasonably correct approximation. For our use case (marketing analytics), we have seen benefit from approximate set membership (Bloom filters) and approximate cardinality (HyperLogLog). This talk will focus on use cases, considerations, and impact, not on the details of the algorithms or their implementation.
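For a concrete feel for the membership side, here is a minimal, illustrative Bloom filter in Python using only the standard library; the sizing constants are arbitrary assumptions for the sketch, not Conversion Logic's production values.

```python
# A minimal Bloom filter sketch (stdlib only) illustrating approximate set
# membership: false positives are possible, false negatives are not.
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1 << 20, num_hashes=5):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive k positions from two digest halves (Kirsch-Mitzenmacher trick).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return [(h1 + i * h2) % self.size for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

bf = BloomFilter()
bf.add("user-123")
assert bf.might_contain("user-123")   # added items are always found
print(bf.might_contain("user-456"))   # almost certainly False
```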

Efficient Data formats for Analytics with Parquet and Arrow

by Tomer Shiran, CEO, Dremio

Hadoop makes it relatively easy to store petabytes of data. However, storing data is not enough; columnar layouts for storage and in-memory execution allow large amounts of data to be analyzed very quickly and efficiently. They give multiple applications the ability to share a common data representation and perform operations at full CPU throughput using SIMD and vectorization. For interoperability, row-based encodings (CSV, Thrift, Avro) combined with general-purpose compression algorithms (GZip, LZO, Snappy) are common but inefficient. As discussed extensively in the database literature, a columnar layout with statistics and sorting provides vertical and horizontal partitioning, thus keeping IO to a minimum. Additionally, a number of key big data technologies have or will soon have in-memory columnar capabilities, including Kudu, Ibis, and Drill. Sharing a common in-memory columnar representation allows interoperability without the usual cost of serialization. Understanding modern CPU architecture is critical to maximizing processing throughput. We’ll discuss the advantages of columnar layouts in Parquet and Arrow for in-memory processing and the data encodings used for storage (dictionary, bit-packing, prefix coding). We’ll dissect and explain the design choices that enable us to achieve all three goals of interoperability, space efficiency, and query efficiency. In addition, we’ll provide an overview of what’s coming in Parquet and Arrow in the next year.
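As a small taste of the storage side, the sketch below round-trips a dictionary-encoded, Snappy-compressed Parquet file with the pyarrow library; the column names and values are invented for illustration.

```python
# Minimal Parquet round-trip with pyarrow; dictionary encoding and Snappy
# compression are two of the storage-side techniques discussed above.
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({
    "channel": ["search", "social", "search", "display"],  # low cardinality: dictionary-encodes well
    "spend":   [12.5, 3.2, 7.8, 1.1],
})

pq.write_table(table, "events.parquet", compression="snappy", use_dictionary=True)

# The columnar layout pays off on read: fetch only the columns a query needs.
spend_only = pq.read_table("events.parquet", columns=["spend"])
print(spend_only.to_pydict())  # {'spend': [12.5, 3.2, 7.8, 1.1]}
```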

Enabling real-time exploration and analytics at scale to drive operational intelligence at Hulu

by Indrasis Mondal, Director, Data Engineering and Data Products, Hulu

Data is one of the most powerful assets for companies today and a key driver of innovation, product development, and business efficiency. Operational intelligence allows modern organizations to use that data asset in real time, enabling immediate insight into business operations and rapid decision-making for strategic advantage. In this presentation we will walk through the operational intelligence capabilities Hulu has built to process tens of millions of events per minute, enabling fast exploration of data and real-time decision-making.

Towards Data Science Engineering Principles

by Joerg Schad, Technical Lead Community Projects, Mesosphere

Over the last half century we have developed and refined the discipline of software engineering in order to accelerate the development and deployment of applications. This has involved a general shift towards DevOps practices that align developer and business objectives and dramatically reduce time-to-delivery. With the recent rise of data science and data analytics, the time has come to apply the principles of DevOps to data science and leverage the lessons from software engineering (and its systematic and repeatable methodology) to the discipline of data science. This rapidly emerging field is sometimes referred to as DataOps, and encompasses development of AI models and the overall platform surrounding them. In order to explore this concept, let's compare and contrast data science and software engineering principles. We can uncover similarities and differences between the two across the application development lifecycle.

Graph Computing: How the Gremlin Stole Christmas

by Justine Cocchi, Software Engineer, Microsoft

Graph databases are popular for modeling data where the relationship between data points is at least as important as the data points themselves. Come learn how and when you should leverage graphs to better model scenarios like social networks, project workflows, or in our case, Santa's present delivery route. We'll learn about the Gremlin console and how you can use Gremlin, a graph query language (think SQL for relational databases), with Azure Cosmos DB's Graph API to traverse your graph. Additionally, because Cosmos DB is NoSQL and multi-model we will also touch on how you can access your graph in other ways!
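To make the traversal idea concrete, here is a small sketch using the gremlinpython client; it assumes a local Apache TinkerPop Gremlin Server at ws://localhost:8182 (a Cosmos DB Graph API endpoint would need its own host and credentials), and the vertices and labels are invented for illustration.

```python
# Build and traverse a tiny "delivery route" graph with gremlinpython.
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.process.graph_traversal import __
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

conn = DriverRemoteConnection("ws://localhost:8182/gremlin", "g")
g = traversal().withRemote(conn)

# Santa delivers to two houses.
santa = g.addV("person").property("name", "Santa").next()
for address in ["12 Elm St", "34 Oak Ave"]:
    house = g.addV("house").property("address", address).next()
    g.V(santa.id).addE("delivers_to").to(__.V(house.id)).iterate()

# Traverse it: which houses are on the delivery route?
route = (g.V().has("person", "name", "Santa")
          .out("delivers_to").values("address").toList())
print(route)  # ['12 Elm St', '34 Oak Ave']
conn.close()
```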

The Netflix Data Warehouse: 10 Years In and Staying Nimble

by Chris Stephens, Manager, Content Data Engineering, Netflix

In this talk, we'd like to recap how Netflix has worked through "first wave" data warehouse challenges over the last 10 years (tech stack, scalability, data ingest, compute vs storage), then describe our current focus on "second wave" challenges that come with environment maturity (our new data discovery tooling, our data quality and data auditing tooling maturity, "data doctor" alerts). For each point, we'll talk about how our solution allows us to address "second wave" issues while staying nimble. We hope we can share our ideas with other companies running mature analytics environments, and give smaller / newer companies an idea of what comes next.

Sponsored - How Malwarebytes Leverages Big Data, AI and the Cloud to Protect Millions

by Sujay Kulkarni, Manager, Data Engineering - Data & AI, Malwarebytes & Sathwik Shirsat, Big Data Engineer, Malwarebytes

Malwarebytes protects millions of companies and individuals globally from the world’s most harmful internet threats and malware. They collect billions of records each day on a millisecond by millisecond basis, and apply advanced machine learning and AI to identify and profile harmful threats and their sources. Join Sujay Kulkarni, Manager - Data Engineering for Data & AI at Malwarebytes, to learn how his team has designed and built their massive data processing and AI framework in the cloud. He’ll explore their architectural evolution and the role of critical technology components, including Control-M, open source and other technologies, in collecting and analyzing large volumes of data for ‘zero day’ remediation.

Sponsored - Big Data as a Service: Running Elasticsearch on Pure

by Brian Gold, Founding Member, FlashBlade, PureStorage

As organizations look to scale their use of modern analytics, the traditional deployment model of these tools has become a drag on productivity. Existing big-data architectures typically run on fixed sets of server instances with tightly coupled storage. While originally designed for scalability, these rigid environments cause server sprawl and increase time-to-deployment.

/ AI/ ML/ Datascience

A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning Pipelines

by Jules Damji, Spark Community Evangelist, Databricks

We all know what they say – the bigger the data, the better. But when the data gets really big, how do you use it? This talk will cover three of the most popular deep learning frameworks – TensorFlow, Keras, and Deep Learning Pipelines – and when, where, and how to use them. We’ll also discuss their integration with distributed computing engines such as Apache Spark (which can handle massive amounts of data), and help you answer questions such as: As a developer, how do I pick the right deep learning framework? Do I want to develop my own model, or should I employ an existing one? How do I strike a trade-off between productivity and control through low-level APIs? In this session, we will show you how easy it is to build an image classifier with TensorFlow, Keras, and Deep Learning Pipelines in under 30 minutes. After this session, you will walk away with the confidence to evaluate which framework is best for you, and perhaps with a better sense of how to fool an image classifier!
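In that spirit, a minimal tf.keras image classifier might look like the sketch below; the dataset (CIFAR-10), layer sizes, and epoch count are illustrative assumptions, not the session's exact demo.

```python
# A small convolutional image classifier in tf.keras.
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(32, 32, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),  # 10 CIFAR-10 classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=3, validation_data=(x_test, y_test))
```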

Using ML to analyze text message conversations

by Ebraheem Fontaine, Director - Data Science, Edmunds

Edmunds CarCode is a mobile texting platform used to connect customers with over 6,000 dealerships nationwide. We will discuss various machine learning driven enhancements to the product that helped to generate over 100K monthly conversations.

Collaborative Metric Learning Recommendation System: Application to Theatrical Movie Releases

by Miguel Campo, SVP Data Science & Analytics, Fox

Product recommendation systems are important for major movie studios during the movie greenlight process and as part of machine learning personalization pipelines. Collaborative Filtering (CF) models have proved to be effective at powering recommender systems for online streaming services with explicit customer feedback data. CF models do not perform well in scenarios in which feedback data is not available, in cold start situations like new product launches, and in situations with markedly different customer tiers (e.g., high-frequency customers vs. casual customers). Generative natural language models that create useful theme-based representations of an underlying corpus of documents can be used to represent new product descriptions, like new movie plots. When combined with CF, they have been shown to increase performance in cold start situations. Outside of those cases in which explicit customer feedback is available, though, recommender engines must rely on binary purchase data, which materially degrades performance. Fortunately, purchase data can be combined with product descriptions to generate meaningful representations of products and customer trajectories in a convenient product space in which proximity represents similarity (in the case of product-to-product comparisons) and affinity (in the case of customer-to-product comparisons). Learning to measure the distance between points in this space can be accomplished with a deep neural network that trains on customer histories and on dense vectorizations of product descriptions. We developed a system based on Collaborative (Deep) Metric Learning (CML) to predict the purchase probabilities of new theatrical releases. We trained and evaluated the model using a large dataset of customer histories spanning multiple years, and tested the model on a set of movies that were released outside of the training window. Initial experiments show gains relative to models that don't train on collaborative preferences.
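To make the geometric intuition concrete, here is a toy sketch (not Fox's model): customers and movies share one embedding space, and a smaller Euclidean distance means higher predicted affinity. The random vectors stand in for embeddings a deep network would learn from purchase histories and plot descriptions.

```python
# Toy metric-learning scoring: distance in a shared embedding space.
import numpy as np

rng = np.random.default_rng(0)
dim = 16
customer_vecs = rng.normal(size=(1000, dim))  # stand-in: learned from purchase histories
movie_vecs = rng.normal(size=(50, dim))       # stand-in: learned from plot descriptions

def rank_movies_for(customer_id, top_k=5):
    # Smaller distance in the shared space = higher predicted affinity.
    dists = np.linalg.norm(movie_vecs - customer_vecs[customer_id], axis=1)
    return np.argsort(dists)[:top_k]

print(rank_movies_for(42))  # indices of the 5 closest (most affine) movies
```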

A Crash Course on Google Cloud AutoML and Machine Learning APIs

by Kevin Nelson, Google Cloud Architect Advocate, Google

Thanks to machine learning and AI, applications are now being created that can see, hear, and understand the world around them. Learn how you can easily infuse AI into your business today. In addition to a guided walkthrough of easy-to-use machine learning APIs from Google Cloud – Cloud Vision, Cloud Video Intelligence, Cloud Speech, Cloud Natural Language, and Cloud Translation – we'll demonstrate how Google Cloud AutoML enables developers with limited machine learning expertise to train high-quality models by leveraging Google's state-of-the-art transfer learning and Neural Architecture Search technology.
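For a flavor of what the API walkthrough covers, a label-detection call with the official google-cloud-vision Python client looks roughly like the sketch below; the image path is an assumption, and application credentials (GOOGLE_APPLICATION_CREDENTIALS) must be configured.

```python
# Minimal Cloud Vision label detection with the google-cloud-vision client.
from google.cloud import vision

client = vision.ImageAnnotatorClient()
with open("photo.jpg", "rb") as f:          # assumed local image
    image = vision.Image(content=f.read())

response = client.label_detection(image=image)
for label in response.label_annotations:
    # Each annotation carries a description and a confidence score.
    print(label.description, round(label.score, 3))
```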

The Need for Speed: How the Auto Industry Accelerates Machine Learning with Visual Analytics

by Aaron Williams, VP of Global Community, MapD Technologies

While GPU-accelerated analytics have already radically accelerated the speed of training machine learning models, data scientists and analysts still grapple with deriving insights from these complex models to better inform decision-making. The key: Visualizing and interrogating black box models with a GPU-enabled architecture. Volkswagen and MapD will discuss how interactive, visual analytics are helping the automotive brand interactively explore the output of their ML models to interrogate them in real time, for greater accuracy and reduced biases. They'll also examine how applying the GPU Data Frame to their efforts has enabled them to accelerate data science by minimizing data transfers and made it possible for their complex, multi-platform machine learning workflows to run entirely on GPUs.

The History (and Future) of ML at Reddit

by Luis Bitencourt-Emilio, Principal Director of Engineering - ML, Reddit

A run-through of ML at Reddit from 2006 to the present day: building the team, the data platform, the experiments being run, and their results.

Sponsored - BI Conversational Bot - The High Speed Bridge Between Data and Decision

by Mahesh Bellie, Head of Marketing, Agilisium

Customer-facing chatbots have been around for a while. From hospitality, home entertainment, HR functions, and call centres to booking travel tickets, chatbots have been providing excellent customer experiences. The application of chatbots in these cases, however robotic, has been successful for a variety of reasons: narrow functionality, limited data sets for processing, pre-structured flows, and very rarely a need to present complex analytics to end customers. Now, combined with advancements in Natural Language Processing (NLP) and machine learning algorithms, these chatbots display more human-like behavior. But the adoption of chatbots as an on-demand, ‘data-driven decision making’ Business Intelligence (BI) tool has been slow for various reasons. Unlike customer-facing chatbots, BI conversational bots must address analytical questions that require complex processing of large amounts of data, and users must be trained on the scope before using the bot. Also, given data security concerns, these chatbots cannot be exposed to a larger audience for training, so they are less conversational, making the user experience even poorer. At Agilisium, we have been experimenting with chatbot technologies to address this gap for our business consumers – building a truly conversational BI bot. We invite you to meet us at @BigDataDayLA to learn more about it and to get first-hand experience of the demo.

Sponsored - Using Data Literacy to Build an Insights Driven Culture

by Joe Franklin, Qlik

Data is the foundation of the new analytics economy. But data alone is useless unless it can be transformed into insights that lead to the digital transformation of your business. These insights rely on technology, of course. But just as importantly, they rely on your workforce’s ability to read, work with, analyze, and argue with data. Currently, a skills gap exists within the world of data literacy, which can impede an organization’s ability to be successful with data.

/ Emerging Tech

The Future of Workplace Insights With Augmented Reality (AR) and Big Data

by Gary Nakanelua, Managing Director of Experimentation & Invention, Blueprint Technologies

The world isn't flat, so why are your dashboards? In this talk, we demonstrate how augmented reality (AR) can be leveraged with a large variety of data sources to give a deeper level of insight into daily workplace operations. Key takeaways include a demonstration of an AR view of employee productivity and enterprise operations using business system telemetry data and enterprise communication data. We will also give a practical look at organization modernization with AR and big data.

VR/AR Interface Design

by Russell Ladson, CEO, Drop Software Inc.

With the onset of the post-smartphone world, as technologists, we need to begin examining the design principles needed for spatial and immersive computing.

Mapping the World in 3D

by Ryan Measel, CTO, Fantasmo

The physical and digital worlds are converging. The proliferation of dense, semantically-labeled 3D maps is a fundamental need for current and emerging industries including augmented reality, autonomous robotics, retail, accessibility, and emergency response among others. In this talk, we will discuss the confluence of factors driving the next generation of maps and the challenges ahead.

A Researcher's Guide to the Third Dimension: Best Practices for Immersive Data Visualization

by Harry Brisson, Director, Nielsen Media Lab, Nielsen

One of the most exciting applications of VR is its ability to create engaging and compelling exploratory data visualizations. Join us as we highlight some of our favorite examples, including some of Nielsen's own work in the space visualizing our rich viewership data. Along the way, we'll highlight five best practices you can apply as you venture into the third dimension. Plus: a quick tutorial on how you can create your own immersive visualizations in WebVR using A-Frame and D3.

Blockchain in Residential Real Estate

by Andy Barkett, Vice President, Engineering & AI, REX Real Estate

The promise of the blockchain in real estate has not yet been realized in any significant applications. REX is pursuing two types of real estate Dapps simultaneously: security tokens for houses to facilitate transfer and privacy, and utility tokens for automating record-keeping and removing the need for third parties in processes such as recording inspection results or title searches.

A Serverless Approach to Data Processing using Apache Pulsar

by Karthik Ramasamy, Chief Product Officer, Streamlio

Serverless is touted as the next trend in computing. It refers to applications where server-side logic is written by the application developer but, unlike traditional architectures, runs in compute containers that are event-triggered, ephemeral, and fully managed by a cloud provider (in the public cloud) or a dev-ops team (in a private cloud). While serverless is mostly viewed as a tool for microservices, the analytical world is still beset with tools with complicated APIs. In this talk, we will explore how the serverless paradigm is applied to data processing in Apache Pulsar, a next-generation data messaging platform. Apache Pulsar provides native support for serverless functions, where data is processed as soon as it arrives in a streaming fashion, with flexible deployment options (thread, process, container). We will describe how these functions make data engineering easier, especially for common tasks in data transformation, data extraction, content routing, and content filtering. Furthermore, we will describe how they can be applied to edge computing use cases, where lots of noisy data is filtered and only relevant data is processed and transported to the data center.
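A minimal Pulsar Function for the content-filtering case might look like the sketch below (Python SDK); the topic names, payload format, and relevance threshold are invented for illustration.

```python
# A content-filtering Pulsar Function: events arrive from an input topic,
# and only readings above a threshold are forwarded to the output topic.
from pulsar import Function

class RelevantEventFilter(Function):
    def process(self, input, context):
        reading = float(input)   # assumes the message payload is a number
        if reading > 42.0:       # illustrative relevance rule
            return input         # returned values go to the function's output topic
        return None              # None drops the (noisy) event

# Deployment would use pulsar-admin, roughly like (shown as an assumption):
#   pulsar-admin functions create --py filter.py \
#     --classname filter.RelevantEventFilter \
#     --inputs sensor-raw --output sensor-clean
```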

Sponsored - Streaming and IoT

by Pat Alwell, Solutions Engineer, Hortonworks

Hortonworks DataFlow (HDF) is built with the vision of creating a platform that enables enterprises to build dataflow management and streaming analytics solutions that collect, curate, analyze and act on data in motion across the datacenter and cloud. Do you want to be able to provide a complete end-to-end streaming solution, from an IoT device all the way to a dashboard for your business users with no code? Come to this session to learn how this is now possible with HDF 3.1.

Sponsored - When Rotten Tomatoes isn’t enough: Analyzing Twitter Movie Reviews using DataStax Enterprise

by Amanda Moran, Technical Evangelist, DataStax

Getting real-time insights is essential in this fast-paced world – like finding a good movie to catch this weekend. In this talk, we’ll use sentiment analysis on Twitter data about the latest movie titles to answer that age-old question: “Is that movie any good?” We’ll show how we built the solution using Apache Cassandra, Apache Spark, and DataStax Enterprise Analytics. This is a great talk to attend if you are new to the big data space, want to learn more about Cassandra and Spark, or just want to see a demo of DataStax’s latest product.

/ Infrastructure & Security

Drawing the Proper Balances -- Security, Convenience, and Privacy

by Seth Stodder, Partner, Holland & Knight LLP

Discussing the proper balances between the public's need for security and law enforcement, our desire for the fruits of big data analytics, and our privacy rights, drawing from my experiences as Assistant Secretary of Homeland Security in the Obama Administration and a professor teaching national security law at USC Law School.

Physical Security via Digitization and IOT

by Nader Fathi, CEO, Kiana

Protecting people, locations and assets is a critical part of any enterprise. Robust security surveillance systems are often inadequate or cost prohibitive, especially when it comes to locations with large square footage, very active foot traffic, or a high volume of transient visitors such as corporate campuses, event venues, transportation hubs and shopping malls. Luckily, we live in a world where everything and everyone is connected. Every person today emits and leaves behind a digital footprint that when combined with location, time and camera data, can be utilized to enhance premise security and safety. Today's IoT solutions offer peace of mind to facility managers, security guards, and corporations across the globe.

Agile Integration Using an Enterprise Data Hub

by Michael Malgeri, Principal Technologist, MarkLogic, Corp.

Today, business processes and decisions are driven from information contained in a variety of internal systems of record (SoRs) and other external sources (social media, partners, suppliers, customers). Enterprises are discovering that out of the box (OOTB) solutions often don't exist for their operational and observational requirements. In addition, traditional integration solutions often take too long to implement and are brittle. This presentation will discuss how an enterprise operational data hub (ODH) can be an agile, flexible solution for integrating key information sources in a timely manner for driving mission critical applications and reporting.

Critical data management practices for easy and unified data access that is secured and meets regulatory compliance

by Amandeep Khurana, CEO, Okera

The desire to stay competitive, deliver better customer experiences, and gain greater business agility is driving companies to move their data platforms to the cloud. Data analysts and scientists want modern, advanced analytics that are self-service. Companies well down the path to the cloud now have multiple projects involving various parts of the company, but often without any notion of access, governance, or security common across these projects. This quickly opens the door to risk and confusion. This session will focus on critical data management practices that companies should adopt to achieve the desired benefits and avoid the pitfalls that lead to complex, expensive architectures and unattainable enterprise or regulatory governance requirements.

What does Probability Theory have to do with Cyber Security? It turns out, a lot.

by Chris Calvert, Co-Founder & VP Product Strategy, Respond Software

Cyber security attacks are increasing in frequency, each one potentially more harmful than the last. Traditional methods of threat prediction and risk mitigation cannot keep pace with the increased sophistication of attacks, requiring a fundamental shift in security technology. Respond Analyst leverages probability theory and advanced algorithms to analyze all relevant cyber observables, ultimately making fully contextualized and informed decisions at a scale, speed, and consistency no human can match. Key takeaways:
– Today's sophisticated security attacks require a new way of thinking about security within the enterprise
– Probability theory, applied in the context of enterprise security, is a major paradigm shift and requires a change in how we think about, and address, risk
– Uncertainty, dimensional probabilities, shades of grey
– Proactive rather than reactive
– Reasoning over rules-based logic
– Respond Analyst software analyzes all relevant cyber observables to predict and learn, ultimately making fully contextualized and informed decisions, based on the most likely explanation, with the scale, speed, and consistency no human can match
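As a toy illustration of the underlying idea (not Respond Software's actual model), Bayes' rule lets each new observable update the probability that an alert is a real intrusion; the base rate and likelihoods below are invented for the sketch.

```python
# Sequential Bayesian updating of P(attack) as observables arrive.
def posterior(prior, p_obs_given_attack, p_obs_given_benign):
    # P(attack | obs) via Bayes' rule.
    num = p_obs_given_attack * prior
    den = num + p_obs_given_benign * (1.0 - prior)
    return num / den

p = 0.001  # illustrative base rate: 1 in 1000 alerts is a real attack
for likelihoods in [(0.9, 0.1), (0.8, 0.2), (0.95, 0.05)]:  # three observables
    p = posterior(p, *likelihoods)
    print(round(p, 4))
# The posterior climbs with each corroborating observable, quantifying
# "shades of grey" instead of a binary rule simply firing or not.
```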

AI Powered Customer Analytics in the age of GDPR

by Charmee Patel, Product Innovation Research, Syntasa

Online and offline customer interactions generate prodigious amounts of data that can be analyzed by AI and ML algorithms to improve customer experience. The recently enacted European General Data Protection Regulation (GDPR) grants customers control over their data, including the right to be forgotten. The big data technologies capable of analyzing massive customer interaction data (e.g., HDFS, Kafka, Spark, Flink, and Beam) are most performant when working with immutable data, which sits awkwardly with the right to be forgotten. In this presentation, we briefly discuss the implications of GDPR for various behavioral data generated by websites, apps, in-store systems, and ad impressions. We show how GDPR requirements can be efficiently satisfied using an architecture that isolates a GDPR-safe analytics zone, which is the only area accessible to users, from the input and output zones, where encryption, decryption, and key management are tightly controlled to satisfy the GDPR requirements.
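One widely used way to square immutable storage with the right to be forgotten is crypto-shredding: encrypt each customer's records with a per-customer key, and delete the key to "forget" them. The sketch below (using the cryptography package) illustrates the idea; the in-memory dict is a stand-in for the tightly controlled key-management service the talk describes, not the speakers' actual architecture.

```python
# Crypto-shredding sketch: deleting a key makes immutable ciphertext unreadable.
from cryptography.fernet import Fernet

key_store = {}  # customer_id -> key (stand-in for a real KMS)

def write_record(customer_id, payload: bytes) -> bytes:
    key = key_store.setdefault(customer_id, Fernet.generate_key())
    return Fernet(key).encrypt(payload)  # ciphertext can live in immutable storage

def forget(customer_id):
    key_store.pop(customer_id, None)     # the data is now unrecoverable

ct = write_record("cust-42", b'{"page": "/checkout"}')
forget("cust-42")
# Any later attempt to decrypt ct fails: the key is gone.
```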

Sponsored - Populating your Enterprise Data Hub for Next Gen Analytics

by Sushree Mishra, Senior Sales Engineer, Syncsort

Syncsort’s data integration and data quality solutions on Hadoop can help accelerate the process of populating your Enterprise Data Hub with data from multiple disparate sources such as legacy systems, databases, ERPs, CRMs, etc. Standardizing and cleansing the data before it is ingested into the data lake will dramatically increase the analytics value proposition.

Sponsored - From the Panama Papers to Russian Trolls - How Graph Databases are Revealing Hidden Relationships and Exposing Corruption.

by Mark Quinsland, Field Engineer, Neo4j

With the assistance of Neo4j's graph experts, the International Consortium of Investigative Journalists won the Pulitzer Prize for exposing the complex relationships hidden in the Panama Papers among the wealthy, their money, government officials, and tax havens. Neo4j also provided expertise for analyzing the subsequent Paradise Papers. More recently, Neo4j helped MSNBC use graph techniques to successfully identify the Russian trolls responsible for over 200k fake tweets during the 2016 election. This presentation will show the techniques used in these projects and how they are being used successfully by many global organizations for their cybersecurity, identity resolution, and fraud use cases.

/ Visualizations/ UI/ Usecases

Best Practices in Data Visualization

by Shilpa Balan, Assistant Professor, California State University-Los Angeles

Visualizations are an effective way to communicate information, and visuals make it easy to spot patterns. A good visualization should tell its story in seconds: data visualizations are a medium for presenting interesting analysis and findings, and good visuals consider both user needs and business needs. Many data analysts, however, are not experts in communicating or presenting insights effectively, which can result in insights being lost in the presentation. A data analyst should be able to present insights to end users, because data is valuable only if the end users can understand it. Thus, the goal of data visualization is to use images to improve the user's understanding of the data. The aim of this presentation is to discuss the best practices of data visualization and how data analysts can use these techniques to communicate their results and findings effectively to end users.

Standing on the shoulders of giants: building Core Digital Media's data platform using Big Data technologies

by Sooraj Akkammadam, ETL Architect, Core Digital Media

Core Digital Media is one of the leading advertisers in the online space, responsible for about a fifth of the ads a person sees online on a day-to-day basis. This is accomplished by our Marketing team, which continues to find performance wins by leveraging proprietary algorithms, deep learning models, and advanced analytics to optimize ad spend. All of these frameworks rely on the performance data available in Core Digital Media's Enterprise Data Warehouse (EDW) to make decisions. The performance data is collected from a multitude of marketing channels – Social (Facebook), Search (Google), Content (Taboola), Media, Affiliate, and Retention – and integrated into the EDW. Timely and consistent availability of data in the EDW is extremely critical for marketing optimization. This talk details how we migrated our marketing data loads from a legacy ETL platform to a data infrastructure built around Apache Kafka and Apache Spark, using Python and Scala. This migration not only reduced data availability times from an average of more than 60 minutes to about 2 minutes, but also lets us load into the EDW 24/7. We will discuss why we chose Kafka and Spark Structured Streaming, some of the challenges we faced, and some best practices for implementing data streaming architectures.
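A skeleton of the Kafka-to-EDW pattern described above, in PySpark Structured Streaming; the broker address, topic name, and sink paths are placeholder assumptions, and the spark-sql-kafka connector package must be on the classpath.

```python
# Continuous load from a Kafka topic into a Parquet-backed warehouse table.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("marketing-loads").getOrCreate()

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder
          .option("subscribe", "marketing-events")            # placeholder topic
          .load())

# Kafka delivers key/value as binary; cast before loading the EDW.
parsed = events.select(col("key").cast("string"), col("value").cast("string"))

query = (parsed.writeStream
         .format("parquet")
         .option("path", "/warehouse/marketing_events")               # placeholder sink
         .option("checkpointLocation", "/warehouse/_chk/marketing")   # exactly-once bookkeeping
         .trigger(processingTime="1 minute")
         .start())
query.awaitTermination()
```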

Creating operational efficiency: Automating big data, AI, and ML to improve profit

by Marie Smith, Chief Information Officer, Data 360

Manual tasks take time. They must be performed linearly by humans, who are prone to errors and unable to consistently perform to the highest standards. Automation reduces the number of tasks you and your employees would otherwise need to do manually, freeing up your time to work on items that add genuine value to the business, allowing you to be more innovative, and increasing your employees' motivation. Automation also allows you to get more done in the same amount of time, greatly increasing productivity. Learn how to use automation with artificial intelligence (AI) to improve profit in this enlightening workshop.

How is Blockchain Changing Big Data Relationships in Entertainment

by Mariana Danilovic, CEO, Hollywood Portfolio

A presentation about big data issues in Entertainment that are driving adoption of blockchain platforms.

Telling Meaningful Stories with Data

by Alyssa Columbus, Datanaut, NASA

According to Edward Tufte, an excellent data visualization expresses “complex ideas communicated with clarity, precision and efficiency.” Visualization is a dynamic form of persuasion, telling a story through the graphical depiction of statistical information, and few forms of communication are as persuasive as a compelling narrative. So how does a data scientist tell a meaningful story with a visualization? The analysis has to find the story that the data supports, and journalists have become very good at storytelling with visualization via infographics. In that vein, this presentation will share how journalistic strategies for telling a good story can be applied to data visualization.

How a data science project effected change for low-income parking citation holders

by Chia-Yui Lee, Data Science, TIBCO Software

In March 2018, the San Francisco Municipal Transportation Agency (SFMTA) announced reforms that would alleviate the burden of parking ticket fines on low-income San Franciscans. There is increasing awareness that parking citations can result in disproportionate hardship for drivers, from late fees, towing, and even loss of income resulting from the impounding or sale of a vehicle. Attend this presentation to learn how Tipping Point, a non-profit that fights poverty in the San Francisco Bay Area, with the help of data scientists from TIBCO, used a collaborative data science platform and point-and-click machine learning tools to uncover insights from parking citation data. We will also show you the importance of a versatile visual analytics platform in communicating the findings to decision makers.

Sponsored - A tale of two BI standards: Data warehouses and data lakes

by Shant Hovsepian, Co-Founder and CTO, Arcadia Data

Data lakes as part of the logical data warehouse (LDW) have entered the trough of disillusionment. Some failures are due to lack of value from businesses focusing on the big data challenges and not the big analytics opportunity. After all, data is just data until you analyze it. While the data management aspect has been fairly well understood over the years, the success of business intelligence (BI) and analytics on data lakes lags behind. In fact, data lakes often fail because they are only accessible by highly skilled data scientists and not by business users. But BI tools have been able to access data warehouses for years, so what gives? Shant Hovsepian explains why existing BI tools are architected well for data warehouses but not data lakes, the pros and cons of each architecture, and why every organization should have two BI standards: one for data warehouses and one for data lakes.

Sponsored - Why use a columnar database for analytical workloads?

by Shane Johnson, Senior Director - Product Marketing, MariaDB

In this session, we’re going to discuss how columnar databases improve the performance and efficiency of analytical workloads. We’ll begin by explaining why transactional queries (e.g., return every column in a single row) benefit from row-based storage, whereas analytical queries (e.g., return the aggregate of a single column in every row) benefit from column-based storage.

We will walk through the storage and query processing architecture of MariaDB AX, an open source columnar database, to show how columnar databases work. In addition, we will show how massively parallel processing, combined with column-based storage, not only improves the performance and efficiency of analytical workloads, but scales to support interactive, ad hoc analytical queries on terabytes of data and billions of rows in real time.
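As a back-of-the-envelope illustration of the row-versus-column point (pure Python, not MariaDB AX internals), aggregating one field touches far less data when that field's values are stored contiguously:

```python
# Row store vs. column store for an analytical aggregate.
rows = [{"id": i, "region": "CA", "amount": float(i)} for i in range(1_000_000)]

# Row layout: the aggregate must walk every whole row to reach one field.
total_row = sum(r["amount"] for r in rows)

# Column layout: the same field lives in one contiguous array,
# so the scan touches only the data the query actually needs.
amounts = [r["amount"] for r in rows]  # one-time columnar projection
total_col = sum(amounts)

assert total_row == total_col
```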


Volunteers / 2018 Volunteers

Due to the large volume of volunteers already signed up, we are not accepting new volunteers at this time. Thank you for your interest and please check back with us next year.

Big Data Day LA is a fully volunteer-supported and organized event, and we have a proud history of volunteers who have joined us in helping organize the event in past years and stayed on as alumni volunteers, friends, cheerleaders, and mentors.

As a volunteer, you will have the opportunity to help organize an event with 30+ sessions, speakers and sponsors from some of the biggest and best data companies in the country, and more than 1,500 attendees expected this year.

You can participate wherever your interests fit best – we are accepting volunteers in the following teams:

– Marketing
– Technology
– Track / Sessions
– Location
– Food / Beverage
– Registration

Tutorials / 2018 Tutorials

This year, attendees will be able to register on site for one of two tutorials: Blockchain and Data Science. Space is limited and open to registered attendees, and you must bring your laptop. Registration will be confirmed once you sign up and your email is validated. You can only sign up for one tutorial or the other, since both run at the same time, between 2 and 6 PM. Further instructions will be sent once you are confirmed!

Blockchain Tutorial provided by Abraham Elmahrek

Blockchain Tutorial Signup!!

Datascience Tutorial provided by theDevMasters, a GBCS company

Datascience Tutorial Signup!!

About the Conference / What You Need To Know

Big Data Day LA is the largest Big Data conference of its kind in Southern California. Spearheaded by Subash D’Souza, and organized and supported by a community of volunteers, sponsors and speakers, Big Data Day LA features the most vibrant gathering of data and technology enthusiasts in Los Angeles.

Attendance is free if you register before Apr 30th. Following the early registration deadline, we are introducing a tiered pricing model. We believe people who are willing to pay a small fee are more likely to attend the event, and having an accurate number of attendees helps us prepare and manage the event.

March & April: Registration is FREE starting March 1 through April 30th.

May & June: $40 to register

July 1st until the day of the event: $50 to register

Students get 50% off using code STDNT (bring your ID card on the day of event to check-in)

Big Data Day LA will be re-branding under a new name! We plan on serving data and technology enthusiasts for many years to come and believe our event should reflect the varying interests of the entire tech community. Stay tuned for the name reveal at this year’s event.

The first Big Data Day LA conference was in 2013, with just over 250 attendees. We have since grown to over 550 attendees in 2014, 950+ attendees in 2015, 1200+ attendees in 2016, and 1550+ attendees in 2017.

Our 2018 session tracks will include:

  • Data
  • AI/ ML/ Data Science
  • Emerging Tech
  • Visualizations/ UI/ Use Cases
  • Infrastructure & Security

Subscribe to our mailing list


Attendees / See Who Will Be There

/ Data Scientists

/ Software Developers

/ System Architects

/ Head Researchers

/ Business Analysts

/ Data Engineers

/ Technical Leads

/ CEOs, CTOs, CIO, etc.

/ IT Managers

/ Business Strategists

/ Data Analysts

/ Researchers

/ Head Data Scientists

/ Entrepreneurs

/ Consultants

Organizers / 2018 Organizers

Subash D’Souza

Organizer, Sponsors & Sessions Chair

Startup Showcase / 2018 Startup Showcase

This year’s Big Data Day LA Startup Showcase is focusing on Media and Entertainment to pay homage to the quintessential Hollywood! We are excited to share the innovation our data community brings to the rich tradition of media and entertainment in Los Angeles.

General Round:
We are opening this round to applications from all startups in the media and entertainment space. Complete the Startup Pitch Competition application form by July 15th and include a link to your pitch deck or executive summary. If you are selected to participate in the final pitch round, you will be notified by July 25th.

Final Pitch Round:
Five finalists will have the opportunity to present a 5-min pitch onstage to the panel of judges below from Warner Bros., Disney, NBCUniversal, Netflix, and TenOneTen Ventures on conference day – Aug 11th.

Prizes:
First Place – $1000, $500 credit for General Assembly classes/workshops* and 1 one-on-one feedback and strategy session with a VC from TenOneTen Ventures
Second Place – $500 and $250 credit for General Assembly classes/workshops*
Third Place – $250 and $100 credit for General Assembly classes/workshops*

*The credits would be good for any of General Assembly’s short-form programming. The credits would not be applicable to General Assembly’s long-form programming (PT courses and Immersive programs).

We will be adding more prizes as we get closer to conference day.

Key Dates:
July 15th – Startup Pitch Application deadline
July 25th – Finalists notified
Aug 11th – Final Pitch Round with 5 finalists at Big Data Day LA

July 15th is the deadline for applications for the pitch competition. Use the form below to submit your pitch today!

Judges

Arvel Chappell III

Manager - Emerging Technology at Warner Bros.
Arvel recently joined WB; his career has spanned space, AI, cinema, and virtual reality. He began his career in the space industry building technology for government customers, eventually serving as chief engineer for the Wideband Global SATCOM constellation. Currently, Arvel serves on the IEEE ethics in Artificial Intelligence committee, which is creating industry guidelines to facilitate AI’s use. In addition to engineering, Arvel has enjoyed a love of cinema: he holds an MFA in cinema production from USC, was a Sundance screenwriting lab semi-finalist, has directed numerous short films, and most recently created a VR experience for The Daily Show’s Trevor Noah. At Warner Bros., Arvel is part of the Emerging Technology group, where he leads VR and artificial intelligence R&D projects that are inventing the future of immersive content.

Austin Clements

Venture Capitalist at TenOneTen Ventures
Austin Clements is a venture capitalist with TenOneTen Ventures, a seed-stage fund based in LA. Prior to TenOneTen, Austin worked at Sony Pictures, licensing film and TV content to emerging video platforms. He began his career in investment management at AllianceBernstein and has since held roles evaluating early-stage tech investments at Creative Artists Agency, Digital Entertainment Ventures, and NY Angels. Austin is also committed to encouraging underrepresented minorities to pursue careers in tech and entrepreneurship, and is an active volunteer in various local youth entrepreneurship programs. Austin received his MBA from NYU Stern with a specialization in Media, Entertainment, and Technology and his BA from Morehouse College in Atlanta, GA.

Jason Flittner

Content and Studio Data Engineering Leader at Netflix
Jason Flittner leads the Content and Studio Data Engineering teams at Netflix. He was previously an analytics manager and engineer, focusing on data transformation, analysis, and visualization of Netflix content data. Prior to Netflix, Jason led the EC2 business intelligence team at Amazon Web Services and was a business intelligence engineer with Cisco Systems.

Jeff Rosenberg

Director of Software Development for Reporting, Analytics and Data Science at Hulu 
Jeff Rosenberg is the Director of Software Development for Reporting, Analytics and Data Science at Hulu, where his team is responsible for the overall technology direction of business intelligence and governance, big data platform and infrastructure, data products, data quality management and data science.  Jeff’s background is in biological sciences, technical program management and software development, with a strong focus on media.  Before leading the data organization at Hulu, he successfully led the cross-functional effort to ship Hulu’s landmark ‘No Commercials’ plan.  Previously, he worked on platform and product development with companies including Warner Bros, Fox/Myspace, DirecTV, and Sony/Playstation, driving execution of countless products backed by billions of data points. Jeff holds a BS in Biology from Brown University.

Levon Karayan

Sr. Director of Engineering at The Walt Disney Studios
Levon Karayan has over 25 years of experience, from being employee number one to being one employee among hundreds of thousands. His leadership roles have spanned the game industry, internet finance, sponsored search advertising, startup incubation, social advertising, mobile advertising, and media and entertainment. He is currently one of the leaders focusing on data technology at The Walt Disney Studios, where he was one of the first to use the public cloud and machine learning in production applications. His passions lie where technology can innovate in business.

Paul Orlando

Incubator Director and Adjunct Professor of Entrepreneurship at USC
Paul Orlando accelerates company growth. He is Incubator Director and Adjunct Professor of Entrepreneurship at the University of Southern California (USC) in Los Angeles. Paul has founded and operated successful startup accelerator programs in Hong Kong (focused on mobile development), Los Angeles (focused on growth of wide-ranging companies with founders affiliated with USC), and the Laudato Si accelerator affiliated with the Vatican in Rome (focused on environmental technology). Companies Paul has worked with have raised tens of millions in capital, served millions of customers, and have been acquired. He has authored several related academic case studies available on Harvard Business Publishing. Paul also consults to larger institutions as they innovate, develop, and grow, especially in transitioning past legacy business models by using rapid experimentation techniques. In the past Paul also co-founded a B2B startup and consulted to Fortune 100 firms. Paul has been featured in media including Forbes, TechCrunch, Fast Company, the Wall Street Journal and was a winner at the TechCrunch Disrupt Hackathon. He has a BA from Cornell, an MBA from Columbia, and speaks Mandarin.

Trinh Nguyen

Sr. Director of Technology for E! Entertainment at NBCUniversal
Trinh Nguyen leads the cloud infrastructure and application development for the E! Entertainment network. E! is part of NBCUniversal’s cable portfolio, and Trinh has been leading the charge with cloud-native and progressive application development. With over 10 years at Comcast and NBCUniversal combined, he is now focused on helping various divisions within his media company transition to adopting big data and cloud technologies more effectively. Companywide, he is also the committee lead for NBCU’s Asian Pacific American employee resource group, primarily focusing on professional development for employees who are interested in their career growth. He has a B.S. in Computer Engineering from the University of California Irvine. When not promoting leadership principles in the workplace, you can find him at home raising his newborn son, Maverick, who as they say is very “dangerous”.