SCENES FROM BIG DATA DAY LA 2015 / Check Out Our Presentations!

Didn’t make it to Big Data Day LA 2015? No problem! Check out complete presentations from our presenters below!

Sponsors

2015 Sponsors

Agile iSS, HP Haven onDemand
Altiscale, Connexity, Datastax, Netflix, Qubole, VoltDB
Cask, Cloudera, Couchbase, Datascience.com, Factual, GoodData, MapR, MongoDB, Neo4J, NewMet Data, Pivotal, Rubicon Project, StackOverflow Careers, Talena
Archangel Technology Consultants, LLC; DataScience.LA; Diamond Web Services; Los Angeles Big Data Users Group; ppm.io

Keynote Speakers / 2015 Keynotes

Abhi Nemani

Abhi Nemani is a writer, speaker, organizer, and technologist. He is currently serving as the first Chief Data Officer for the City of Los Angeles, where he leads the city's efforts to build an open and data-driven LA.

Alan Gates

Alan, one of the founders of Hortonworks, is an original member of the engineering team that took Pig from a Yahoo! Labs research project to a successful Apache open source project. Alan also designed HCatalog and guided its adoption as an Apache Incubator project. Alan has a BS in Mathematics from Oregon State University and an MA in Theology from Fuller Theological Seminary. He is also the author of Programming Pig, a book from O'Reilly.

Karen Lopez

Karen is a senior project manager with an extensive background in development processes and information management. She specializes in taking practical approaches to systems development. She has helped many IT departments choose appropriate methods and standards, based on the department’s culture, experience, and focus. Karen is an international speaker on the Zachman Framework, information privacy, IT certification and licensing, data modeling and process modeling (DAMA, Enterprise Data World, EDF, EIM, DAMA Australia, CIPS, ZIFA, IRMUK/DAMA Europe, SQLPASS, SQLSaturdays). She has authored several white papers on better collaboration with business and development resources.

Michael Stack

Michael is an engineer on Cloudera's HBase team. He was the first project chair for HBase and is currently a committer/PMC member for that project, as well as a member of the Hadoop PMC.

Reynold Xin

Co-Founder, Databricks (Keynote: 9:40 AM, FA-100)
Reynold Xin is an Apache Spark PMC member, Chief Architect for Spark, and a co-founder of Databricks.

Speakers / 2015 Speakers

Aaron Wepler

Field Engineer At Pivotal

Abraham Elmahrek

Software Engineer At Cloudera

Adam Mollenkopf

Real-Time & Big Data GIS Capability Lead at ESRI

Adam Tourkow

Solutions Engineer At DataStax

Amelia McNamara

Statistics PhD Candidate, UCLA

Arvind Prabhakar

CTO At StreamSets, Inc.

Ashish Singh

Software Engineer At Cloudera

Benjamin Uminsky

Data Analytics At LA County Registrar-Recorder

Bikas Saha

Software Engineer At Hortonworks

Brian Kursar

VP, Data Strategy & Architecture At Warner Bros

Bryan Reinero

US Developer Advocate At MongoDB Inc.

Chris Fregly

Research Scientist at PipelineIO

David Chaiken

CTO At Altiscale

Edward Ma

Software Engineer At HP Vertica

Felix Chern

Senior Engineer At OpenX

Gian Gonzanga

Director Of Content Algorithms And Data Science At Netflix

Hyunsik Choi

Research Director At Gruter Inc.

Jeff Morris

Software Engineer At Couchbase

Jim McGuire

Head Of Modeling At ZestFinance

John De Goes

De Goes Consulting Inc.

Jonathan Gray

CEO At Cask

Josh Wills

Director Of Data Science At Cloudera

Josiah Carlson

VP Of Technology At OpenMail

Khanderao Kand

Senior Director Big Data Cloud At Oracle

Kyle Polich

Principal Data Scientist At DataScience, Inc

Lawrence Trinh

Systems Development Manager At Fandango

Martin Zerbib

Developer Evangelist At HP SW Big Data

Maxim Lukiyanov

Program Manager At Microsoft

Mike Limcaco

CTO at Agilisium Consulting

Minesh Patel

Solutions Architect At Qubole

Molly O'Connor

Software Engineer At Factual

Peyman Mohajerian

Sr. Architect At Teradata

Rachel Pedreschi

Technical Evangelist At Datastax

Raj Babu

CEO At Agilisium

Romain Rigaux

Software Engineer At Cloudera

Ryan Betts

CTO At VoltDB

Sabri Sansoy

CEO & Roboticist at Sansoy Robotics

Saritha Ivaturi

Director Product Management At Velocify

Seshadri Mahalingam

Software Engineer At Trifacta

Szilard Pafka

Chief Scientist At Epoch

Tim Fulmer

VP Engineering At HopSkipDrive

Ulas Bardak

CDO Of Whisper

Vinayak Borkar

CTO At X15 Software

Will Gage

Software Architect At Connexity, Inc.

Will Ochandarena

Director Of Product Management At MapR Technologies

Zain Asgar

Sr. Software Engineer At Trifacta

Previous Attendees / Previous Big Data Day LA Attendees include:

Amazon, Disney, IBM, Live Nation, Microsoft, Netflix

Organizers / 2015 Organizers

Joe Devon

Entrepreneur, CXO & Advisor

Subash D’Souza

Big Data/Hadoop/Spark Evangelist

Szilard Pafka

Sessions (Data Science) Chair

Volunteers / 2015 Volunteers

Abraham Elmahrek

First Employee at FOSSA, Inc.

Adra Graves

Senior Data Analyst at DogVacay

Alex Hammer-Barulich

Data Visualization & Cartographic Design

Andy Seltzer

CTO / Sales Mngt. at Even Enterprises

Ani Teroganesyan

Technical Recruiter at Crescent Solutions

Byron Dover

Skunkworks at Wiredrive

Christina Hoang

Technical Recruiter at Crescent Solutions

Cristina Salajan

Project Manager at Farmers Insurance

Dami Osoba

Data Scientist/Quantitative Researcher

Dan Gutierrez

Data Scientist at AMULET Analytics

Daniel A. Ochoa

Help Desk/Systems Engineer at Factual Inc

Dave Nielsen

Cloud Computing Evangelist & Consultant

David Comfort

Data Scientist/ Computational Biologist

Dean Okamura

Engineer at IBM Corporation

Eduardo Arino de la Rubia

VP of Product at Domino Data Lab

Eric Lui

VP, Engineering at Second Spectrum

Eswara Prasad

Data Engineer at Cognizant

Franklin Horn

Product Manager & Anthropologist

Jae Lim

Analytic Programmer/ Investor/ Thinker

Jimmy Kim

Enterprise Account Executive at Spectrum Enterprise

John Kim

VP, Operations at Captive Eight

Julia Davidovich

Inbound Marketer at Diamond

Kaloyan Todorov

Senior Data Scientist at Fama

Kyle Walker

Lead Engineer at ZEFR

Linda Dougherty

User Experience Designer at TV4 Entertainment, Inc

Luke Lipan

Enterprise Sales Manager at DataStax

Mahmoud Kamalzare

Researcher At USC

Matti Siltanen

Dir Operations/IT | Sales Engineer

Michael Chiang

Executive Director at Crescent Solutions

Natasha Drayson

Vice President of Client Services at OrangePeople

Nazli Dereli

Data Scientist at Live Nation

Omar Atayee

Sales Engineer at Mashape

Oswald Jones

Lead Engineer at Griddy

Oszie Tarula

Programmer / Analyst III (Lead Web UI/UX Developer) at UCLA

Priyanka Biswas

PhD candidate at USC

Raj Babu

CEO at Agilisium

Rich Ung

Data Engineer at Disney ABC Television Group

Rita Kuo

Senior Systems Engineer at Raytheon

Siva Bhavanari

Senior Data Engineer

Sooraj Akkammaddam

ETL Architect at Core Digital Media

Tseng Chan Saechao

Academic Manager at Franklin Educational Services, Inc.

Tuck Ngun

Data Analysis

Vikki Appel

Technical Product Management

Weixiang Chen

Founder at NewMet Data

Yee Fung

Front-end Wordpress Developer at Diamond

Zhihui Xie

Postdoctoral Research Fellow at Children's Hospital Los Angeles

Schedule tracks: Big Data, Data Science, Entertainment, Hadoop/Spark/Kafka, IoT, Machine Learning, NoSQL, Use Case Driven
Bovard
Data as a Strategic Asset
11:00 am - 11:30 am
Lilian Coral, Chief Data Officer At Office Of LA Mayor Eric Garcetti
Lilian Coral
Chief Data Officer At Office Of LA Mayor Eric Garcetti
IoT
Bovard
11:00 AM - 11:30 AM

Abstract:- The City of Los Angeles, with 4 million residents and nearly 50 million visitors annually moving across 469 square miles, is not only one of the most densely populated cities, it also hosts one of the largest, most complex city infrastructures in the world. 6,000 miles of sewer underlie 22,000 miles of paved streets that connect over 4,500 intersections, 50,000 city-connected street lights and 2,000,000 Google/Waze-connected sensors. This network of people and infrastructure is connected through the data and the systems that support them. As data transforms from an unstructured asset into the organizational wisdom that can drive this Smart City, the City of Los Angeles and the Office of Mayor Eric Garcetti work to identify new technologies and strategies for managing and harnessing the growing amount of data available to inform decision-making.

Sponsored - Data engineering at the interface of art and analytics: the why, what, and how of Netflix's data engineering team Hollywood
11:40 am - 12:10 pm
Josh Hemann, Director - Content Data Engineering & Analytics At Netflix
Josh Hemann
Director - Content Data Engineering & Analytics At Netflix
Entertainment
Bovard
11:40 AM - 12:10 PM

Abstract:- Netflix has a growing presence in Hollywood, with technical teams working on everything from high-speed video editing pipelines to machine learning methods for categorizing films. Data is foundational across these efforts and in this talk Josh will take a tour through why we invest so much in data about content, what data engineering challenges we tackle, and the style in which we do it.

Opening the black box: Attempts to understand the results of machine learning models
12:20 pm - 12:50 pm
Michael Tiernay, R&D Data Scientist At Edmunds.Com
Michael Tiernay
R&D Data Scientist At Edmunds.Com
Data Science
Bovard
12:20 PM - 12:50 PM

Abstract:- Sophisticated machine learning models (like GBMs and Neural Networks) produce better predictions than simpler models (like linear or logistic regression), but sophisticated models do not produce interpretable 'effects' that specify the relationship between predictors and outcome. This is because sophisticated models can learn non-linear, interactive, or even higher-level relationships between the predictors and outcome without these being explicitly specified. In many settings it is important to understand, as best as possible, how 'black box' models are producing their predictions, because: 1. If users do not understand how a prediction is being made, they may not trust the model/prediction enough to act upon the model's suggestions. 2. Significant business value can be derived from understanding what drives an outcome of interest (e.g. purchase or churn) in order to make product changes to accentuate or minimize desired effects. 3. Understanding how predictors relate to an outcome can inform subsequent feature generation that can improve a model's predictive power. This talk will discuss two methods that have been proposed to better understand machine learning models: simulating changes in input variables (the R ICEbox package) or building a simpler model locally around specific predictions (the Python LIME package).
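
As a hedged illustration of the second approach mentioned above (building a local surrogate around one prediction), the sketch below uses the Python LIME package against a toy scikit-learn classifier; the data, feature names, and model are placeholder assumptions, not the speaker's setup.

```python
# Minimal sketch: explain a single prediction of a 'black box' classifier with LIME.
# The toy data, feature names, and model below are assumptions for illustration.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from lime.lime_tabular import LimeTabularExplainer

X_train = np.random.rand(200, 4)                           # placeholder feature matrix
y_train = (X_train[:, 0] + X_train[:, 1] > 1).astype(int)  # placeholder outcome
clf = GradientBoostingClassifier().fit(X_train, y_train)

explainer = LimeTabularExplainer(
    X_train,
    feature_names=["f0", "f1", "f2", "f3"],
    class_names=["negative", "positive"],
    mode="classification",
)
# Fit a simple local surrogate around one row and list its top contributing features.
explanation = explainer.explain_instance(X_train[0], clf.predict_proba, num_features=3)
print(explanation.as_list())
```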

A Practical Use of Artificial Intelligence in the Fight Against Cancer
2:00 pm - 2:30 pm
Brian Dolan, Founder & Chief Scientist At Deep 6 AI
Brian Dolan
Founder & Chief Scientist At Deep 6 AI
AI/ Machine Learning
Bovard
2:00 PM - 2:30 PM

Abstract:- Artificial Intelligence is an important topic in the fight against cancer. Clinical Trials are at the frontier of innovation. I will discuss techniques, data sets and platforms we use at Deep 6 to bring patients to clinical trials. The focus will be on practical, repeatable methods I've developed at MySpace, Greenplum, UCLA and the US Intelligence Community.

Sponsored - Accelerating Real-Time Decision Systems using Redis
2:40 pm - 3:10 pm
Tague Griffith, Developer Advocate At RedisLabs
Tague Griffith
Developer Advocate At RedisLabs
Hadoop/Spark/Kafka
Bovard
2:40 PM - 3:10 PM

Abstract:- One of the challenges faced when deploying a machine learning project into production is how to build the real-time decision part of the system. Many open source projects exist to help construct machine learning pipelines, but you are often left to construct your own custom server to enable decision making. Redis can be used in place of custom code to build your serving system.
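
A minimal sketch of the pattern the abstract describes, assuming redis-py and a local Redis instance: an offline scoring job writes precomputed model outputs into Redis, and the online request path reads them to make a decision without a custom serving layer. Key names and the threshold are illustrative.

```python
# Hedged sketch: use Redis as the real-time decision store instead of a custom server.
# Host, key names, and the threshold are illustrative assumptions.
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Offline scoring job: write per-user model outputs (mapping= requires redis-py >= 3.5).
r.hset("user:42", mapping={"churn_score": "0.83", "segment": "power_user"})

# Online request path: one round trip to fetch features/scores and decide.
features = r.hgetall("user:42")
if float(features["churn_score"]) > 0.8:
    print("show retention offer to segment:", features["segment"])
```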

Big Data for Good
3:20 pm - 3:50 pm
Jill Dyche, Vice President, Best Practices At SAS
Jill Dyche
Vice President, Best Practices At SAS
Big Data
Bovard
3:20 PM - 3:50 PM

Abstract:- In this talk, SAS Vice President and non-profit founder Jill Dyche revisits the customer journey. After all, in the age of omnichannel and digital everything, your customers are taking a different path than the one your Marketing department mapped out all those years ago. But big data's reach transcends the unstructured data of big corporations. Jill will explain how a personal mission led her to the realization that big data can be applied to many different journeys. She'll tell a story of how her work in the social sector with big data and analytics not only helps animal shelters refine outreach programs, but can save lives!

The Netflix data platform: Now and in the future
4:30 pm - 5:00 pm
Kurt Brown, Director, Data Platform At Netflix
Kurt Brown
Director, Data Platform At Netflix
Entertainment
Bovard
4:30 PM - 5:00 PM

Abstract:- The Netflix data platform is constantly evolving, but at its core, it's an all-cloud platform at massive scale (60+ PB and over 700 billion new events per day), focused on enabling developers. In this talk, we'll dive into the current (data) technology landscape at Netflix, as well as what's in the works. We'll cover key technologies, such as Spark, Presto, Docker, and Jupyter, along with many broader data ecosystem facets (metadata, insights into jobs run, visualizing big data, etc.). Beyond just tech, we'll also dive a bit into our data platform philosophy. You'll leave with insights into how things work at Netflix, along with some ideas for re-envisioning your data platform.

Data is cheap; strategy still matters
5:10 pm - 5:40 pm
Jason Lee, Principal, Advanced Analytics Group At Bain & Company
Jason Lee
Principal, Advanced Analytics Group At Bain & Company
Use Case Driven
Bovard
5:10 PM - 5:40 PM

Abstract:- What could a strategy consulting firm have to do with or say about big data? We see Big Data leading the way on new products but also disrupting our clients' business processes and business models. For many clients and big data fans, the temptation is to think big data and machine learning disrupt the need for strategy: just throw the data in the lake along with a bunch of programmers holding machine learning fishing poles, and we will be done. Here is a rapid-fire review of three client use cases and what really happens. What did we learn working with these clients? Strategy still matters. Data is cheap; attention is not: while data and computational power are increasingly plentiful, people have limited attention and energy, and complexity can kill, not so much in the model itself but in how it affects processes and decisions. Data is not so cheap after all: we continue to underappreciate data architecture, governance, and engineering, which frequently take up most of the effort required for analytics success. Winning with Big Data is often less about the latest technology platform and more about our strategy, culture, organizational capabilities, the way we implement algorithms, how we make decisions with data, and the impacts these have on employees and customers.

How to Ruin your Business with Data Science & Machine Learning
5:50 pm - 6:20 pm
Ingo Mierswa, Founder & President At RapidMiner
Ingo Mierswa
Founder & President At RapidMiner
Data Science
Bovard
5:50 PM - 6:20 PM

Abstract:- Everyone talks about how machine learning will transform business forever and generate massive outcomes. However, it's surprisingly simple to draw completely wrong conclusions from statistical models, and 'correlation does not imply causation' is just the tip of the iceberg. The trend toward democratization of data science further increases the risk of applying models in the wrong way. This session will discuss: how highly correlated features can overshadow the patterns your machine learning model is supposed to find, which leads to models that perform worse in production than during model building; how incorrect cross-validation leads to over-optimistic estimates of your model accuracy, with particular attention to the impact of data preprocessing on the accuracy of machine learning models; and how feature engineering can lift simple models like linear regression to the accuracy of deep learning while keeping the advantages of understandability and robustness.

Weibull Analysis: Tableau + R Integration
6:30 pm - 7:00 pm
Monica Willbrand, Senior Business Consultant At Tableau Software
Monica Willbrand
Senior Business Consultant At Tableau Software
Big Data
Bovard
6:30 PM - 7:00 PM

Abstract:- Weibull reliability analysis predicts the life of products by fitting a distribution to a plot based on a population of units; multiple proprietary software applications are available to perform the analysis. The advent of Tableau + R Integration empowers Data Scientists and Reliability experts to make inferences about populations' failure characteristics by considering the beta value of the distribution. With beta, we plot F(t), or unreliability over time, when leveraging Tableau + R Integration (R scripts in Tableau calculated fields, pointing to an R Server library for row-level execution). The Weibull analysis performed is superior to the Kaplan-Meier method as it enables the more accurate Maximum Likelihood Estimate (MLE) curve fitting of the plotted regression, as opposed to the Least Squares Estimate (LSE), which excludes R Integration and fails to precisely match the parameters (shape, slope) that sophisticated existing reliability software packages produce. Application of Weibull for reliability analysis considers failure at a given time in the lifespan (t), where t = miles, cycles, hours, etc. The two-parameter distribution performed in this analysis includes beta and eta, or shape and scale parameters, respectively. Mean Time To Failure (MTTF) calculations are derived from these parameters as well. Variable Confidence Interval (CI) bands are used and can be adjusted using the interactive Tableau visualization. Industries utilizing Weibull analysis to plot the Bathtub Curve assess the infant mortality, normal useful life, and end-of-life failures anticipated for a product (e.g. semiconductor chips, automotive parts, medical devices).
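
The talk performs this analysis with Tableau + R; purely as an illustration of the same two-parameter Weibull MLE fit and the MTTF derived from it, here is a hedged Python/SciPy sketch. The choice of SciPy, the failure times, and the time grid are assumptions.

```python
# Illustrative two-parameter Weibull MLE fit (shape = beta, scale = eta) in SciPy.
# The failure times are synthetic; the talk itself uses Tableau + R, not Python.
import numpy as np
from scipy.stats import weibull_min
from scipy.special import gamma

failure_times = np.array([120, 180, 250, 310, 400, 520, 610, 700], dtype=float)

# MLE fit with the location fixed at zero yields the shape (beta) and scale (eta).
beta, loc, eta = weibull_min.fit(failure_times, floc=0)

# Unreliability over time F(t) and the MTTF derived from the fitted parameters.
t = np.linspace(100, 800, 8)
unreliability = weibull_min.cdf(t, beta, loc=0, scale=eta)
mttf = eta * gamma(1 + 1 / beta)
print("beta:", beta, "eta:", eta, "MTTF:", mttf)
print("F(t):", unreliability)
```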

GFS 106
Crowd Surfing Tweets
11:00 am - 11:30 am
Kivanc Yazan, Software Engineer At ZipRecruiter
Kivanc Yazan
Software Engineer At ZipRecruiter
NoSQL
GFS106
11:00 AM - 11:30 AM

Abstract:- It's easy to collect millions of tweets, but not so easy to get the right ones! During my senior year at college, we built a tool that lets you "surf" through Twitter. We were able to catch on-the-fly hashtags and add them into our search query, automagically. We tested it during elections and sporting events for tweets in both Turkish and English.
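
A minimal, self-contained sketch of the "catch hashtags on the fly" idea described above: scan incoming tweet text, extract new hashtags, and grow the live search query. The seed term and sample tweets are made up; the real tool presumably talks to the Twitter API.

```python
# Toy sketch: expand a Twitter search query with hashtags discovered in incoming tweets.
import re

query_terms = {"#election"}                      # assumed seed term
incoming_tweets = [                              # stand-in for a live tweet stream
    "Long lines at the polls today #election #vote",
    "Exit polls rolling in #vote #results",
]

for text in incoming_tweets:
    for tag in re.findall(r"#\w+", text.lower()):
        if tag not in query_terms:
            print("adding", tag, "to the live query")
            query_terms.add(tag)

print("expanded query:", " OR ".join(sorted(query_terms)))
```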

TBD by Sponsored Speaker 7
11:40 am - 12:10 pm
Sponsored

TBD

Extending Analytic Reach - From The Warehouse to The Data Lake
12:20 pm - 12:50 pm
Mike Limcaco, CTO At Agilisium Consulting
Mike Limcaco
CTO At Agilisium Consulting
Big Data
GFS106
12:20 PM - 12:50 PM

Abstract:- The data marts and warehouses we work with often require us to think about how to scope our analytic questions based on the finite amount of storage allocated to these enterprise components. With new innovations in the cloud space, we can leverage the near-infinite storage capacities of Data Lake object storage and use this as a foundational source that can be combined with online data in the warehouse. In this talk we present reference architecture patterns based on Amazon Redshift Spectrum, a new technology enabling you to run MPP warehouse SQL queries against exabytes of data in a backing object store. With Redshift Spectrum, customers can extend the analytic reach of their SQL interactions beyond data stored on local disks in the data warehouse to query vast amounts of unstructured data in the Amazon S3 Data Lake, without having to load or transform any data.
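
As a hedged sketch of the pattern described above, the snippet below registers an external schema over the S3 data lake and joins it with a local warehouse table from Python via psycopg2. The cluster endpoint, credentials, IAM role, and schema/table names are all illustrative assumptions.

```python
# Illustrative Redshift Spectrum usage: external schema over S3 joined with local tables.
# Endpoint, credentials, IAM role, and schema/table names are placeholders.
import psycopg2

conn = psycopg2.connect(host="my-cluster.example.com", port=5439,
                        dbname="dev", user="analyst", password="secret")
conn.autocommit = True
cur = conn.cursor()

# External schema backed by a data catalog over objects in S3.
cur.execute("""
    CREATE EXTERNAL SCHEMA IF NOT EXISTS spectrum
    FROM DATA CATALOG DATABASE 'datalake'
    IAM_ROLE 'arn:aws:iam::123456789012:role/spectrum-role';
""")

# Join warehouse data on local disk with raw events sitting in the S3 data lake.
cur.execute("""
    SELECT c.region, COUNT(*)
    FROM customers c
    JOIN spectrum.clickstream_events e ON e.customer_id = c.id
    GROUP BY c.region;
""")
print(cur.fetchall())
```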

Logitech Accelerates Cloud Analytics Using Data Virtualization
2:00 pm - 2:30 pm
Avinash Deshpande, Chief Software Architect At Logitech
Avinash Deshpande
Chief Software Architect At Logitech
NoSQL
GFS106
2:00 PM - 2:30 PM

Abstract:- Many firms are adopting a cloud-first strategy and are migrating their on-premises technologies to the cloud. Logitech is one of them. We have adopted the AWS platform and big data on the cloud for all of our analytical needs, including Amazon Redshift and S3. In this presentation, I will present: the business rationale for migrating to the cloud; how data virtualization enables the migration; and running data virtualization itself in the cloud.

TBD by Sponsored Speaker 5 [TBD]
2:40 pm - 3:10 pm
Sponsored

TBD

Building Autopilots for Business: Leveraging Flight Science to create new Data Science Frameworks
3:20 pm - 3:50 pm
Emad Hasan, Chief Operating Officer & Co-Founder At Retina AI
Emad Hasan
Chief Operating Officer & Co-Founder At Retina AI
AI/ Machine Learning
GFS106
3:20 PM - 3:50 PM

Abstract:- We will discuss how leveraging Control System Theory, which informed and led to the advent of flight control systems, is ushering in a new data-science framework for bringing automated insights to the enterprise. As organizations grow, managers start relying on data from several sources to make decisions concerning the organization and its customers. Organizations with a growing gap between desired results and actual results need a control system to better manage and predict results using their existing data. Control Theory is typically used in two forms: (1) Goal Seeking Model: for example, an autopilot guiding an aircraft from A to B. We will show how an organization looking to increase profits can be modeled as a Goal Seeking organization. (2) Disturbance Rejection Model: sometimes seen in temperature control systems in buildings; in the enterprise, an organization seeking to minimize costs can be modeled using the Disturbance Rejection model. Both of these concepts are based on sound scientific principles that can be used to model and control businesses. The talk will include two real-world cases of how Emad Hasan translated these ideas into data science applications at Facebook and PayPal which powered executive decision making, as well as how they can now be used by data scientists and managers around the world to improve insights and cues for business executives.

Real-Time Analytics in Transactional Applications
4:30 pm - 5:00 pm
Brian Bulkowski, CTO And Founder At Aerospike
Brian Bulkowski
CTO And Founder At Aerospike
NoSQL
GFS106
4:30 PM - 5:00 PM

Abstract:- BI and analytics are at the top of corporate agendas. Competition is intense, and, more than ever, organizations require fast access to insights about their customers, markets, and internal operations to make better decisions, often in real time. Enterprises face challenges powering real-time business analytics and systems of engagement (SOEs). Analytic applications and SOEs need to be fast and consistent, but traditional database approaches, including RDBMS and first-generation NoSQL solutions, can be complex, a challenge to maintain, and costly. Companies should aim to simplify traditional systems and architectures while also reducing vendors. One way to do this is by embracing an emerging hybrid memory architecture, which removes an entire caching layer from your front-end application. This talk discusses real-world examples of implementing this pattern to improve application agility and reduce operational database spend.
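
A minimal sketch of the "no caching layer" pattern the abstract alludes to, using the Aerospike Python client: the application reads and writes the operational store directly instead of maintaining a separate cache tier. The namespace, set, key, and bin names are assumptions.

```python
# Hedged sketch: read/write the operational database directly, with no cache tier.
# Namespace, set, key, and bin names are illustrative assumptions.
import aerospike

config = {"hosts": [("127.0.0.1", 3000)]}
client = aerospike.client(config).connect()

key = ("test", "profiles", "user42")
client.put(key, {"segment": "gamer", "ltv": 134.5})   # front-end write path

(_, _, record) = client.get(key)                      # front-end read path
print(record)
client.close()
```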

Deep Learning Frameworks Using Spark on YARN
5:10 pm - 5:40 pm
Vartika Singh, Senior Solutions Architect At Cloudera
Vartika Singh
Senior Solutions Architect At Cloudera
Hadoop/Spark/Kafka
GFS106
5:10 PM - 5:40 PM

Abstract:- Traditional machine learning and feature engineering algorithms are not efficient enough to extract the complex and nonlinear patterns that are hallmarks of big data. Deep learning, on the other hand, helps translate the scale and complexity of the data into solutions like molecular interaction in drug design, the search for subatomic particles, and automatic parsing of microscopic images. Co-locating a data processing pipeline with a deep learning framework makes data exploration and algorithm/model evolution much simpler, while streamlining data governance and lineage tracking into a more focused effort. In this talk, we will discuss and compare the different deep learning frameworks on Spark in distributed mode, their ease of integration with the Hadoop ecosystem, and relative comparisons in terms of feature parity.

Spark, ElasticSearch, and Murmur3 Hash
5:50 pm - 6:20 pm
Brian Kursar, VP, Data Intelligence At Warner Bros
Brian Kursar
VP, Data Intelligence At Warner Bros
Entertainment
GFS106
5:50 PM - 6:20 PM

Abstract:- How Warner Bros. is Using Elastic to Solve Entertainment and Media Problems at Scale. Warner Bros. processes billions of records each day globally across its web assets, digital content distribution, OTT streaming services, online and mobile games, technical operations, anti-piracy programs, social media, and retail point-of-sale transactions. Despite having large MPP clusters, a significant amount of dark data remained trapped in web logs. In this presentation, we will discuss how Warner Bros. leveraged the new Elastic 5 stack coupled with Apache Spark to deliver scalable insights and new capabilities to support business needs.
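
A hedged PySpark sketch of the Spark + Elasticsearch combination described above, assuming the elasticsearch-hadoop connector is available on the Spark classpath; the S3 path, index name, and ES host are illustrative, not Warner Bros.' actual pipeline.

```python
# Illustrative pipeline: parse web logs with Spark and index them into Elasticsearch 5
# via the elasticsearch-hadoop connector (supplied with --packages or --jars).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("weblogs-to-es").getOrCreate()

# Web logs that would otherwise remain "dark" in flat files.
logs = spark.read.json("s3://example-bucket/weblogs/2017/07/")

(logs.filter(logs.status == 200)
     .write.format("org.elasticsearch.spark.sql")
     .option("es.nodes", "es-cluster.internal:9200")
     .mode("append")
     .save("weblogs/access"))        # index/type in the Elastic 5 naming scheme
```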

Audio Beacons - The InAudible Bridge from Big Data and Content to Mobile Smartphone Consumers
6:30 pm - 7:00 pm
Tom Webster, CEO At TONE
Tom Webster
CEO At TONE
IoT
GFS106
6:30 PM - 7:00 PM

Abstract:- The Tone Knows is an Internet of Things marketing and advertising vendor and platform. We implement a next-generation, patent-pending Internet of Things advertising technology using Audio Beacons. These Audio Beacons, unlike proximity beacons, do not use Bluetooth or WiFi to transmit to the mobile phone, and no hardware is required. Here is a short demonstration of an Audio Beacon program we did with music artist Ariana Grande: https://www.youtube.com/watch?v=rP36bCuA4kM In the demonstration, a high-frequency tone is embedded inside the music video on the laptop. When the audio tone goes off, the mobile phone (which has opted in and has its microphone turned on) can receive the audio tone and is sent a hyperlink to ecommerce, a promotion, a contest, or anywhere else we send it to. This is a way for TV and TV advertising, radio and radio advertising, YouTube video and banner advertising to connect content to mobile smartphone consumers.

SGM 123
Data Science: Good, Bad and Ugly
11:00 am - 11:30 am
Irina Kukuyeva, Senior Data Scientist At Dia &Co
Irina Kukuyeva
Senior Data Scientist At Dia &Co
Data Science
SGM123
11:00 AM - 11:30 AM

Abstract:- As a data scientist, I get to see a broad spectrum of the 'good', 'bad', and 'ugly' implementations of engineering and data practices. I'd be happy to share my tips and experiences with the broader community: the do's and don'ts of working with data in production, for collaboration, and for getting actionable insights.

Sponsored - Building Intelligent Applications with Cassandra, Spark and DataStax Enterprise Analytics
11:40 am - 12:10 pm
Jeff Carpenter, Technical Evangelist At DataStax
Jeff Carpenter
Technical Evangelist At DataStax
NoSQL
SGM123
11:40 AM - 12:10 PM

Abstract:- Apache Cassandra is known as the go-to database for cloud applications requiring large amounts of data storage with elastic scalability across multiple data centers. Spark is an in-memory analytics framework that supports both realtime and batch processing, with extensions for streaming, machine learning, and SQL. Jeff Carpenter, Technical Evangelist at DataStax, will share how DataStax Enterprise puts these powerful technologies together to solve common use cases in domains including entertainment and IoT. We’ll explore architectures for intelligent applications that leverage DSE to provide real-time operational analytics.
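
A hedged PySpark sketch of the Cassandra + Spark combination the abstract describes, assuming the spark-cassandra-connector (which DSE Analytics bundles); the connection host, keyspace, and table names are illustrative assumptions.

```python
# Illustrative read of a Cassandra table into a Spark DataFrame for operational analytics.
# Connection host, keyspace, and table names are placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("dse-analytics-demo")
         .config("spark.cassandra.connection.host", "10.0.0.5")
         .getOrCreate())

views = (spark.read.format("org.apache.spark.sql.cassandra")
         .options(keyspace="media", table="video_views")
         .load())

# Simple operational analytics over the transactional data: most-watched titles.
views.groupBy("title").count().orderBy("count", ascending=False).show(10)
```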

The AI Takeover in Hollywood
12:20 pm - 12:50 pm
Yves Bergquist, Director, Data & Analytics Program At USC/ETC
Yves Bergquist
Director, Data & Analytics Program At USC/ETC
Entertainment
SGM123
12:20 PM - 12:50 PM

Abstract:- As the entertainment industry faces a landscape of exponential opportunities and threats, it is quietly turning to artificial intelligence to manage risk, develop operational efficiencies, and make more data-driven decisions. From developing cognitive solutions to assess why we think certain films and characters are more interesting than others, to isolating granular, scene-level story and character mechanics that drive better box office returns, Hollywood has fully caught up with other industries in leveraging high-end analytics methods and tools. As the director of the Data & Analytics Project at USC's prestigious Entertainment Technology Center (created by George Lucas in 1993), Yves Bergquist sits at the center of this revolution. He and his team are developing next-generation AI tools and methods that are being deployed throughout the entertainment industry. Because his research is funded by all 6 Hollywood studios, and he personally answers to all the CTOs of those studios, Yves has unique and powerful insight into how Hollywood is quietly using machine intelligence to take its hit-making game to the next level. What the audience will learn: the audience will go behind the scenes to discover how precisely Hollywood studios are using data, analytics and AI to make better development, production and distribution decisions. Yves will draw from his and his team's research and use case studies to lift the veil on how AI, game theory, and neuroscience are transforming audience intelligence, film development, and distribution strategies.

A Gentle Introduction to GPU Computing
2:00 pm - 2:30 pm
Armen Donigian, Data Science Engineer At Zest Finance
Armen Donigian
Data Science Engineer At Zest Finance
Use Case Driven
SGM123
2:00 PM - 2:30 PM

Abstract:- As data science continues to mature and evolve, the demand for more computationally powerful machines is rising. GPU Computing provides the core capabilities that data scientists today are looking for, and when implemented effectively, it accelerates deep learning, analytics and other sophisticated engineering applications. During this talk, Armen Donigian, Data Science Engineer at ZestFinance, will introduce the GPU programming model and parallel computing patterns, as well as practical implications of GPU computing, such as how to accelerate applications on a GPU with CUDA (C++/Python), GPU memory optimizations, and multi-GPU programming with MPI and OpenACC. As an example of how GPU programming can be implemented in real-life business models, Armen will present how ZestFinance has successfully tapped into the power of GPU Computing for the deep learning algorithm behind its new Zest Automated Machine Learning (ZAML) platform. Currently, ZAML is used by major tech, credit and auto companies to successfully apply cutting-edge machine learning models to their toughest credit decisioning problems. ZAML leverages GPU Computing for data parallelism, model parallelism and training parallelism.
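
The talk covers CUDA with C++/Python, MPI, and OpenACC; as one small, hedged illustration of the GPU programming model from Python, here is a SAXPY kernel written with Numba's CUDA JIT. It requires an NVIDIA GPU plus the numba package, and the array size and launch configuration are arbitrary choices.

```python
# Minimal GPU kernel sketch with Numba CUDA: out[i] = a * x[i] + y[i].
import numpy as np
from numba import cuda

@cuda.jit
def saxpy(a, x, y, out):
    i = cuda.grid(1)            # global thread index
    if i < out.size:            # guard threads beyond the array bounds
        out[i] = a * x[i] + y[i]

n = 1_000_000
x = np.random.rand(n).astype(np.float32)
y = np.random.rand(n).astype(np.float32)

d_x, d_y = cuda.to_device(x), cuda.to_device(y)   # explicit host-to-device copies
d_out = cuda.device_array_like(x)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
saxpy[blocks, threads_per_block](2.0, d_x, d_y, d_out)

print(d_out.copy_to_host()[:3])
```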

Sponsored - What's new in Hortonworks DataFlow 3.0
2:40 pm - 3:10 pm
Andrew Psaltis, HDF/IoT Product Solutions Architect At Hortonworks
Andrew Psaltis
HDF/IoT Product Solutions Architect At Hortonworks
IoT
SGM123
2:40 PM - 3:10 PM

Abstract:- Hortonworks DataFlow (HDF) is built with the vision of creating a platform that enables enterprises to build dataflow management and streaming analytics solutions that collect, curate, analyze and act on data in motion across the datacenter and cloud. Do you want to be able to provide a complete end-to-end streaming solution, from an IoT device all the way to a dashboard for your business users with no code? Come to this session to learn how this is now possible with HDF 3.0.

Delivering Quality Open Data
3:20 pm - 3:50 pm
Chelsea Ursaner, Solutions Architect At Office Of LA Mayor Eric Garcetti
Chelsea Ursaner
Solutions Architect At Office Of LA Mayor Eric Garcetti
Use Case Driven
SGM123
3:20 PM - 3:50 PM

Abstract:- The value of data is exponentially related to the number of people and applications that have access to it. The City of Los Angeles embraces this philosophy and is committed to opening as much of its data as it can in order to stimulate innovation, collaboration, and informed discourse. This presentation will be a review of what you can find and do on our open data portals as well as our strategy for delivering the best open data program in the nation.

Waze Carpool: A little selfless and a little selfish
4:30 pm - 5:00 pm
Eric Ruiz, LatAm Marketing Manager At Google Waze
Eric Ruiz
LatAm Marketing Manager At Google Waze
IoT
SGM123
4:30 PM - 5:00 PM

Abstract:- Waze Carpool is the evolution of the Waze mission. If at first we wanted to help you save time by finding the fastest route to your destination, now we want to help you avoid traffic--by eliminating it altogether. Traffic is a simple problem. There are too many cars on the road with too many empty seats.

Deriving Conversational Insight by Learning Emoji Representations
5:10 pm - 5:40 pm
Jeff Weintraub, Vice President At TheAmplify
Jeff Weintraub
Vice President At TheAmplify
Data Science
SGM123
5:10 PM - 5:40 PM

Abstract:- It is a rare occurrence to observe the rise of a new language amongst a population. It is an even more rare occurrence to observe the adoption of such a language on a global scale. Since the introduction of the emoji keyboard on iOS in 2011, the use of emojis in textual communication has steadily grown into a common vernacular on social media. As of April 2015, Instagram reported that nearly half of all text contained emojis and, in some countries, over 60% of texts contained emoji characters. For power users of social media as well as for marketers looking for audiences on these platforms, it is becoming increasingly imperative to capture emoji data and derive insight from its use; to better understand what intent or meaning the usage carries in the conversation. Jeff Weintraub, VP of Technology at theAmplify, a creative Brandtech Influencer Service and a subsidiary of You & Mr Jones, the World's First Brandtech Group, will briefly summarize the data science behind learning emoji representations and also present recent trends in emoji usage within the context of advertising and branded marketing campaigns on social media.

How OpenTable uses Big Data to impact growth
5:50 pm - 6:20 pm
Raman Marya, Director, Data Engineering And Analytics At OpenTable
Raman Marya
Director, Data Engineering And Analytics At OpenTable
Big Data
SGM123
5:50 PM - 6:20 PM

Abstract:- We have created a variety of analytics solutions combining data from our Data Lake with our traditional data warehouse: data APIs that are fed into the product to improve conversions, a churn prediction algorithm to help account managers focus on high-risk customers, and analytics as an edge to empower the sales team to win prospective customers.

Deep Learning for Natural Language Processing
6:30 pm - 7:00 pm
Roopal Garg, Data Scientist At GumGum
Roopal Garg
Data Scientist At GumGum
Data Science
SGM123
6:30 PM - 7:00 PM

Abstract:- The talk will focus on how Neural Networks are applied in the field of NLP for tasks like classification. Building blocks like Word Embeddings, Recurrent NN, LSTM, GRU, Convolutional NN, Sentence Representation and how they are applied to a piece of text in Tensorflow will be covered. These building blocks can be stacked together in various ways to form deeper network architectures. We will discuss one such architecture which is used within GumGum Inc to do Sentiment Analysis on web pages using NN in Tensorflow.
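
As a hedged illustration of the building blocks named above (word embeddings feeding an LSTM for sentiment classification), here is a small TensorFlow/Keras sketch; the vocabulary size, sequence length, and toy data are assumptions, not GumGum's production architecture.

```python
# Minimal embedding + LSTM sentiment classifier in TensorFlow/Keras.
# Vocabulary size, sequence length, and the random toy data are assumptions.
import numpy as np
import tensorflow as tf

vocab_size, seq_len = 10000, 100
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, 64),         # word embeddings
    tf.keras.layers.LSTM(64),                           # sentence representation
    tf.keras.layers.Dense(1, activation="sigmoid"),     # positive vs. negative
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Integer-encoded "pages" and labels stand in for real tokenized text.
X = np.random.randint(0, vocab_size, size=(32, seq_len))
y = np.random.randint(0, 2, size=(32,))
model.fit(X, y, epochs=1, verbose=0)
print(model.predict(X[:1]))
```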

THH101
Using machine learning to optimize marketing ROI at Honest Company
11:00 am - 11:30 am
Roozbeh Davari, Data Scientist At The Honest Company
Roozbeh Davari
Data Scientist At The Honest Company
AI/ Machine Learning
THH101
11:00 AM - 11:30 AM

Abstract:- The Honest Company is one of the leading national e-commerce brands, focused on promoting healthy and happy lives. One core challenge we face is measuring ROI across the numerous marketing acquisition channels and in turn, optimizing our budget allocation. Luckily, we have large amounts of data, both structured and unstructured, which is used to learn patterns and insights about how we acquire new customers. We leverage this data to build machine learning models for smart customer segmentation, which helps the acquisition team derive maximum ROI out of every dollar spent. In this talk, we will touch on some of our machine learning approaches and how we leverage data to predict attributes like customer lifetime value and customer churn rates, which are then used to optimize spend allocation.

Sponsored - Machine Learning on Distributed Systems
11:40 am - 12:10 pm
Josh Poduska, Senior Data Scientist, Big Data Group At HPE Vertica
Josh Poduska
Senior Data Scientist, Big Data Group At HPE Vertica
Data Science
THH101
11:40 AM - 12:10 PM

Abstract:- Most real-world data science workflows require more than multiple cores on a single server to meet scale and speed demands, but there is a general lack of understanding of what machine learning on distributed systems looks like in practice. Gartner and Forrester do not consider distributed execution when they score advanced analytics software solutions. Much formal machine learning training occurs on single-node machines with non-distributed algorithms. In this talk we discuss why an understanding of distributed architectures is important for anyone in the analytical sciences. We will cover the current distributed machine learning ecosystem. We will review common pitfalls when performing machine learning at scale. We will discuss architectural considerations for a machine learning program, such as the role of storage and compute and under what circumstances they should be combined or separated.

Optimizing Online Advertising With Data Science
12:20 pm - 12:50 pm
Andrea Trevino, Lead Data Scientist At DataScience.com
Andrea Trevino
Lead Data Scientist At DataScience.com
Use Case Driven
THH101
12:20 PM - 12:50 PM

Abstract:- Optimizing online display advertising is a complicated task that often requires a combination of domain knowledge and intuition on the part of a marketer. But with so much advertising occurring within an algorithmic marketplace, at large scale, and under deadline, getting the most out of your advertising dollars is something that can be optimized with data science. In this talk, we delve into a custom data science solution for optimizing Facebook advertising campaigns, and how that solution ultimately saves time and boosts profits far beyond what intuition alone can achieve.

Machine Learning in Healthcare
2:00 pm - 2:30 pm
Mehrdad Yazdani, Data Scientist At Open Medicine Institute & UCSD
Mehrdad Yazdani
Data Scientist At Open Medicine Institute & UCSD
Data Science
THH101
2:00 PM - 2:30 PM

Abstract:- Using Machine Learning to Identify Major Shifts in Human Gut Microbiome Protein Family Abundance in Disease. Inflammatory Bowel Disease (IBD) is an autoimmune condition that is observed to be associated with major alterations in the gut microbiome taxonomic composition. Here we classify major changes in microbiome protein family abundances between healthy subjects and IBD patients. We use machine learning to analyze results obtained previously from computing the relative abundance of ~10,000 KEGG orthologous protein families in the gut microbiome of a set of healthy individuals and IBD patients. We develop a machine learning pipeline, involving the Kolmogorov-Smirnov test, to identify the 100 most statistically significant entries in the KEGG database. Then we use these 100 as a training set for a Random Forest classifier to determine the ~5% of KEGGs which are best at separating disease and healthy states. Lastly, we developed a Natural Language Processing classifier of the KEGG description files to predict KEGG relative over- or under-abundance. As we expand our analysis from 10,000 KEGG protein families to one million proteins identified in the gut microbiome, scalable methods for quickly identifying such anomalies between health and disease states will be increasingly valuable for biological interpretation of sequence data.
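
A hedged sketch of the screening-plus-classification pipeline the abstract outlines (Kolmogorov-Smirnov test to shortlist features, then a Random Forest to rank them); the synthetic abundance matrix and the exact cutoffs are assumptions for illustration only.

```python
# Illustrative pipeline: KS-test screening of protein-family abundances, then a
# Random Forest whose importances pick the best disease/healthy separators.
# The abundance matrix and labels below are synthetic stand-ins.
import numpy as np
from scipy.stats import ks_2samp
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
abundances = rng.random((60, 500))            # 60 subjects x 500 protein families
labels = np.array([0] * 30 + [1] * 30)        # 0 = healthy, 1 = IBD

# Rank features by KS statistic between the two groups and keep the top 100.
ks_stats = [ks_2samp(abundances[labels == 0, j],
                     abundances[labels == 1, j]).statistic
            for j in range(abundances.shape[1])]
top100 = np.argsort(ks_stats)[-100:]

# Random Forest over the screened features; importances select ~5% of the columns.
rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(abundances[:, top100], labels)
best = top100[np.argsort(rf.feature_importances_)[::-1][:25]]
print("most discriminative protein-family columns:", best[:10])
```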

Sponsored - Get Insights from the Cloud – without having to burn a hole in your pocket!
2:40 pm - 3:10 pm
Aparajeeta Das, Co-Founder & COO At Third Eye Consulting
Aparajeeta Das
Co-Founder & COO At Third Eye Consulting
Use Case Driven
THH101
2:40 PM - 3:10 PM

Abstract:- These days, for any business to run successfully, it's imperative to derive insights from its data. This is now a matter of survival in a cut-throat competitive market where cloud-based innovations have significantly lowered the barriers to entry. While larger enterprises with deep pockets can afford to build sophisticated analytical solutions, SMBs find it very difficult to build any. Even if SMBs build an analytical system on a low budget, they still get burnt as they produce low-quality metrics and eventually end up becoming a cost center. This presentation will showcase an end-to-end use case for SMBs in the retail industry to derive deep insights with just a few clicks, without the need to buy any expensive hardware or software or hire expensive technical personnel.

Engineering a Flexible Recommender System for The L.A. Times
3:20 pm - 3:50 pm
Matt Chapman, Senior Data Engineer At Tronc Inc.
Matt Chapman
Senior Data Engineer At Tronc Inc.
Entertainment
THH101
3:20 PM - 3:50 PM

Abstract:- A walkthrough of the architecture designed at Tronc for A/B testing multiple algorithms that deliver personalized content recommendations to 60 million readers a month.

Diversity in Data Science: why it's important and challenging
4:30 pm - 5:00 pm
Noelle Saldana, Principal Data Scientist At Pivotal
Noelle Saldana
Principal Data Scientist At Pivotal
Use Case Driven
THH101
4:30 PM - 5:00 PM

Abstract:- Data Scientists come in all shapes, sizes, and personalities, from perhaps a more diverse set of academic and industrial backgrounds than other jobs in tech. This talk explores ways to hire a team with complementary skill sets and backgrounds, the obvious and not-so-obvious benefits of diversity, and challenges teams face when learning to work together.

Panel: Using IOT to Drive Productivity
5:10 pm - 5:40 pm
Stuart McCormick, Americas Digital Services Leader At Honeywell; John Sullivan, Director Of Innovation At SAP NA; Suresh Paulraj, Cloud Data Solution Manager/Architect At Microsoft
Stuart McCormick, Americas Digital Services Leader At Honeywell
John Sullivan, Director Of Innovation At SAP NA
Suresh Paulraj, Cloud Data Solution Manager/Architect At Microsoft
IoT
THH101
5:10 PM - 5:40 PM

Moderator: Stuart McCormick. Panelists: John Sullivan and Suresh Paulraj.

How AI Is Transforming B2B Sales & Marketing
5:50 pm - 6:20 pm
Olin Hyde, Founder & CEO At Leadcrunch AI
Olin Hyde
Founder & CEO At Leadcrunch AI
AI/ Machine Learning
THH101
5:50 PM - 6:20 PM

Abstract:- Artificial intelligence blurs the lines between sales and marketing by enabling humans to leverage the power of Big Data to command and control every customer's journey. A new category of technology called "intelligent demand generation" enables marketers to explain and predict buyer behavior with unprecedented precision and speed. These capabilities reframe how companies go to market by enabling microtargeting of customers with context-specific content marketing. Early adopters of intelligent demand generation technologies are realizing more than 500% return on investment within 2 months. The three-time AI startup founder and CEO of LeadCrunch describes how his company developed military targeting technology and then modified it to make commercial sales teams more efficient.

Artisanal Data
6:30 pm - 7:00 pm
Ben Coppersmith, Data Engineering Manager At Factual
Ben Coppersmith
Data Engineering Manager At Factual
Use Case Driven
THH101
6:30 PM - 7:00 PM

Abstract:- We have lots of data at Factual. But to solve some of our harder problems, we need to get down and dirty with our data -- to examine, evaluate, and experience it (sometimes even smell it). This talk attempts to re-brand this kind of work with the new, alternative buzzword, "Artisanal Data". I review the "Artisanal Data" technologies and techniques we use at Factual, including how we document experiments so that they get read, evaluate failure modes and judge successes, and keep our annotation data as accurate as possible. With the right statistical precautions, Artisanal Data can be used to more effectively and emotionally communicate the impact of our data.

THH116
Startup Showcase in partnership with TenOneTen Ventures
2:30 pm - 3:30 pm
Startup Showcase Competition in partnership with TenOneTen Ventures
Startup Showcase
THH116
2:30 PM - 3:30 PM

This year we are excited to add something new to Big Data Day LA. In partnership with TenOneTen Ventures, we are bringing together some of the best data-driven startups in Southern California! Five startups will have the opportunity to pitch to a panel of judges that range from VCs to data experts. The winner will receive a $1,500 cash prize, $1,000 in MongoDB Atlas DBaaS credit, and three strategy sessions with VCs.

THH 201
Big Data on The Rise: Views of Emerging Trends & Predictions from real life end-users
11:00 am - 11:30 am
Roman Shaposhnik, VP Technology At ODPi
Roman Shaposhnik
VP Technology At ODPi
Use Case Driven
THH201
11:00 AM - 11:30 AM

Abstract:- There are some key trends emerging in 2017 within the Hadoop and Big Data ecosystem, which center around the increasing use of cloud. These trends are the underpinning for a larger shift toward purpose-driven products positioned as the core of an organization's data strategy. Certainly there are examples of this in early adopters that are mature in their deployments, but what about those more traditional end-user organizations in the midst of a digital transformation? How does the relationship between IT and the needs of the growing data science function align with business development? What are their views on these trends? How are their organizations reconciling these needs and desires to pave a path forward? In this session, Roman will present ODPi's findings and end-user views of Big Data trends based on data from the ODPi End User Advisory Board (TAB). Audiences will get real end-user perspectives from companies such as GE about how they are using Big Data tools, the challenges they face, and where they are looking to focus investments - all from a vendor-neutral viewpoint.

Sponsored - Build it…will they come
11:40 am - 12:10 pm
Shawn Trainer, IM Technical Lead At Aviana Global
Shawn Trainer
IM Technical Lead At Aviana Global
Big Data
THH201
11:40 AM - 12:10 PM

Abstract:- The truth about enabling self-service (and why you need it): data is growing astronomically, both historically and in real time, and so is the need for exploration and discovery. One size doesn't fit all. We'll be covering how to efficiently deliver information on demand and promote self-service adoption with the right data platform.

Disrupting Corporates with AI
12:20 pm - 12:50 pm
Hicham Mhanna, Vice President Of Engineering At BCG Digital Ventures
Hicham Mhanna
Vice President Of Engineering At BCG Digital Ventures
AI/ Machine Learning
THH201
12:20 PM - 12:50 PM

Abstract:- Discussion of various applied cases of artificial intelligence solutions stemming from our venture work across verticals including consumer, financial and media.

Building the modern data platform
2:00 pm - 2:30 pm
Ashish Thusoo, Co-Founder/CEO At Qubole; David Hsieh, SVP Marketing At Qubole
Ashish Thusoo, Co-Founder/CEO At Qubole
David Hsieh, SVP Marketing At Qubole
Big Data
THH201
2:00 PM - 2:30 PM

Abstract:- As IT evolves from a cost center to a true nexus of business innovation, data engineers, platform engineers and database admins need to build the enterprise of tomorrow: one that is scalable and built on a totally self-service infrastructure. Having an agile, open and intelligent big data platform is key to this transformation. Join Qubole for an overview of the 5 stages of transformation that companies need to follow. Discover how to create a data-driven culture. Hear from the co-author of Apache Hive as he shares how Facebook and others became data-insights driven.

TBD by Sponsored Speaker 2 [TBD]
2:40 pm - 3:10 pm
Sponsored

TBD

Probabilistic programming products
3:20 pm - 3:50 pm
Michael Lee Williams, Director Of Research At Fast Forward Labs
Michael Lee Williams
Director Of Research At Fast Forward Labs
Data Science
THH201
3:20 PM - 3:50 PM

Abstract:- Algorithmic innovations like NUTS and ADVI, and their inclusion in end user probabilistic programming systems such as PyMC3 and Stan, have made Bayesian inference a more robust, practical and computationally affordable approach. I will review inference and the algorithmic options, before describing two prototypes that depend on these innovations: one that supports decisions about consumer loans and one that models the future of the NYC real estate market. These prototypes highlight the advantages and use cases of the Bayesian approach, which include domains where data is scarce, where prior institutional knowledge is important, and where quantifying risk is crucial. Finally I'll touch on some of the engineering and UX challenges of using PyMC3 and Stan models not only for offline tasks like natural science and business intelligence, but in live end-user products.
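
A minimal PyMC3 sketch of the workflow the abstract describes: specify priors and a likelihood, then draw posterior samples with NUTS (or fit with ADVI when sampling is too slow). The toy loan-default data and priors are assumptions, not the speaker's prototypes.

```python
# Hedged Bayesian logistic regression sketch in PyMC3 (NUTS sampling, ADVI optional).
# The data and priors are toy assumptions.
import numpy as np
import pymc3 as pm

income = np.random.rand(200)                         # synthetic predictor
defaulted = (np.random.rand(200) < 0.2).astype(int)  # synthetic outcome

with pm.Model() as loan_model:
    alpha = pm.Normal("alpha", mu=0.0, sd=1.0)
    beta = pm.Normal("beta", mu=0.0, sd=1.0)
    p = pm.math.sigmoid(alpha + beta * income)
    pm.Bernoulli("default", p=p, observed=defaulted)

    trace = pm.sample(1000, tune=1000)               # NUTS by default
    # approx = pm.fit(method="advi")                 # cheaper variational alternative

print(pm.summary(trace))
```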

Big data and health sciences: Machine learning applications in chronic and acute upper respiratory illness
4:30 pm - 5:00 pm
Huiyu Deng, Research Assistant At USC
Huiyu Deng
Research Assistant At USC
AI/ Machine Learning
THH201
4:30 PM - 5:00 PM

Abstract:- Big data has become a hot topic in recent years. It promotes understanding of how to exploit data and guides decision-making in many sectors. The health sciences are also being shaped by big data applications. Our study group from the Department of Preventive Medicine at the Keck School of Medicine of the University of Southern California aims to build a big data architecture that combines and analyzes data about people from different sources and provides health-related assessments back to them. Specifically, ecological momentary assessments (EMAs), electronic medical records (EMRs), and real-time air quality monitor data for children with a pre-existing asthma diagnosis are collected and fed into the machine learning models. An asthma exacerbation alert is generated and delivered to the children before the exacerbation happens. The machine learning model was built and tested in a similar study. The study population consists of children from a cohort of the prospective, population-based Children's Health Study followed from 2003-2012 in 13 Southern California communities. Potential risk factors were grouped into five broad categories: sociodemographic factors, indoor/home exposures, traffic/air pollution exposures, symptoms/medication use, and asthma/allergy status. The outcome of interest, assessed via annual questionnaire, was the presence of bronchitic symptoms over the prior 12 months. A gradient boosting model (GBM) was trained on data consisting of one observation per participant in a random study year, for a randomly selected half of the study participants. The model was validated using hold-out test data obtained in two complementary approaches: (within-participant) a random (later) year in the same participants and (across-participant) a random year in participants not included in the training data. The predictive ability of risk factor groupings was evaluated using the area under the receiver operating characteristic curve (AUC) and accuracy. The predictive ability of individual risk factors was evaluated using the relative variable importance. Graphical visualization of the predictor-outcome relationship was displayed using partial dependency plots. Interaction effects were identified using the H-statistic. The gradient boosting model offers a novel approach to better understanding predictive factors for chronic upper respiratory illness such as bronchitic symptoms.
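
A hedged scikit-learn sketch of the modeling setup described above: a gradient boosting classifier trained on half the participants and evaluated by AUC on held-out data, with relative variable importance for the risk factors. The feature matrix is synthetic stand-in data, not the Children's Health Study cohort.

```python
# Illustrative GBM workflow: train/hold-out split, AUC evaluation, variable importance.
# The features and outcome below are synthetic placeholders.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.random((1000, 20))                                   # risk-factor groupings
y = (X[:, 0] + 0.3 * rng.random(1000) > 0.8).astype(int)     # bronchitic symptoms

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=1)
gbm = GradientBoostingClassifier().fit(X_tr, y_tr)

print("hold-out AUC:", roc_auc_score(y_te, gbm.predict_proba(X_te)[:, 1]))
print("top risk factors by relative importance:",
      np.argsort(gbm.feature_importances_)[::-1][:5])
```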

Big Video Game Data: Leveraging Large Scale Datasets to Make the Most of In-game Decisions
5:10 pm - 5:40 pm
Dylan Rogerson, Senior Data Scientist At Activision
Dylan Rogerson
Senior Data Scientist At Activision
Entertainment
THH201
5:10 PM - 5:40 PM

Abstract:- A colleague of mine once asked, "Why would you ever need over 100k observations for ...?" It's surprising that many of the models we develop don't need much data to reach a good level of accuracy. In this talk we'll discuss how Activision leverages the large datasets and feature spaces from the Call of Duty series to build complex models. We'll also talk about how to transfer these learnings to more digestible, simpler models, and how accuracy in these models translates into usable in-game action. Finally, we'll showcase some of our model development pipeline and share thoughts on when you really do need millions or billions of data points to make substantial improvements to the game.

Real Time Processing Using Twitter Heron
5:50 pm - 6:20 pm
Karthik Ramasamy, Co-Founder At Streamlio
Karthik Ramasamy
Co-Founder At Streamlio
IoT
THH201
5:50 PM - 6:20 PM

Abstract:- Today's enterprises are producing data not only in high volume but also at high velocity, and with velocity comes the need to process the data in real time. To meet those real-time needs, we developed and deployed Heron, the next-generation streaming engine at Twitter. Heron processes billions and billions of events per day at Twitter and has been in production for nearly 3 years. It provides unparalleled performance at large scale and has been successfully meeting Twitter's strict performance requirements for various streaming and IoT applications. Heron is an open source project with several major contributors from various institutions. As the project matured, we identified and implemented several optimizations that improved throughput by an additional 5x and reduced latency by a further 50-60%. In this talk, we will describe Heron in detail, including how detailed profiling pointed to performance bottlenecks such as repeated serialization/deserialization and immutable data structures. After mitigating these costs, we were able to achieve much higher throughput and latencies as low as 12 ms.

Building streaming data applications using Kafka*[Connect + Core + Streams]
6:30 pm - 7:00 pm
Slim Baltagi, Big Data Practice Director At Advanced Analytics LLC
Slim Baltagi
Big Data Practice Director At Advanced Analytics LLC
Hadoop/Spark/Kafka
THH201
6:30 PM - 7:00 PM

Abstract:- Apache Kafka has evolved from an enterprise messaging system into a fully distributed streaming data platform for building real-time streaming data pipelines and streaming data applications, without the need for additional tools or clusters for data ingestion, storage and stream processing. In this talk you will learn more about: a quick introduction to Kafka Core, Kafka Connect and Kafka Streams through code examples, key concepts and key features; a reference architecture for building such Kafka-based streaming data applications; and a demo of an end-to-end Kafka-based streaming data application.
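As a small taste of the Kafka Core concepts involved, here is a hedged produce/consume sketch using the third-party kafka-python client. Note that Kafka Streams itself is a Java/Scala library with no official Python API, so only the core producer/consumer side is shown; the broker address and topic name are assumptions.

```python
# Minimal produce/consume sketch of Kafka Core concepts, using the third-party
# kafka-python client. Broker address and topic name are assumptions.
import json
from kafka import KafkaProducer, KafkaConsumer

BROKER = "localhost:9092"        # assumed local broker
TOPIC = "page-views"             # hypothetical topic name

# Producer: publish a few JSON events to the topic.
producer = KafkaProducer(
    bootstrap_servers=BROKER,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
for i in range(3):
    producer.send(TOPIC, {"user_id": i, "page": "/home"})
producer.flush()

# Consumer: read the events back from the beginning of the topic.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BROKER,
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
for msg in consumer:
    print(msg.topic, msg.partition, msg.offset, msg.value)
```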

THH202
Secure cloud environment for Data Science and Analytics
11:00 am - 11:30 am
Konstantin Boudnik, Chief Technologist At EPAM Systems
Konstantin Boudnik
Chief Technologist At EPAM Systems
Big Data
THH202
11:00 AM - 11:30 AM

Abstract:- Secure concurrent access to data from on-demand compute clusters is a huge organizational and technical challenge. Few people possess the skills, experience and security expertise needed to create and control a clean cloud environment equipped with state-of-the-art data science, analytics and visualization technologies. Commercial products are often late with features, offer unknown quality and security guarantees, and carry hefty license fees. Developed by EPAM engineers and publicly available under ALv2, the DLab framework addresses all of these concerns by providing an off-the-shelf, simple-to-use platform. DLab allows anyone to set up a completely secure cloud environment equipped with data science notebook software, a scalable cluster solution and a powerful compute engine based on Apache Spark.

TBD by Sponsored Speaker 3
11:40 am - 12:10 pm
Sponsored

TBD

Operationalizing Data Science with Apache Spark
12:20 pm - 12:50 pm
Lawrence Spracklen, VP Of Engineering At Alpine Data
Lawrence Spracklen
VP Of Engineering At Alpine Data
Hadoop/Spark/Kafka
THH202
12:20 PM - 12:50 PM

Abstract:- Today, in many data science projects, the sole focus is the complexity of the algorithms being used to address the data problem. While this is a critical consideration, without attention to how the resulting insights can be disseminated through the broader enterprise, many projects end up dying on the vine. This presentation will highlight not only that a turnkey model operationalization strategy is critical to the success of enterprise data science projects, but also how this can be achieved using Spark. Today Spark enables data scientists to perform sophisticated analyses using complex machine learning algorithms; even when datasets are measured in terabytes, Spark provides a broad selection of machine learning algorithms that scale effortlessly. However, the current process for the business to leverage the results of these analyses is far less sophisticated. Indeed, results are frequently communicated by PowerPoint presentation rather than through a turnkey solution for deploying improved models into production. In this session, we discuss the current challenges associated with operationalizing these results. We discuss the challenges of turnkey model operationalization, including the shortcomings of model serialization standards such as PMML for expressing the complex pre- and post-processing of data that is critical to effortless operationalization. Finally, we discuss in detail the potential for turnkey model operationalization with the emerging PFA standard, and highlight how PFA can be used with Spark, including how PFA model scoring can be supported using Spark Streaming, and our efforts to drive support for PFA model export into MLlib.
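The PFA tooling discussed in the talk is not shown here; as a simpler stand-in for the general idea of moving a model out of a notebook and into a reusable artifact, the sketch below fits a small Spark ML pipeline, persists it, and reloads it for scoring in a separate job. Paths, columns, and data are illustrative assumptions.

```python
# A simpler operationalization sketch than the PFA approach discussed in the talk:
# fit a small Spark ML pipeline, persist it, and reload it elsewhere for scoring.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline, PipelineModel
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("operationalize-demo").getOrCreate()

train = spark.createDataFrame(
    [(1.0, 2.0, 0.0), (2.0, 1.0, 1.0), (3.0, 4.0, 0.0), (5.0, 1.0, 1.0)],
    ["f1", "f2", "label"],
)

pipeline = Pipeline(stages=[
    VectorAssembler(inputCols=["f1", "f2"], outputCol="features"),
    LogisticRegression(featuresCol="features", labelCol="label"),
])
model = pipeline.fit(train)

# Persist the fitted pipeline; a scoring job can reload it without retraining.
model.write().overwrite().save("/tmp/demo_pipeline_model")
reloaded = PipelineModel.load("/tmp/demo_pipeline_model")
reloaded.transform(train).select("f1", "f2", "prediction").show()
```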

Application of Data Science in Speciality Retail
2:00 pm - 2:30 pm
Manas Bhat, Director - Finance And Strategy At Guitar Center
Manas Bhat
Director - Finance And Strategy At Guitar Center
Entertainment
THH202
2:00 PM - 2:30 PM

Abstract:- Guitar Center is the largest retailer of musical instruments in the US, with over 275 stores and annual revenue of over $2.1B. My talk will focus on how data is leveraged here to drive optimal decision making. We have a data warehouse that collects information on transactions, traffic, products, inventory, customers and more, from stores and online. During the talk, I will walk through various insights we have derived by analyzing this data. In one project, we blended Experian-provided household-level data with our internal data to build customer profiles of purchasers of high-end guitars. We used this information, together with drive-time analysis, to pick stores in which to build Platinum rooms exclusively dedicated to high-end guitars. I will also run through how we use extensive experimentation to test strategies before a chain-wide rollout: we pick a few stores to pilot, use KNN to select like stores, and perform a pre/post analysis to evaluate lift and its statistical significance.
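For readers unfamiliar with the "like store" technique mentioned above, here is a hedged, synthetic sketch of the idea: choose nearest-neighbor control stores for each pilot store based on store-level features, then compare pre/post changes between the two groups. The feature set and sales figures are invented for illustration, not Guitar Center's data.

```python
# Illustrative "like store" selection and pre/post comparison (synthetic data).
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(7)
n_stores = 50
features = rng.normal(size=(n_stores, 4))    # hypothetical store-level attributes
pilot_idx = np.array([0, 1, 2])              # stores chosen for the pilot

scaled = StandardScaler().fit_transform(features)
control_pool = np.setdiff1d(np.arange(n_stores), pilot_idx)

# Find the 3 most similar non-pilot stores for each pilot store.
nn = NearestNeighbors(n_neighbors=3).fit(scaled[control_pool])
_, neighbor_pos = nn.kneighbors(scaled[pilot_idx])
control_idx = control_pool[neighbor_pos]      # shape: (n_pilot, 3)

# Pre/post comparison on synthetic weekly sales.
pre = rng.normal(100, 10, size=(n_stores, 8))     # 8 weeks before the change
post = rng.normal(103, 10, size=(n_stores, 8))    # 8 weeks after the change

pilot_lift = post[pilot_idx].mean() - pre[pilot_idx].mean()
control_lift = post[control_idx].mean() - pre[control_idx].mean()
print("Estimated incremental lift:", pilot_lift - control_lift)
```

A significance test on the store-level lift differences would follow the same structure; it is omitted here to keep the sketch short.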

TBD by Sponsored Speaker 4 [TBD]
2:40 pm - 3:10 pm
Sponsored

TBD

A Tale of Three Apache Spark APIs: RDDs, DataFrames and Datasets
3:20 pm - 3:50 pm
Jules Damji, Spark Community Evangelist At Databricks
Jules Damji
Spark Community Evangelist At Databricks
Hadoop/Spark/Kafka
THH202
3:20 PM - 3:50 PM

Abstract:- Of all the developer delights, none is more attractive than a set of APIs that make developers productive, that are easy to use, and that are intuitive and expressive. Apache Spark offers these APIs across components such as Spark SQL, Streaming, Machine Learning, and Graph Processing, to operate on large data sets in languages such as Scala, Java, Python, and R for distributed big data processing at scale. In this talk, I will explore the evolution of three sets of APIs - RDDs, DataFrames, and Datasets - available in Apache Spark 2.x. In particular, I will emphasize why and when you should use each set as a best practice, outline their performance and optimization benefits, and underscore the scenarios in which to use DataFrames and Datasets instead of RDDs for your distributed big data processing. Through simple notebook demonstrations with API code examples, you'll learn how to process big data using RDDs, DataFrames, and Datasets and how to interoperate among them.
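As a quick, illustrative contrast between two of the three APIs (Datasets are only available in Scala and Java, so they are not shown from Python), the sketch below computes the same aggregation with the RDD API and the DataFrame API on a made-up dataset.

```python
# Contrast the RDD and DataFrame APIs on the same toy data (Datasets are
# Scala/Java-only and therefore not shown here).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("three-apis-demo").getOrCreate()
sc = spark.sparkContext

events = [("alice", 3), ("bob", 5), ("alice", 7), ("carol", 1)]

# RDD API: functional transformations on opaque Python objects.
rdd = sc.parallelize(events)
rdd_totals = rdd.reduceByKey(lambda a, b: a + b)
print(sorted(rdd_totals.collect()))

# DataFrame API: named columns, declarative operations, Catalyst optimization.
df = spark.createDataFrame(events, ["user", "clicks"])
df.groupBy("user").agg(F.sum("clicks").alias("total_clicks")).show()
```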

VariantSpark a library for genomics
4:30 pm - 5:00 pm
Lynn Langit, Big Data & Cloud Architect At Lynn Langit Consulting
Lynn Langit
Big Data & Cloud Architect At Lynn Langit Consulting
Hadoop/Spark/Kafka
THH202
4:30 PM - 5:00 PM

Abstract:- See the VariantSpark library in action on a Databricks Jupyter notebook.

Generalized B2B Machine Learning
5:10 pm - 5:40 pm
Andrew Waage, Co-Founder At Retention Science
Andrew Waage
Co-Founder At Retention Science
AI/ Machine Learning
THH202
5:10 PM - 5:40 PM

Abstract:- In this talk, we propose a generalized machine learning framework for e-commerce businesses. The framework is responsible for over 30 different user-level predictions, including lifetime value, recommendations, churn prediction, engagement and lead scoring. These predictions provide a vital layer of intelligence for a digital marketer. Kinesis is used to capture browsing information from over 120M users across 100 companies (both in-app and web). A data processing and feature engineering layer is built on Apache Spark, and these features provide inputs to predictive models for business applications. Separate models for churn, lifetime value, product recommendation and search are written on Spark, and these models can be plugged into any marketing campaign for any integrated e-commerce company, leading to a generalized system. We finally present a monitoring system for machine learning called RS Sauron, which provides more than 200 objective metrics measuring the health of predictive models and tracks KPIs for model accuracy on a continual basis.

Democratizing Hedge Funds
5:50 pm - 6:20 pm
Brinkley Warren, Chief Marketing Officer & Co-Founder At Quantiacs
Brinkley Warren
Chief Marketing Officer & Co-Founder At Quantiacs
Use Case Driven
THH202
5:50 PM - 6:20 PM

Abstract:- Learn how to leverage your data science skills to earn a fortune as a freelance quant in your spare time. At Quantiacs, we run the world's largest quant finance algorithm competition and host the world's only marketplace for quantitative trading algorithms. We provide 26 years of free financial data, an open-source toolkit in multiple languages, and access to investment capital. The audience will learn about the future of quantitative finance and how to cash in.

Quark: A Scala DSL For Data Processing & Analytics
6:30 pm - 7:00 pm
John De Goes, CTO At SlamData
John De Goes
CTO At SlamData
NoSQL
THH202
6:30 PM - 7:00 PM

Abstract:- Quark is a new Scala DSL for performing high-performance data processing and analytics on sources of structured and semi-structured data. Built with an advanced optimizing analytics compiler, Quark can push computation down to any supported data source, offering performance typically seen only with hand-written code. Quark's type-safe API allows developers to create correct pipelines and analytics, while its flexibility makes it possible to directly manipulate semi-structured data formats like JSON and XML. Quark has native support for MongoDB, MarkLogic, Couchbase, and any data source with a Spark connector.

THH301
Big Data at Blizzard
11:00 am - 11:30 am
Ted Malaska, Technical Group Architect At Blizzard Entertainment
Ted Malaska
Technical Group Architect At Blizzard Entertainment
Entertainment
THH301
11:00 AM - 11:30 AM

Abstract:- Blizzard, the creator of leading games such as Hearthstone, World of Warcraft, Overwatch and more, has BIG data. In this session we will give a glimpse into some of the awesome things we are doing at Blizzard to make our games better through big data.

TBD by Sponsored Speaker 8
11:40 am - 12:10 pm
Sponsored

TBD

IOT: The Evolving World of Realtime BigData
12:20 pm - 12:50 pm
Jerry Power, Executive Director, CTM At USC
Jerry Power
Executive Director, CTM At USC
IoT
THH301
12:20 PM - 12:50 PM

Abstract:- IoT technology will allow big data structures to evolve from static, offline repositories of digital knowledge to online representations of our current world. IoT will allow the techniques used with big data to identify trends and forecast the future to become operationally enabled data structures that let us manage our digital environment for maximal advantage. The road to this reality has several hurdles that must first be overcome, among them trust, privacy, discovery, and behavioral economics. These issues will be discussed in the context of a large city operations network, and potential options to overcome these hurdles will be offered.

Turning Relational Database Tables into Hadoop Datasources
2:00 pm - 2:30 pm
Kuassi Mensah, Director, Product Management At Oracle
Kuassi Mensah
Director, Product Management At Oracle
Hadoop/Spark/Kafka
THH301
2:00 PM - 2:30 PM

Abstract:- This session presents a Hadoop DataSource implementation for integrating and joining Big Data with Master Data in an RDBMS.
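The session's own Oracle datasource implementation is not reproduced here; as a generic illustration of the same idea, the hedged sketch below reads a relational master-data table over JDBC with Spark and joins it with data already stored in Hadoop. The JDBC URL, credentials, table names, paths, and columns are all assumptions.

```python
# Generic illustration (not Oracle's implementation): read master data over JDBC
# and join it with big data in HDFS. Connection details and schemas are assumed.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdbms-join-demo").getOrCreate()

# Master data from the relational database (hypothetical connection details).
customers = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB")
    .option("dbtable", "SALES.CUSTOMERS")
    .option("user", "scott")
    .option("password", "tiger")
    .load()
)

# Big data already in Hadoop (hypothetical path; assumed to carry CUSTOMER_ID).
clickstream = spark.read.parquet("hdfs:///data/clickstream/2017/")

# Join the two and aggregate by a column assumed to exist in the master data.
joined = clickstream.join(customers, on="CUSTOMER_ID", how="inner")
joined.groupBy("REGION").count().show()
```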

Sponsored - Big Data Fabric for At-Scale Real-Time Analysis
2:40 pm - 3:10 pm
Ravi Shankar, Chief Marketing Officer At Denodo
Ravi Shankar
Chief Marketing Officer At Denodo
Big Data
THH301
2:40 PM - 3:10 PM

Abstract:- Companies are adopting big data to perform high-velocity, real-time analytics on very large volumes of data, enabling rapid self-service analysis for business users and never-before-realized use cases. However, such projects have often yielded limited value because these big data systems have become siloed from the rest of the enterprise systems holding critical business operational data. Big data fabric is a modern data architecture combining data virtualization, data prep, and lineage capabilities to seamlessly integrate, at scale, these huge siloed volumes of structured and unstructured data with other enterprise data assets. This presentation will demonstrate, through proven customer case studies in big data and IoT, the value of using a big data fabric as a logical data lake for big data analytics.

Big Data Commercialization and associated IoT Platform Implications
3:20 pm - 3:50 pm
Ramniklal Mistry, Sr. Manager - IoT Data Analytics At Verizon
Ramniklal Mistry
Sr. Manager - IoT Data Analytics At Verizon
IoT
THH301
3:20 PM - 3:50 PM

Abstract:- Topics include: an IoT market overview and Verizon's focus on specific IoT verticals (AgTech, Energy, Share, etc.); criteria for evaluating IoT data analytics opportunities; platform considerations for big data solutions (security, network and platform connectivity, data analytics processing/storage, applications, etc.); and examples of a few big data solutions at Verizon.

Big Data in Pediatric Critical Care
4:30 pm - 5:00 pm
Mohit Mehra, Lead Data Engineer At Children's Hospital Los Angeles (CHLA)
Mohit Mehra
Lead Data Engineer At Children's Hospital Los Angeles (CHLA)
Big Data
THH301
4:30 PM - 5:00 PM

Abstract:- There is an urgent need in pediatric ICUs to collect, store and transform healthcare data to make accurate and timely predictions about patient outcomes and treatment recommendations. We are heavily invested in using open source big data stacks to achieve this goal and help our young ones. In this talk I will highlight how we manage structured and unstructured high-frequency data generated by a disparate set of devices and systems, and how we have created data pipelines to process that data and make it available to data scientists and app developers.

Anomaly Detection in Time-Series Data using the Elastic Stack
5:10 pm - 5:40 pm
Henry Pak, Solutions Architect At Elastic
Henry Pak
Solutions Architect At Elastic
NoSQL
THH301
5:10 PM - 5:40 PM

Abstract:- Elastic has released a commercial machine learning plugin that allows you to create a model of your time-series data using an unsupervised machine learning approach. We will walk through a few common use cases to see how this plugin may help with finding anomalies in your data.

Data Science Out of The Box : Case Studies in the Telecommunications Industry
5:50 pm - 6:20 pm
Anand Ranganathan, VP Of Solutions At Unscrambl
Anand Ranganathan
VP Of Solutions At Unscrambl
Hadoop/Spark/Kafka
THH301
5:50 PM - 6:20 PM

Abstract:- Telecommunications service providers (or telcos) have access to massive amounts of historical and streaming data about subscribers. However, it often takes them a long time to build, operationalize and gain value from various machine learning and analytic models. This is true even for relatively common use-cases like churn prediction, purchase propensity, next topup or purchase prediction, subscriber profiling, customer experience modeling, recommendation engines and fraud detection. In this talk, I shall describe our approach to tackling this problem, which involved having a pre-packaged set of analytic pipelines on a scalable Big Data architecture that work on several standard and well-known telco data formats and sources, and that we were able to reuse across several different telcos. This allows the telcos to deploy the analytic pipelines on their data, out of the box, and go live in a matter of weeks, as opposed to the several months it used to take if they started from scratch. In the talk, I shall describe our experiences in deploying the pre-packaged analytic pipelines with several telcos in North America, South East Asia and the Middle East. The pipelines work on a variety of historical and streaming data, including call data records having voice, SMS and data usage information, purchase and recharge behavior, location information, browsing/clickstream data, billing and payment information, smartphone device logs, etc. The pipelines run on a combination of Spark and Unscrambl BRAIN™, which includes a real-time machine learning framework, a scalable profile store based on Redis and an aggregation engine that stores efficient summaries of time-series data. I shall describe some of the machine learning models that get trained and scored as part of these pipelines. I shall also remark on how reusable certain models are across different telcos, and how a similar set of features can be used for models like next topup or purchase prediction, churn prediction and purchase propensity across similar telcos in different geographies.

Automating Legal Fulfillment with SparkML
6:30 pm - 7:00 pm
Brendan Herger, Machine Learning Engineer At Capital One
Brendan Herger
Machine Learning Engineer At Capital One
AI/ Machine Learning
THH301
6:30 PM - 7:00 PM

Abstract:- Capital One receives thousands of legal requests every year, often as physical mail. During this talk, we'll dive into how the Center for Machine Learning at Capital One has built a self-contained platform for summarizing, filtering and triaging these legal documents, utilizing Apache projects.
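To give a flavor of the kind of Spark ML text-classification pipeline such a triage system might use (a hedged sketch on toy data, not Capital One's platform), the example below tokenizes short document texts, hashes them into TF-IDF features, and trains a logistic regression classifier.

```python
# Hedged sketch of a Spark ML text-triage pipeline on synthetic toy documents.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import Tokenizer, HashingTF, IDF
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("legal-triage-demo").getOrCreate()

docs = spark.createDataFrame(
    [
        ("subpoena for account records", 1.0),
        ("customer thank-you letter", 0.0),
        ("court order regarding garnishment", 1.0),
        ("marketing brochure", 0.0),
    ],
    ["text", "label"],
)

pipeline = Pipeline(stages=[
    Tokenizer(inputCol="text", outputCol="words"),
    HashingTF(inputCol="words", outputCol="tf", numFeatures=1 << 14),
    IDF(inputCol="tf", outputCol="features"),
    LogisticRegression(featuresCol="features", labelCol="label"),
])

model = pipeline.fit(docs)
model.transform(docs).select("text", "prediction").show(truncate=False)
```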

ZHS159
Building Microservices with Apache Kafka
11:00 am - 11:30 am
Colin McCabe, Software Engineer At Confluent
Colin McCabe
Software Engineer At Confluent
Hadoop/Spark/Kafka
ZHS159
11:00 AM - 11:30 AM

Abstract:- Building distributed systems is challenging. Luckily, Apache Kafka provides a powerful toolkit for putting together big services as a set of scalable, decoupled components. In this talk, I'll describe some of the design tradeoffs when building microservices, and how Kafka's powerful abstractions can help. I'll also talk a little bit about what the community has been up to with Kafka Streams, Kafka Connect, and exactly-once semantics.

TBD by Sponsored Speaker 1
11:40 am - 12:10 pm
Sponsored

TBD

Serverless Architectures with AWS Lambda and MongoDB Atlas
12:20 pm - 12:50 pm
Sig Narvaez, Senior Solutions Architect At MongoDB
Sig Narvaez
Senior Solutions Architect At MongoDB
NoSQL
ZHS159
12:20 PM - 12:50 PM

Abstract:- It's easier than ever to power serverless architectures with managed database services like MongoDB Atlas. In this session, we will explore the rise of serverless architectures and how they've rapidly integrated into public and private cloud offerings. We will demonstrate how to build a simple REST API using AWS Lambda functions, create a highly available cluster in MongoDB Atlas, and connect both via VPC Peering. We will then simulate load, use the monitoring and scaling features of MongoDB Atlas, and use MongoDB Compass to browse our database.
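A minimal sketch of the pattern, assuming an API Gateway proxy integration and a pymongo connection to Atlas: the handler below serves a hypothetical GET /products endpoint. The connection string, database, and collection names are assumptions; in practice the URI would come from an environment variable or a secrets manager.

```python
# Hedged sketch of a Lambda-backed REST endpoint reading from MongoDB Atlas.
# Connection string, database, and collection names are assumptions.
import json
import os
from pymongo import MongoClient

# Reuse the client across invocations of the same Lambda container.
client = MongoClient(
    os.environ.get("ATLAS_URI", "mongodb+srv://user:pass@cluster0.example.mongodb.net")
)
collection = client["demo"]["products"]

def lambda_handler(event, context):
    """Handle GET /products?category=... from an API Gateway proxy integration."""
    params = (event or {}).get("queryStringParameters") or {}
    query = {"category": params["category"]} if "category" in params else {}
    docs = list(collection.find(query, {"_id": 0}).limit(20))
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(docs),
    }
```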

Enabling Scalable IOT Applications
2:00 pm - 2:30 pm
Adam Mollenkopf, Real-Time & Big Data GIS Capability Lead At ESRI
Adam Mollenkopf
Real-Time & Big Data GIS Capability Lead At ESRI
IoT
ZHS159
2:00 PM - 2:30 PM

Abstract:- This session will explore how DC/OS and Mesos are being used at Esri to establish a foundational operating environment that enables the consumption of high-velocity IoT data using Apache Kafka, streaming analytics using Apache Spark, high-volume storage and querying of spatiotemporal data using Elasticsearch, and recurring batch analytics using Apache Spark & Metronome. Additionally, Esri will share their experience in making their application for DC/OS portable so that it can easily be deployed across public cloud providers (Microsoft Azure, Amazon EC2), private cloud providers and on-premise environments. Demonstrations will be performed throughout the presentation to cement these concepts for attendees. All demos will be available in a public GitHub repo.

TBD by Sponsored Speaker 6 [TBD]
2:40 pm - 3:10 pm
Sponsored

TBD

Purpose-built NoSQL Database for IoT
3:20 pm - 3:50 pm
Basavaraj Soppannavar, Technology & Marketing, GridDB At Toshiba America
Basavaraj Soppannavar
Technology & Marketing, GridDB At Toshiba America
NoSQL
ZHS159
3:20 PM - 3:50 PM

Abstract:- The talk covers: 1. properties of IoT data and database requirements for handling it; 2. an introduction to GridDB, a purpose-built NoSQL database for IoT; 3. IoT data and time-series data modeling; 4. real-world deployed IoT use cases.

Data Augmentation and Disaggregation
4:30 pm - 5:00 pm
Neal Fultz, Principal Data Scientist, Optimization At OpenMail
Neal Fultz
Principal Data Scientist, Optimization At OpenMail
Data Science
ZHS159
4:30 PM - 5:00 PM

Abstract:- Machine learning models may be very powerful, but many data sets are only released in aggregated form, precluding their use directly. Various heuristics can be used to bridge the gap, but they are typically domain-specific. The data augmentation algorithm, a classic tool from Bayesian computation, can be applied more generally. We will present a brief review of DA and how to apply it to disaggregation problems. We will also discuss a case study on disaggregating daily pricing data, along with a reference implementation R package.
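As a small, self-contained illustration of the data augmentation idea for disaggregation (not the R package mentioned in the talk), the sketch below observes only daily totals S_t = x1_t + x2_t with each component Poisson-distributed, and alternates between imputing the latent split and updating the rates with conjugate Gamma draws, i.e. a simple Gibbs sampler.

```python
# Data augmentation sketch for disaggregating Poisson totals (synthetic data).
import numpy as np

rng = np.random.default_rng(1)
T = 200
true_rates = np.array([4.0, 9.0])
latent = rng.poisson(true_rates, size=(T, 2))
totals = latent.sum(axis=1)              # only the aggregated series is "released"

a, b = 1.0, 0.1                          # Gamma(a, b) prior on each rate
rates = np.array([1.0, 1.0])             # initial guess
draws = []

for it in range(2000):
    # Augmentation step: x1_t | S_t, rates ~ Binomial(S_t, lambda1 / (lambda1 + lambda2))
    p = rates[0] / rates.sum()
    x1 = rng.binomial(totals, p)
    x2 = totals - x1
    # Posterior step: conjugate Gamma updates given the imputed disaggregated counts
    rates = np.array([
        rng.gamma(a + x1.sum(), 1.0 / (b + T)),
        rng.gamma(a + x2.sum(), 1.0 / (b + T)),
    ])
    if it >= 500:                        # discard burn-in
        draws.append(rates.copy())

print("Posterior mean rates:", np.mean(draws, axis=0), "(true:", true_rates, ")")
```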

Spark Pipelines in the Cloud with Alluxio
5:10 pm - 5:40 pm
Gene Pang, Software Engineer/ Founding Member At Alluxio
Gene Pang
Software Engineer/ Founding Member At Alluxio
Big Data
ZHS159
5:10 PM - 5:40 PM

Abstract:- Organizations commonly use big data computation frameworks like Apache Hadoop MapReduce or Apache Spark to gain actionable insight from their large amounts of data. Often, these analytics take the form of data processing pipelines, where there is a series of processing stages, each stage performs a particular function, and the output of one stage is the input to the next. There are several examples of such pipelines, including log processing, IoT pipelines, and machine learning, and the common attribute among them is the sharing of data between stages. It is also common for data pipelines to process data stored in the public cloud, such as Amazon S3, Microsoft Azure Blob Storage, or Google Cloud Storage; the global availability and cost effectiveness of these public cloud storage services make them the preferred storage for data. However, running pipeline jobs while sharing data via cloud storage can be expensive in terms of increased network traffic, slower data sharing, and longer job completion times. Using Alluxio, a memory-speed virtual distributed storage system, enables sharing data between different stages or jobs at memory speed: by reading and writing data in Alluxio, the data can stay in memory for the next stage of the pipeline, which results in significant performance gains. In this talk, we discuss how Alluxio can be deployed and used with a data processing pipeline in the cloud. We show how pipeline stages can share data through Alluxio memory for improved performance, and how Alluxio improves completion times and reduces performance variability for pipelines in the cloud.
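A minimal sketch of the pattern, under stated assumptions: one Spark stage writes its intermediate output to an alluxio:// path and the next stage reads it back, so the data can stay in Alluxio memory instead of going back out to the object store. The Alluxio master address, bucket, paths, and schema are assumptions, and the Spark job must have the Alluxio client jar on its classpath.

```python
# Sketch of sharing data between pipeline stages through Alluxio from Spark.
# Master address, paths, and schema are assumptions; the Alluxio client jar
# must be on the Spark classpath for alluxio:// URIs to resolve.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("alluxio-pipeline-demo").getOrCreate()

# Stage 1: parse raw logs from cloud storage and stage the result in Alluxio.
raw = spark.read.json("s3a://my-bucket/raw-logs/2017-08-05/")       # hypothetical input
cleaned = raw.filter(F.col("status") == 200).select("user_id", "url", "ts")
cleaned.write.mode("overwrite").parquet("alluxio://alluxio-master:19998/pipeline/cleaned")

# Stage 2 (could be a separate job): read the intermediate data back from Alluxio
# instead of re-reading it from the object store.
cleaned2 = spark.read.parquet("alluxio://alluxio-master:19998/pipeline/cleaned")
cleaned2.groupBy("url").count().orderBy(F.desc("count")).show(10)
```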

ClickHouse: A DBMS for interactive analytics at scale
5:50 pm - 6:20 pm
Vadim Tkachenko, CTO / Co-Founder At Percona
Vadim Tkachenko
CTO / Co-Founder At Percona
NoSQL
ZHS159
5:50 PM - 6:20 PM

Abstract:- ClickHouse is an open source real-time analytical database that handles petabyte-scale data with massive linear scaling and a SQL-like language.

How EP used Big data to Solve the Entertainment Industry's ACA Compliance Requirement
6:30 pm - 7:00 pm
Annette Novo, Director, Benefits At Entertainment Partners
Annette Novo
Director, Benefits At Entertainment Partners
Entertainment
ZHS159
6:30 PM - 7:00 PM

Abstract:- Imagine an industry that does not keep an HR record for its employees. How do you comply with the Affordable Care Act's (ACA) health insurance eligibility determination when you don't know when someone started or stopped working for a particular company? That is the situation the entertainment industry faced in 2013 as the ACA loomed on the horizon. Entertainment Partners, the largest provider of payroll and related services to the entertainment industry's production workforce, set out to solve the problem. We coordinated across all of the industry's payroll providers and created a data analytics engine that ingests, aggregates and analyzes millions of transactions and determines which production workers meet the ACA eligibility criteria. We help the industry stay in compliance and avoid costly government penalties, and we used big data to solve the problem.