Big Data Day LA 2016 was on July 9, 2016 @ West Los Angeles College

Stay tuned for next year’s conference!

Scenes from Big Data Day LA 2016 / Check out our presentations!

About the Conference / What You Need To Know

Big Data Day LA is the largest Big Data conference of its kind in Southern California, and it is completely free. Spearheaded by Subash D’Souza and organized and supported by a community of volunteers, sponsors and speakers, Big Data Day LA features the most vibrant gathering of data and technology enthusiasts in Los Angeles.

The first Big Data Day LA conference was in 2013, with just over 250 attendees. We have since grown to over 550 attendees in 2014 and over 950 attendees in 2015. This year we expect to have over 1200 attendees!

Our 4th Annual Big Data Day LA 2016 conference was on Saturday, July 9th, 2016 at West Los Angeles College in Culver City.

Our featured tracks this year were:

– Big Data
– Data Science
– Hadoop / Spark / Kafka
– NoSQL and
– Use Case Driven

Helpful Links:
Big Data Day LA Program Brochure (PDF)
Sessions Table (Google Docs)

Attendees / See Who Will Be There

/ Data Scientists

/ Software Developers

/ System Architects

/ Head Researchers

/ Business Strategists

/ Data Engineers

/ Technical Leads

/ CEOs, CTOs, CIOs, etc.

/ IT Managers

/ Consultants

/ Data Analysts

/ Researchers

/ Head Data Scientists

/ Entrepreneurs

Sponsors / 2016 Sponsors

West Los Angeles College
Indeed Prime, Netflix
Claremont Graduate University, Couchbase, Datascience.com, Domo, Interana, Paxata, Pssc Labs, RedisLabs
Cask, Cognizant, Data Applications Lab, Datameer, DoclerMedia, MapR, Second Spectrum, Snowflake, Tibco, Toyota, VideoAmp, Warner Bros, ZestFinance
Factual, Iris.tv, Microsoft, OnPrem
Archangel Technology Consultants, LLC, Courtyard by Marriott, DataScience.LA, Los Angeles Big Data Users Group, Los Angeles County Registrar-Recorder/County Clerk, Magnus Unum, TenOneTen Ventures

Keynote Speakers / 2016

All keynote talks will be held in Fine Arts 100 Auditorium

Andy Feng

VP - Architecture, Yahoo 9:30 AM  FA-100
Andy Feng is VP of Architecture at Yahoo, leading the architecture and design of big data and machine learning initiatives. He has architected major platforms for personalization, ad serving, NoSQL, and cloud infrastructure.

Gil Elbaz

Founder & CEO, Factual 9:50 AM  FA-100
Gil Elbaz is the Founder and CEO of Factual, a data company that enables developers, publishers and advertisers to build highly relevant mobile experiences using the context of location. Prior to Factual, Gil co-founded Applied Semantics Inc. (ASI), which developed contextual advertising products, including ASI's AdSense. Google acquired ASI in 2003, and Gil stayed on as the director of engineering, continuing to work on AdSense and other products. In 2008, Gil founded The Common Crawl Foundation, a non-profit with the goal of democratizing access to web information to enable greater innovation in research, business and education. He continues to serve on the board there as well as on the board of the X Prize Foundation. Gil is an active angel investor. He recently co-founded TenOneTen Ventures with David Waxman, and his notable investments include Climate Corporation (acquired by Monsanto for $1.2 billion), GoodReads (acquired by Amazon), Scopely, and Kaggle. Gil earned a B.S. with a double major in Engineering & Applied Science and Economics from the California Institute of Technology.

Jeanne Holm

Deputy CIO and Assistant General Manager, City of Los Angeles 9:20 AM  FA-100
As a leader in open data, education, community-building, and civic innovation, Jeanne Holm empowers people to discover new knowledge and collaborate to improve life on Earth and beyond. From astronauts in orbit to rural villagers in Uganda, billions of people use her systems to find the data, information, and knowledge they need to make better decisions every day. Jeanne is the Deputy CIO and Assistant General Manager for the City of Los Angeles, where her team empowers Angelenos and City officials with technology to connect, collaborate, and inform. She is also the CIO and Director for Education for World Peace One, a charity that promotes peace and social justice through education and music. As a senior consultant with the World Bank, she empowers governments and civil society to use open data to increase prosperity and civic good. As the former Evangelist for Data.Gov (for the White House), Jeanne led collaboration and built communities with the public, educators, developers, and governments in using open government data. She also served as the Chief Knowledge Architect at NASA’s Jet Propulsion Laboratory at the California Institute of Technology. She is a Distinguished Instructor at UCLA, teaching courses in knowledge management, big data, and civic innovation. She is a Fellow of the United Nations International Academy of Astronautics, co-Chair of the Africa Open Data community, and has more than 130 publications on innovation, open data, information systems, and knowledge management.

Paul Ellwood

VP - Data Engineering & Analytics, Netflix 9:00 AM  FA-100
Paul Ellwood currently leads Netflix’s Data Engineering & Analytics group. Previously, he led Netflix’s Product Analytics group and, prior to that, Marketing Analytics at Rosetta, an advertising agency. Born and raised in Ohio, Paul earned his bachelor’s in Systems & Control Engineering from Case Western Reserve University and his master’s in Predictive Analytics from Northwestern University. He currently lives in San Francisco with his partner, Tj, and their beagle, Torque.

Rajiv Maheswaran

CEO & Co-Founder, Second Spectrum 10:00 AM  FA-100
Rajiv Maheswaran is CEO and Co-Founder of Second Spectrum, an innovative sports analytics and data visualization startup. Prior to Second Spectrum, Rajiv served as a Research Assistant Professor within the University of Southern California’s Department of Computer Science and a Project Leader at the Information Sciences Institute at the USC Viterbi School of Engineering. He and Second Spectrum COO/Co-Founder Yu-Han Chang co-directed the Computational Behavior Group at USC. Rajiv has received numerous awards and written over 100 publications in artificial intelligence and control theory. In 2014, Rajiv won the USC Viterbi School of Engineering Use-Inspired Research Award. Also in 2014, he won Best Research Paper (Alpha Award) at the renowned MIT Sloan Sports Analytics Conference, an award he had also won in 2012. Rajiv received a B.S. degree in Applied Mathematics, Engineering and Physics from the University of Wisconsin-Madison. He also received M.S. and Ph.D. degrees in Electrical and Computer Engineering from the University of Illinois at Urbana-Champaign. His work spans the fields of data analytics, data visualization, real-time interaction, spatiotemporal pattern recognition, artificial intelligence, decision theory and game theory.

Reynold Xin

Co-Founder, Databricks 9:40 AM  FA-100
Reynold Xin is an Apache Spark PMC member and Chief Architect for Spark and co-founder at Databricks.

Tom Horan

Dean of the Drucker-Ito School of Management & Director of the Center for Information Systems and Technology at Claremont Graduate University (CGU).  8:50 AM  FA-100
Dr. Thomas A. Horan is Dean of the Drucker-Ito School of Management, and Director of the Center for Information Systems and Technology at Claremont Graduate University (CGU). Dr. Horan has also co-directed the university’s big data and creativity-innovation initiatives. Dr. Horan’s work has been at the intersection of technology and its application. He has approximately 25 years of experience in designing, testing, and assessing major innovations. These innovations have involved cloud-based, geo-spatial, and mobile applications in health, transportation, and education sectors. He has over 130 publications, including in major journals such as Management Information Systems Quarterly, Visual Information Systems, and Communications of the ACM. Dr. Horan’s work has been featured at the US White House and his research has been sponsored by numerous organizations, including the US Department of Transportation, California HealthCare Foundation, Blue Shield Foundation, US Social Security Administration, National Science Foundation, United Nations Development Corporation, Mayo Clinic, Kay Family Foundation, Salesforce.com and Getty Leadership Institute. Dr. Horan has been a Visiting Scholar at Harvard University, Massachusetts Institute of Technology, University of Hawaii, and University of Minnesota. He has also served in numerous advisory positions in the US, Middle East, and Asia. Dr. Horan has his masters and doctoral degrees from Claremont Graduate University.

Will Thiede – Sponsored

Enterprise Account Executive, Indeed Prime 9:10 AM  FA-100
Will Thiede is a Senior Client Executive for Indeed Prime, an elite recruiting platform that matches in-demand data and software engineers with top-tier companies. Prime curates each exclusive weekly pool of unique candidates based on trending industry demand for key data skills, including Hadoop, Spark, Kafka, NoSQL, big data, and data science. Will’s background in enterprise tech account management includes previous roles at event software shop Splash and leading review site Trustpilot. He holds a bachelor’s in business management and entrepreneurship from Clemson.

Sessions / Track & Session Info

/ Big Data

10:30am (GC-150)

Real Time Analytics with Druid

by Guillaume Torche, Big Data Engineer, GumGum

GumGum uses Druid to ingest more than 30 billion events every day, which can be queried almost as soon as they happen with a very low response time. This is a tell-all talk about GumGum's love story with Druid, how Druid works and how GumGum leverages Druid's capabilities.

11:10am (Library 4th Fl.)

Portable Stream and Batch processing with Apache Beam and Google Cloud Dataflow

by Eric Anderson, Product Manager, Google

This talk explores deploying a series of small and large batch and streaming pipelines locally, to Spark and Flink clusters, and to Google Cloud Dataflow, to give the audience a feel for the portability of Beam, a new Big Data processing framework recently submitted by Google to the Apache Software Foundation. The talk will also look at how the programming model handles late-arriving data in a stream with event time, windows, and triggers.

11:50am (GC-130)

Building scalable enterprise data flows and IoT apps using Apache NiFi

by Dhruv Kumar, Senior Solutions Architect, Hortonworks

Connecting enterprise systems has always been a tough task. Modern IoT applications have exacerbated the issue with the need to integrate legacy systems with novel high-velocity data streams. Various patterns like messaging, REST, etc. have been proposed, but they necessitate rearchitecting the integration layer, which is extremely arduous. In this talk we will show you how to use Apache NiFi to solve your data integration, movement and ingestion problems. Next, we will examine how Apache NiFi can be used to construct durable, scalable and responsive IoT apps in conjunction with other stream processing and messaging frameworks.

1:30pm (Library 4th Fl.)

Twitter Heron @ Scale

by Karthik Ramasamy, Engineering Manager, Twitter

Twitter generates billions of events per day. Analyzing these events in real time presents a massive challenge. Twitter designed and deployed a new streaming system called Heron. Heron has been in production for nearly two years and is widely used by several teams for diverse use cases. This talk looks at Twitter's operating experiences and challenges of running Heron at scale and the approaches taken to solve those challenges.

2:10pm (GC-160)

Warner Bros. Digital Consumer Intelligence at Scale

by Brian Kursar, VP - Data Strategy and Architecture, Warner Bros.

Warner Bros. processes billions of records each day globally between its web assets, digital content distribution, OTT streaming services, online and mobile games, technical operations, anti-piracy programs, social media, and retail point of sale transactions. Combining these datasets with content metadata, Warner Bros. is able to produce consumer insights and affinity models that result in highly accurate audience segments.

2:50pm (FA-100)

Rapid Analytics @ Netflix LA (Updated and Expanded)

by Chris Stephens, Senior Data Engineer, Netflix, Inc

This talk explores how Netflix equips its engineers with the freedom to find and introduce the right software for the job - even if it isn't used anywhere else in-house. Examples include how Netflix has enabled analysts to fluidly switch between MPP RDBMS and an auto-scaling Presto cluster, how Spark + NoSQL stores are used when deploying data sets to internal web apps, and how data scientists are enabled to work in the ML framework of their choosing and deploy models as a service.

3:50pm (GC-130)

Sponsored - Puree through Trillions of clicks in seconds

by Jag Srawan, Engineer, Interana

Interana is a full stack analytics solution that provides lightning-fast querying capabilities using a proprietary storage format. Interana was designed to utilize the best of both in-memory and disk architectures. This talk serves as an introduction to concepts in event data and to the advanced behavior analysis built into Interana. Attendees will learn how to model their data effectively using our full service solution.

4:30pm (Library 4th Fl.)

How To Use Impala and Kudu To Optimize Performance for Analytic Workloads

by David Alves, Software Engineer, Cloudera

This session describes how Impala integrates with Kudu for analytic SQL queries on Hadoop and how this integration, taking full advantage of the distinct properties of Kudu, has significant performance benefits.

5:10pm (GC-150)

Apply R in Enterprise Applications

by Louis Bajuk-Yorgan, Sr. Director, Product Management, TIBCO Software Inc.

Prototypes are typically re-implemented in another language due to compatibility issues with R in the enterprise, but TIBCO Enterprise Runtime for R (TERR) allows the language to be run on several platforms. Enterprise-level scalability has been brought to the R language, enabling rapid iteration without the need to recode, re-implement and test. This presentation will delve further into these topics, highlighting specific use cases and the true value that can be gained from utilizing R. The session will be followed by a lively, open Q&A discussion.

5:50pm (GC-130)

Fluentd and Embulk: Collect More Data, Grow Faster

by Kazuki Ohta, Chief Technology Officer, Treasure Data

Since Doug Cutting invented Hadoop and Amazon Web Services released S3 ten years ago, we've seen quite a bit of innovation in large-scale data storage and processing. While these innovations have enabled engineers to build data infrastructure at scale, many of them fail to fill their scalable systems with useful data, struggling to unify data silos or failing to collect logs from thousands of servers and millions of containers. Fluentd and Embulk are two projects that I've been involved in to solve the unsexy yet critical problem of data collection and transport. In this talk, I will give an overview of Fluentd and Embulk and a survey of how they are used at companies like Microsoft and Atlassian or in projects like Docker and Kubernetes.

/ Data Science

10:30am (GC-160)

Data storytelling for impact

by Dave Goodsmith, Data Scientist, DataScience Inc.

How can our data make the biggest impact? How do we find the stories worth sharing buried in our analytics? How important are visuals, hooks, connections, content? As data science and journalism have co-evolved, the potential for effectively communicating with data has skyrocketed. We'll look at case studies of impactful data stories and share the process for developing data stories that drive action.

11:10am (GC-130)

Decision Making and Lambda Architecture

by Girish Kathalagiri, Staff Engineer, Samsung SDS Research America

Online decision making over time requires interacting with an ever-changing environment, and the underlying machine learning models need to change and adapt to that environment. This talk discusses a class of machine learning algorithms and provides details of how the computation is parallelized using the Spark framework.

11:50am (FA-100)

Sponsored - Data Science + Hollywood

by Todd Holloway, Director of Content Science & Algorithms, Netflix; Conor Dowling, Content Analytics Manager, Netflix

Netflix will spend six billion dollars this year on content, making the company a major player in Hollywood. An increasing portion of this spend will be on original shows such as House of Cards, and original movies such as Beasts of No Nation. As we continue to expand our involvement with Hollywood, we want to leverage data and data science to make the best decisions possible. This talk will explore areas where we see the most opportunity to apply data science to Hollywood, and some early approaches we've taken.

1:30pm (FA-100)

Enabling Cross-Screen Advertising with Machine Learning and Spark

by Debajyoti (Deb) Ray, Chief Data Officer, VideoAmp

With content now viewed seamlessly across multiple screens, this shift in consumer behavior/consumption has come to a head with the way advertising is sold - separately in TV and online silos - creating an opportunity to make advertising more effective using data and machine learning. This talk explores technological developments at VideoAmp that bring together data from disparate mediums and create cross-screen audience models, using ML methods for cross-screen bid optimization and graph-based audience models for 150 million users across over a billion unique device IDs. It also covers behavioral insights gleaned from observing such a large variety of data.

2:10pm (Library 4th Fl.)

The right tool for the job: Guidelines for algorithm selection in predictive modeling

by Derek Wilcox, Senior Data Scientist, ZestFinance

The goal of this talk is to lay out a framework for what algorithms work best in which situations, and why. Drawing on results of hundreds of crowd-sourced predictive modeling contests, this talk shows examples of how structure informs a choice in algorithm. As an illustration of these concepts, ZestFinance's work with China's retail giant, JD.com, is used to describe how the right algorithms were applied to the right datasets to turn shopping data into credit data -- creating credit scores from scratch.

2:50pm (GC-160)

Stream processing with R and Amazon Kinesis

by Gergely Daroczi, Director of Analytics, CARD.com

This talk presents an original R client library for interacting with Amazon Kinesis via a simple daemon that starts multiple R sessions on a machine or cluster of nodes to process data from theoretically any number of shards. It will also feature some demo micro-applications that stream dummy credit card transactions, enrich this data, and then trigger other data consumers for various business needs, such as scoring transactions, updating real-time dashboards and messaging customers. Besides the technical details, the motives behind choosing R and Kinesis will also be covered, including a quick overview of the related data infrastructure changes at CARD.

3:50pm (FA-100)

Sponsored - The Evolving Data Science Landscape

by Kyle Polich, Principal Consulting Engineer, Datascience

The impact of data science on business is undeniable, and the value it provides is growing without signs of slowing. To keep up with this rapidly evolving technology landscape, data scientists must adapt and specialize through continuous learning. This talk focuses on how they can do that in a way that maximizes the positive impact data science will have on their organization.

4:30pm (GC-130)

Affinity Marketing Leveraging Crowdsourced Psychographics

by Ravi Iyer, Chief Data Scientist, Ranker; Glenn Walker, COO, Ranker

The most important variables to use to discover your best future customers are increasingly psychological. Borrowing techniques from psychometrics, this talk shows how marketers can use disparate online data sources to measure the right psychographic variables in order to maximize both performance and scale.

5:10pm (FA-100)

Intuit's Payments Risk Platform

by Dusan Bosnjakovic, Data Scientist, Intuit; Boris Belyi, Manager - Risk Analytics, Intuit

This talk explores the path taken at Intuit, the maker of TurboTax, Mint and Quickbooks, to operationalize predictive analytics and highlights automations that have allowed Intuit to stay ahead of the fraud curve.

5:50pm (GC-150)

Backstage to a Data Driven Culture: Your Data Science and Analytics Stack

by Pauline Chow, Consultant / Lead Data Science Instructor @ General Assembly

When you're the first data professional at the organization, there are technical, process, and qualitative considerations for analytics and data science (A/DS) to address. This talk is an overview of strategy, infrastructure, and tools for creating your first A/DS stacks. At this stage, the range of problems that you are able to solve relates to organizational, operational, data engineering, business intelligence, and communication concerns. Creating the optimal A/DS stack can seamlessly pave the way to big data and to integrating the newest technologies in the future. Please share your stories and experience with us as well. Outline of the talk, where sections are intended to be interactive and to gather feedback from the audience:
1. So you're the first Data Scientist
2. Setting Their Expectations
3. Lay of the Land - Data requirements and organizational survey
4. Setting Your Expectations
5. Infrastructure - Your Stack Options
6. Resources: Get Help, Get a Team
7. Discussion

/ Hadoop / Spark / Kafka

10:30am (FA-100)

Real-time Aggregations, Approximations, Similarities, and Recommendations at Scale using Spark Streaming, ML, GraphX, Kafka, Cassandra, Docker, CoreNLP, Word2Vec, LDA, and Twitter Algebird

by Chris Fregly, Research Scientist, PipelineIO

Live, Interactive Recommendations Demo
Spark Streaming, ML, GraphX, Kafka, Cassandra, Docker, CoreNLP, Word2Vec, LDA, and Twitter Algebird (advancedspark.com). Types of Similarity - Euclidean vs. Non-Euclidean, Similarity, Jaccard Similarity, Cosine Similarity, LogLikelihood Similarity, Edit Distance. Text-based Similarities and Analytics - Word2Vec, LDA Topic Extraction, TextRank, Similarity-based Recommendations, User-to-User, Content-based, Item-to-Item (Amazon), Collaborative-based, User-to-Item (Netflix), Graph-based, Item-to-Item "Pathways" (Spotify). Aggregations, Approximations, and Similarities at Scale - Twitter Algebird, MinHash and Bucketing, Locality Sensitive Hashing (LSH), BloomFilters, CountMin Sketch, HyperLogLog

11:10am (FA-100)

Iterative Spark Development at Bloomberg

by Nimbus Goehausen, Senior Software Engineer, Bloomberg

This presentation will explore how Bloomberg uses Spark, with its formidable computational model for distributed, high-performance analytics, to take iterative development to the next level. It will also look into one of the innovative practices the team is currently developing to increase efficiency: the introduction of a logical signature for datasets.

11:50am (Library 4th Fl.)

Alluxio (formerly Tachyon): An Open Source Memory Speed Virtual Distributed Storage

by Gene Pang, Software Engineer / Founding Member, Alluxio

Alluxio, formerly Tachyon, is a memory speed virtual distributed storage system. The Alluxio open source community is one of the fastest growing open source communities in big data history with more than 300 developers from over 100 organizations around the world. In the past year, the Alluxio project experienced a tremendous improvement in performance and scalability and was extended with key new features including tiered storage, transparent naming, and unified namespace. Alluxio now supports a wide range of under storage systems, including Amazon S3, Google Cloud Storage, Gluster, Ceph, HDFS, NFS, and OpenStack Swift. This year, our goal is to make Alluxio accessible to an even wider set of users, through our focus on security, new language bindings, and further increased stability.

1:30pm (GC-150)

Data Provenance Support in Spark

by Matteo Interlandi, Postdoc, UCLA

Debugging data processing logic in Data-Intensive Scalable Computing (DISC) systems is a difficult and time-consuming effort. To aid this effort, we built Titian, a library that enables data provenance, that is, tracking data through transformations, in Apache Spark.

2:10pm (GC-130)

Introduction to Kafka

by Jesse Anderson, CEO, Smoking Hand

An introduction to what Kafka is, the concepts behind it and its API.

2:50pm (Library 4th Fl.)

Building an Event-oriented Data Platform

by Eric Sammer, CTO & Co-Founder, Rocana

While we frequently talk about how to build interesting products on top of machine and event data, the reality is that collecting, organizing, providing access to, and managing this data is where most people get stuck. In this session, we’ll follow the flow of data through an end to end system built to handle tens of terabytes per day of event-oriented data, providing real time streaming, in-memory, SQL, and batch access to this data. We’ll go into detail on how open source systems such as Hadoop, Kafka, Solr, and Impala/Hive are actually stitched together; describe how and where to perform data transformation and aggregation; provide a simple and pragmatic way of managing event metadata; and talk about how applications built on top of this platform get access to data and extend its functionality. This session is especially recommended for data infrastructure engineers and architects planning, building, or maintaining similar systems.

4:00pm - 5:00pm (GC-150)

Panel - Interactive Applications on Spark?

by Raj Babu, CEO at AgileISS - Moderator; Raymond Fu, Practice Architect at Trace3 - Panelist; David Levinger, Sr. Director Information Technology at Paxata - Panelist

In this interactive panel discussion, you will hear from these Spark experts as to why they chose to go "all-in" on Spark, leveraging the rich core capabilities that make Spark so exciting, and committing to significant IP that turns Spark into a world-class enterprise data preparation engine.
Raymond and David will explain specific cases where capabilities were built on top of core Spark to provide a true interactive data prep application experience. These innovations include a Domain Specific Language (DSL), an optimizing compiler, a persistent columnar caching layer, application-specific Resilient Distributed Datasets (RDDs), and on-line aggregation operators, which together address the core memory, pipelining and shuffling obstacles to produce a highly interactive application with the core user and data volume scale-out benefits of Spark.

5:10pm (GC-160)

Why is my Hadoop cluster slow?

by Bikas Saha, Software Engineer, Hortonworks

This talk draws on our experience in debugging and analyzing Hadoop jobs to describe some methodical approaches to this and present current and new tracing and tooling ideas that can help semi-automate parts of this difficult problem.

5:50pm (FA-100)

Deep Learning at Scale

by Alexander Kern, Co-founder / CTO, Pavlov

The advent of modern deep learning techniques has given organizations new tools to understand, query, and structure their data. However, maintaining complex pipelines, versioning models, and tracking accuracy regressions over time remain ongoing struggles for even the most advanced data engineering teams. This talk presents a simple architecture for deploying machine learning at scale and offers suggestions for how companies can get their feet wet with open source technologies they already deploy.

/ NoSQL

10:30am (GC-130)

Real Life IoT Architecture

by Dinesh Srirangpatna, Big Data Strategist, Microsoft

Learn how to benefit from IoT (internet of things) to reduce costs and spur transformation for your company and clients. Attendees will learn about building blocks to create an IoT solution, and walk through real life architectural decisions in building a solution.

11:10am (GC-150)

Amazon DynamoDB - Focus on Your Data and Leave Ops to Someone Else

by Michael Limcaco, Principal Solutions Architect, Amazon Web Services (AWS)

This talk explores features and benefits of Amazon DynamoDB, a fully managed NoSQL database service, in detail, and discusses how to get the most out of DynamoDB, in addition to design best practices with DynamoDB across multiple use cases.

11:50am (GC-160)

Sponsored - Spark And Couchbase: Augmenting The Operational Database With Spark

by Matt Ingenthron, Senior Director of Engineering, Couchbase

For an operational database, Spark is like Batman’s utility belt: it handles a variety of important tasks, from data cleanup and migration to analytics and machine learning, that make the operational database much more powerful than it would be on its own. In this talk, we describe the Couchbase Spark Connector that lets you easily integrate Spark with Couchbase Server, an open source distributed NoSQL document database that provides low latency data management for large scale, interactive online applications. We’ll start with common use cases for Spark and Couchbase, then cover the basics of creating, persisting and consuming RDDs and DataFrames from Couchbase’s key/value and SQL interfaces.

1:30pm (GC-130)

Using Redis Data Structures to Make Your App Blazing Fast

by Adi Foulger, Solutions Architect, Redis Labs

Open Source Redis is not only the fastest NoSQL database but also the most popular among the new wave of databases running in containers. This talk introduces the data structures used to speed up applications and solve the everyday use cases that are driving Redis' popularity.

2:10pm (GC-150)

Apache Kudu: Fast Analytics on Fast Data

by Dan Burkert, Software Engineer, Cloudera

Apache Kudu (incubating) is a new storage engine for the Hadoop ecosystem that enables extremely high-speed analytics without imposing data-visibility latencies. This talk provides an introduction to Kudu, and provides an overview of how, when, and why practitioners use Kudu as a platform for building analytics solutions.

2:50pm (GC-150)

Big Data and Real Estate

by Jon Zifcak, MBA, MSIM, CEO, Zulloo Inc.; Anton Polishko, CTO, Zulloo Inc.

The real estate industry is generating terabytes of data, but only a very small percentage is being utilized or processed. ZULLOO Inc. is creating an artificial intelligence engine utilizing big data and machine learning. The question is, why aren't more data scientists exploring the real estate industry when it represents 15% of the US GDP, measured in the trillions?

3:50pm (Library 4th Fl.)

Sponsored - Analytics at the Speed of Light with Redis and Spark

by Dave Neilsen, Developer Relations at RedisLabs

Spark is in-memory, and Redis is in-memory. The Spark-Redis connector gives Spark access to Redis' data structures as RDDs. Redis, with its blazing fast performance and optimized in-memory data structures, reduces Spark processing time by up to 98%. In this talk, Dave will share the top use cases for Spark-Redis such as time-series, recommendations and real-time bid management.

4:30pm (FA-100)

Introduction to Graph Databases

by Oren Golan, VP of Engineering, Sanguine

Many organizations across IoT, health care, financial services, telecommunications and government have adopted graph databases. This talk, based on our research and implementation of a graph database at Sanguine, a startup based in LA, dives into a few use cases and equips attendees with everything they need to start using a graph database.

5:10pm (GC-130)

MongoDB 3.2 Goodness!!!

by Mark Helmstetter, Principal Consulting Engineer, MongoDB

This talk explores the new features of MongoDB 3.2 such as $lookup, document validation rules, encryption-at-rest and tools like the BI Connector, OpsManager 2.0 and Compass.

5:50pm (Library 4th Fl.)

Privacy vs. Security in a Big Data World

by Tamara Dull, Director of Emerging Technologies, SAS Institute

The jury is still out on whether Edward Snowden was a hero, traitor, or schmuck. Regardless of the scarlet letter we want to hang around his neck, we should thank him for helping bring the discussion of big data privacy and security to the public square. This session examines the issues of big data privacy and security in the context of the six-stage (big) data lifecycle: create, store, use, share, archive, and destroy.

/ Use Case Driven

10:30am (Library 4th Fl.)

Reliable Media Reporting in an Ever-changing Data Landscape

by Josh Andrews, Data & Analytics Architect, OnPrem; Rachel Kelley, Project Manager, OnPrem and Eric Avila, Senior Anti-Piracy Technologist, NBC Universal

OnPrem Solution Partners worked with NBCU to profile in-house data to determine data quality, and recommend process and quality improvements. We present our process for data import, improvements we want to make, and lessons learned regarding various tools used, including MariaDB, ElasticSearch, Cassandra, and others.

11:10am (GC-160)

The Encyclopedia of World Problems

by Christine Zhang, Data Journalist, Knight-Mozilla @ LA Times

Born more than four decades ago from the partnership of two international NGOs in Brussels, the Encyclopedia of World Problems has hand-picked and refined profiles of tens of thousands of problems occurring around the world: from notorious global issues all the way down to very specific and peculiar ones. This talk presents an overview of the Encyclopedia and the interesting data science applications that have arisen from the Encyclopedia's body of work - notably, its database resources.

11:50am (GC-150)

Sponsored - BI is broken

by Dave Fryer, Product Advocate at Domo

Not all BI solutions are created equal. The problem in most organizations is that disparate systems hold data hostage. Most systems create barriers between the data and the people who need the data to make decisions. We create silos of data that do not give us a holistic view of how the organization is operating. Domo is breaking down these silos and giving business users unparalleled access to the data they need to optimize their business.

1:30pm (GC-160)

Dealing with Data Discomfort: Getting Bureaucrats to Embrace Data and Analytics

by Juan Vasquez, Communications & Data Analyst, Mayor's Operations Innovation Team at City of Los Angeles

Government is traditionally known for red tape, stuffy hierarchies, endless policies, and clashing priorities. These and other variables make it difficult for government entities to embrace change and innovation and, more importantly, leave them apprehensive about peeling back the layers and letting data tell the stories.

So how do you change that? In this talk we'll discuss how the Mayor's Operations Innovation Team is leveraging storytelling, education, public-private partnerships, and data visualization technologies to help LA embrace data.

2:10pm (FA-100)

Data and Hollywood: "Je t'Aime ... Moi Non Plus"

by Yves Bergquist, Project Director, "Data & Analytics", USC - Entertainment Technology Center

The application of machine learning to problems such as script and story analysis, audience segmentation, and security is revolutionizing the way Hollywood creates and markets entertainment.

2:50pm (GC-130)

Hydrator: Open Source, Code-Free Data Pipelines

by Jon Gray, CEO, Cask Data

This talk will present how to build data pipelines with no code using the open-source, Apache 2.0-licensed Cask Hydrator. The talk will continue with a live demonstration of creating data pipelines for two use cases.

3:50pm (GC-160)

Sponsored - From Clusters to Clouds, Hardware Still Matters

by Eric Lesser, Director of Operations at PSSC Labs

Today’s Software Defined environments attempt to remove the weakness of computing hardware from the operational equation. There is no doubt that this is a natural progression away from overpriced, proprietary compute and storage layers. However, even at the heart of any Software Defined universe is an underlying hardware stack that must be robust, reliable and cost effective. Our 20+ years of experience delivering over 2000 clusters and clouds has taught us how to properly design and engineer the right hardware solution for Big Data, Cluster and Cloud environments. This presentation will share this knowledge, allowing users to make better design decisions for any deployment.

4:30pm (GC-160)

How to Use Design Thinking to Jumpstart Your Big Data Projects

by Peter Reale, Solutions Engineer, Datameer

There is a novel approach to identifying big data use cases, one which will ultimately lower the barrier to entry to big data projects and increase overall implementation success. This talk describes the approach used by big data pioneer and Datameer CEO Stefan Groschupf to drive over 200 production implementations.

5:10pm (Library 4th Fl.)

Shaping the Role of Data Science: An Evolution towards Prescriptive Analytics as Key Driver in Revenue Acceleration

by Thomas Sullivan, Chief Data Scientist, IRIS.TV

At IRIS.TV, our business builds algorithmic solutions for video recommendation with the end goal to deliver a great user experience as evidenced by users viewing more video content. This talk outlines our reasons for expanding from a descriptive/predictive approach to data analytics toward a philosophy that features more prescriptive analytics, driven by our data science team.

5:50pm (GC-160)

Raising Venture Capital for Data Driven Startups

by Austin Clements, Associate, TenOneTen Ventures

Get an inside look into how VCs evaluate your team, market, and product before making an investment decision. Learn how to identify the right investors for your business and how to stand out from the crowd.

Session Speakers / 2016

Adi Foulger

Solutions Architect at Redis Labs

Alexander Kern

Co-Founder/ CTO at Pavlov

Anton Polishko

CTO at Zulloo

Austin Clements

Associate at TenOneTen Ventures

Bikas Saha

Software Engineer at Hortonworks

Boris Belyi

Manager, Risk Analytics, Intuit, Inc

Brian Kursar

VP - Data at Warner Bros

Chris Fregly

Research Scientist at PipelineIO

Chris Stephens

Engineer at Netflix, Inc

Christine Zhang

Research Fellow at Los Angeles Times

Conor Dowling

Content Analytics Manager at Netflix

Dan Burkert

Software Engineer at Cloudera

Dave Fryer

Product Advocate at Domo

Dave Goodsmith

Senior Data Scientist at DataScience, Inc.

Dave Nielsen

Developer Relations at RedisLabs

David Alves

Software Engineer at Cloudera

David Levinger

Sr. Director Information Technology at Paxata

Debajyoti (Deb) Ray

Chief Data Officer at VideoAmp

Derek Wilcox

Data Scientist at ZestFinance

Dhruv Kumar

Senior Solutions Architect at Hortonworks

Dinesh Srirangpatna

Data Solutions Architect at Microsoft

Dusan Bosnjakovic

Data Scientist at Intuit

Eric Anderson

Product Manager at Google

Eric Avila

Senior Anti Piracy Technologist at NBC Universal

Eric Lesser

Director of Operations at PSSC Labs

Eric Sammer

CTO and co-founder at Rocana

Gene Pang

Software Engineer/ Founding Member at Alluxio

Gergely Daroczi

Director of Analytics at CARD.com

Girish Kathalagiri

Staff Engineer at Samsung SDS Research America

Glenn Walker

COO at Ranker

Guillaume Torche

Big Data Engineer at GumGum

Jag Srawan

Engineer at Interana

Jesse Anderson

CEO at Smoking Hand

Jon Zifcak, MBA, MSIM

CEO at Zulloo Inc.

Jonathan Gray

CEO at Cask

Josh Andrews

Data & Analytics Architect at OnPrem Solution Partners

Juan Vasquez

Data Analyst, Mayor's Operations Innovation Team at City of Los Angeles

Karthik Ramasamy

Engineering Manager at Twitter

Kazuki Ohta

Chief Technology Officer at Treasure Data, Inc.

Kyle Polich

Principal Data Scientist at DataScience, Inc

Louis Bajuk-Yorgan

Senior Director of Product Management at TIBCO

Mark Helmstetter

Principal Consulting Engineer at MongoDB

Matt Ingenthron

Senior Director Engineering at Couchbase

Matteo Interlandi

PostDoc at UCLA

Michael Limcaco

Principal Solutions Architect at AWS

Nimbus Goehausen

Senior Software Engineer at Bloomberg

Oren Golan

VP Software Engineering at Sanguine

Pauline Chow

Consultant / Lead Data Science Instructor @ General Assembly

Peter Reale

Solutions Engineer at Datameer

Rachel Kelley

Manager at OnPrem Solution Partners

Raj Babu

CEO at Agile ISS

Ravi Iyer

Chief Data Scientist at Ranker

Raymond Fu

Practice Architect at Trace3

Tamara Dull

Director of Emerging Technologies at SAS

Thomas Sullivan

Chief Data Scientist at IRIS.TV

Todd Holloway

Director of Content Science & Algorithms at Netflix

Yves Bergquist

Director, Data & Analytics Program at USC/ETC

Volunteers / 2016 Organizer

Subash D’Souza

Organizer; Sponsors & Sessions Chair

Volunteers / 2016 Committee Leaders

Abraham Elmahrek

Sessions Vice Chair

Arti Annaswamy

Technology Vice Chair

Austin Clements

Location Liaison

Clarissa Pinto Ribeiro

Sponsors Vice Chair

Eduardo Ariño de la Rubia

Sessions - Data Science Co-Chair

Eric Lui

Registration Chair

Jimmy Kim

Food/Beverages Vice Chair

John Kim

Location Chair

Kaloyan Todorov

Registration Vice Chair

Michael Chiang

Location Vice Chair

Oszie Tarula

Food/Beverages Chair

Priyanka Biswas

Marketing Vice Chair

Rich Ung

Technology Chair

Subash D’Souza

Organizer; Sponsors & Sessions Chair

Szilard Pafka

Sessions - Data Science Co-Chair

Weixiang Chen

Marketing Chair

Volunteers / 2016 Volunteers

Abraham Elmahrek

First Employee at FOSSA, Inc.

Alex Hon

Student at USC

Amit Yakhmi

Business Development

Annie Chen

Cyber Security Sales Intern at Novacoast

Anthony James

Consultant/Adjunct Faculty at Private/CSULA

Armen Donigian

Lead Data Engineer at ZestFinance

Arnold Borres

Sr Systems Analyst at UST Global

Arti Annaswamy

Data Analytics Consultant

Ben Perlmutter

Sales Engineer at Xplenty

Brandon Brooks

Technical Recruiter at Teradata

Brian Cottrell

Mobile Software Engineer at DIRECTV

Cecilia Luo

MS Business Analytics candidate

Charalampos Papadimitriou

Data Scientist at DataScience, Inc.

Chen Zou

Business Analyst at Machinima

Clarissa Pinto Ribeiro

Connector | Coach | Tech MatchMaker

Cris Manlangit

Student at San Diego State University-California State University

Cristina Salajan

Project Manager at Farmers Insurance

Dan Gutierrez

Data Scientist at AMULET Analytics

Darryl Nousome

Doctoral Student at USC

Dave Nielsen

Cloud Computing Evangelist & Consultant

Dharmesh Soni

Summer Intern

Dinesh Srirangpatna

Data Solution Architect at Microsoft

Dominique (Nikki) Castle

Associate Insights Analyst at DataScience, Inc

Eric Lui

Director of Data Operations at Factual

Eric Schimansky

Group Manager at Avanade

Frank Solomon

Chief Blogger: bitvectors.blogspot.com

Greg Sweeney

Regional Sales Director at Datameer

Guan Zhou

Student at UCLA

Jae Lim

Analytic Programmer/ Investor/ Thinker

Jason Brancazio

Data Engineer at TripAdvisor

Jason Geng

Co-Founder at Data Applications Lab

Jenny Wang

Data Wrangler at Koss REsource

Jimmy Kim

Data Center, Cloud, SaaS Business Development Sales

Jin Wu

Emerging Technologies Librarian at USC

Jing Pan

Statistical Data Analyst

Joe Devon

Entrepreneur & Advisor in the Tech Space

John Kim

VP, Ops | BSA, SFDC Admin

John Zhang

PhD Candidate at UCLA

Jovany Diaz

Accounting Systems Analyst I at Los Angeles County Auditor-Controller

Kaloyan Todorov

AVP, Credit Risk at Bank of America

Kelly Thomassen

Coordinator at AdoptTogether.org

Kyle Walker

Lead Engineer at ZEFR

Manish Dwibedy

Graduate student at USC

Michael Chiang

Executive Director at Crescent Solutions

Nandan Chandrashekhar

Graduate Student at USC

Nicholson Alforque

Autozoner at AutoZone

Nithin Reddy

Senior Software Engineer at Enplug, Inc.

Omar Atayee

Sales Engineer at Mashape

Oswald Jones

Data Engineer

Oszie Tarula

Programmer / Analyst III (Lead Web UI/UX Developer) at UCLA

Priyanka Biswas

PhD candidate at USC

Priyanka Kale

Software Engineer

Raj Babu

Data Lake on Cloud Evangelist

Ravin Kumar

Supply Chain Systems Engineer at SpaceX

Rich Ung

Business Intelligence Engineer at Disney ABC Television Group

Ryan Lampert

Founder at LiftedPresence

Sanjeev Sehgal

SVP Sales at Satwic

Santhosh Sunkara

Graduate student at Oklahoma State University

Shashidhar Desai

Software Engineer in Test at Zefr

Simon Lee

Strategic Relations Executive at Captive Eight

Siva Bhavanari

Senior Data Engineer

Sooraj Akkammaddam

Sr. ETL Developer at Core Digital Media

Spandan Karamchedu

Software Developer in Test at Velocify

Stella Shao

MS Business Analytics Candidate, USC Marshall School of Business

Subash D’Souza

Organizer

Vivian Li

Data Scientist at Enervee

Weixiang Chen

Founder at NewMet Data

William Mo

Partner Business Development at NovaStor

Xi Wang

Data Scientist at Fama

Yunyun Dai

Senior Researcher at USC

Zia Khan

VP Business Development at Allsite IT