SkillsCast

Validating Big Data Jobs - Stopping Failures before Production (w/ Spark, BEAM, & friends!)

13th December 2018 in London at Business Design Centre

There are 50 other SkillsCasts available from Scala eXchange London 2018


As big data jobs move from the proof-of-concept phase into powering real production services, you will need to consider what will happen when everything eventually goes wrong (such as recommending inappropriate products or other decisions taken on bad data).

During this talk, you will discover that you will eventually get aboard the failboat (especially with ~40% of respondents automatically deploying their Spark job results to production). It's important to automatically recognise when things have gone wrong, so you can stop deployment before you have to update your resume.

Figuring out when things have gone terribly wrong is trickier than it first appears, since you want to catch the errors before your users notice them (or, failing that, before CNN notices them). We will explore general techniques for validation, look at responses from people validating big data jobs in production environments, and review libraries that can assist you in writing relative validation rules based on historical data. For folks working in streaming, you will learn about the unique challenges of attempting to validate in a real-time system, and what you can do besides keeping an up-to-date resume on file for when things go wrong.
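As a rough illustration of what a "relative validation rule based on historical data" could look like, here is a minimal sketch in plain Python. It is not taken from the talk or from any particular library: the counter names (`records_read`, `records_invalid`), the helper functions, and the 50% tolerance are all hypothetical. The idea is simply to compare the counters from the current run against the average of previous runs and block deployment when any counter drifts too far.

```python
from statistics import mean


def within_relative_bounds(current, history, max_deviation=0.5):
    """Check whether `current` is within `max_deviation` (a fraction)
    of the mean of the historical values."""
    if not history:
        # Assumption: with no history there is nothing to compare against,
        # so the first run is allowed through.
        return True
    baseline = mean(history)
    if baseline == 0:
        # Avoid dividing by zero: only an unchanged zero counts as normal.
        return current == 0
    return abs(current - baseline) / baseline <= max_deviation


def validate_run(counters, history, max_deviation=0.5):
    """Return the names of counters that fall outside the allowed range."""
    failures = []
    for name, value in counters.items():
        past = [run[name] for run in history if name in run]
        if not within_relative_bounds(value, past, max_deviation):
            failures.append(name)
    return failures


# Hypothetical counters recorded by earlier runs of the same job.
previous_runs = [
    {"records_read": 1000, "records_invalid": 10},
    {"records_read": 1100, "records_invalid": 12},
    {"records_read": 900, "records_invalid": 8},
]

# Hypothetical counters from today's run: the input size looks normal,
# but the number of invalid records has jumped.
todays_run = {"records_read": 1050, "records_invalid": 400}

failed = validate_run(todays_run, previous_runs)
if failed:
    print("Blocking deployment; counters out of range:", failed)
else:
    print("Counters look normal; safe to publish.")
```

In a real pipeline the counters would presumably come from the job itself (for example, accumulators or job metrics), and the tolerance would need tuning per counter; a single fixed threshold is used here only to keep the sketch short.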

You will see code examples in Apache Spark and learn about similar concepts in Apache Beam (a cross-platform tool); the techniques should be applicable across systems.

Real-world examples (with company names removed) will be presented, as well as several Creative Commons-licensed cat pictures and an adorable panda GIF.


Holden Karau

Holden is a transgender Canadian open source developer advocate at Google with a focus on Apache Spark, Beam, and related "big data" tools. She is the co-author of Learning Spark, High Performance Spark, and another Spark book that's a bit more out of date. She is a committer and PMC member on Apache Spark and a committer on the SystemML and Mahout projects. She was tricked into the world of big data while trying to improve search and recommendation systems and has long since forgotten her original goal.
