SkillsCast

Workshop: Big Data Machine Learning with Python and Spark on AWS

5th July 2018 in London at CodeNode

There are 23 other SkillsCasts available from Infiniteconf 2018 - The conference on Big Data and AI

This session was not filmed.

Robert Hardy will host an interactive workshop on using Spark with Python to run machine learning algorithms when you have too much data to work with comfortably in Pandas. Attendees should bring a laptop and open an AWS account beforehand. All code and setup scripts will be available in a public GitHub repo.

You will cover all steps of the workflow:

  • Spinning up your Spark instance on AWS
  • Trimming and cleaning data
  • Using different storage formats for faster handling
  • Browsing subsets of the data to get a feel for which features might be the most useful
  • Applying models from the Spark MLlib and Scikit-learn libraries
  • Viewing results and assessing the quality of our predictions
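
The last step of the workflow, assessing the quality of predictions, comes down to comparing predicted labels with the true ones. The following is a minimal, dependency-free sketch of that idea for a binary classifier. It is not taken from the workshop material: in practice these figures would more likely come from the evaluators in Spark MLlib or Scikit-learn, and the function name and sample labels here are hypothetical.

```python
from collections import Counter


def binary_classification_report(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for 0/1 labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count each (true, predicted) pair once; missing pairs count as 0.
    counts = Counter(zip(y_true, y_pred))
    tp = counts[(1, 1)]  # true positives
    tn = counts[(0, 0)]  # true negatives
    fp = counts[(0, 1)]  # false positives
    fn = counts[(1, 0)]  # false negatives

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
report = binary_classification_report(y_true, y_pred)

for name, value in report.items():
    print(f"{name}: {value:.2f}")
```

Reporting precision and recall alongside accuracy matters most when the classes are imbalanced, since accuracy alone can look good for a model that rarely predicts the minority class.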

Robert Hardy

Robert Hardy is a full-stack quant with over 12 years of experience in the front office teams of major financial institutions. He has built professional portfolio management systems entirely from open source components. He experienced an epiphany when he was introduced to TDD, pair programming and Agile methods. Robert talks and blogs on topics related to software and mathematics and, with his diploma in painting and ceramics in hand, even claims some level of expertise in the Fine Arts.
