Workshop: Big Data Machine Learning with Python and Spark on AWS

This session was not filmed.
Robert Hardy will host an interactive workshop on using Spark with Python to run machine learning algorithms when you have too much data to work with comfortably in Pandas. Attendees should bring a laptop and open an AWS account in advance. All code and setup scripts will be available in a public GitHub repo.
You will cover all steps of the workflow:
- Spinning up your Spark instance on AWS
- Trimming and cleaning data
- Using different storage formats for faster handling
- Browsing subsets of the data to get a feel for which features might be the most useful
- Applying models from the Spark MLlib and Scikit-learn libraries
- Viewing results and assessing the quality of our predictions
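The last two steps, fitting a model and assessing its predictions, can be sketched in miniature with scikit-learn; Spark MLlib estimators follow a similar fit-then-predict pattern at scale. The synthetic dataset below is illustrative only and is not taken from the workshop materials.

```python
# Minimal sketch of "apply a model, then assess prediction quality"
# using scikit-learn on a small synthetic dataset. The workshop applies
# the same pattern to much larger data via Spark MLlib.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Illustrative data: two numeric features, binary label
X = rng.normal(size=(500, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Hold out a test set so the quality assessment is honest
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LogisticRegression().fit(X_train, y_train)
preds = model.predict(X_test)
acc = accuracy_score(y_test, preds)
print(f"held-out accuracy: {acc:.2f}")
```

On a cluster, the equivalent MLlib workflow would fit an estimator to a Spark DataFrame and call `transform` to produce a predictions column, but the train/evaluate split shown here carries over unchanged.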
Robert Hardy is a full-stack quant with over 12 years of experience in the front-office teams of major financial institutions. He has built professional portfolio management systems entirely from open source components, and experienced an epiphany when he was introduced to TDD, pair programming, and Agile methods. Robert talks and blogs on topics related to software and mathematics, and with his diploma in painting and ceramics in hand he claims even to have some level of expertise in the Fine Arts.