Light up the Spark in Catalyst by avoiding UDFs


Processing data at scale usually means struggling with performance, strict SLAs, limited hardware, and more. I struggled to cut the run time of a Spark SQL query and found the culprit. I would like to share both the culprit and the solution with you. In today's world of Big Data and Spark, we process high-volume transactions. Catalyst is the Spark SQL query optimizer, and in this talk you will learn how to fully utilize its optimization power to make queries as fast as possible, by pushing down operations and avoiding UDFs wherever possible.

Language: English

Level: Advanced

Adi Polak

Senior Cloud Developer Advocate - Microsoft

I love to code, investigate problems, find solutions, read, and talk about and play with tech. I believe in sharing my knowledge and learning from others, which is why my friends and I founded the flip-il conference to support the functional programming community in Israel. Please follow me on Medium:
