Spark Analysers: Catching Anti-Patterns In Spark Apps

-
Introduction: Spark applications at Uber are sometimes unoptimized, leading to inefficient use of computing resources. "Spark Analysers" is a set of components developed at Uber to detect anti-patterns in Spark applications.
-
Components: Spark Analysers consists of two main components - the Spark Event Listener and the Analysers. The Spark Event Listener listens for specific events emitted by the Spark framework during an application run and pushes the collected information to a Kafka topic. The Analysers component is a real-time Apache Flink application that consumes events from the Kafka topic to detect anti-patterns in Spark applications.
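The listener/analyser split described above can be sketched roughly as follows. This is a minimal Python sketch, not Uber's implementation: an in-memory queue stands in for the Kafka topic, and the class and event names (`SparkEventListener`, `Analyser`, `duplicate_plan`) are hypothetical placeholders for illustration.

```python
import json
import queue

# In-memory queue standing in for the Kafka topic (the real pipeline
# would use a Kafka producer on one side and a Flink consumer on the other).
events_topic = queue.Queue()

class SparkEventListener:
    """Sketch of the producer side: forwards framework events to the topic."""
    def on_event(self, app_id, event_type, payload):
        # Serialize the collected information and push it to the topic,
        # as the real listener would do with a Kafka producer.
        events_topic.put(json.dumps(
            {"app_id": app_id, "type": event_type, "payload": payload}))

class Analyser:
    """Sketch of the consumer side: polls the topic and flags anti-patterns."""
    def poll(self):
        findings = []
        while not events_topic.empty():
            event = json.loads(events_topic.get())
            if event["type"] == "duplicate_plan":
                findings.append((event["app_id"], "duplicate Spark Plan detected"))
        return findings

listener = SparkEventListener()
listener.on_event("app-123", "duplicate_plan", {"plan_hash": "abc"})
print(Analyser().poll())  # [('app-123', 'duplicate Spark Plan detected')]
```

Decoupling the two sides through a topic is what lets the detection logic run as a separate real-time application without adding work to the Spark job itself.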
-
Functioning: The Spark Plan, a tree data structure describing the operations performed by the Spark application, is the main entity parsed by the Spark Event Listener. The listener uses a semantic hash of the Spark Plan to identify duplicate plans, maintaining state per application. Detected anti-pattern events are pushed to Kafka and processed by the Analysers, which filter out approved use cases and avoid duplicate ticket creation. An automated ticketing pipeline then sends reminders to application owners to act on the resulting Jira tickets.
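The semantic-hash idea above can be illustrated with a small sketch. This is a hedged example, not Uber's algorithm: plans are modeled as `(operator, children)` tuples, and "semantic" is taken here to mean hashing only the tree's structure so that structurally identical plans collide, enabling per-application duplicate detection.

```python
import hashlib

def semantic_hash(node):
    """Stable hash over a plan tree's operators and shape.

    node is a (operator_name, [children]) tuple. Structurally identical
    plans produce the same hash, so duplicates can be detected by keeping
    a set of seen hashes per application.
    """
    op, children = node
    child_hashes = "".join(semantic_hash(c) for c in children)
    return hashlib.sha256(f"{op}({child_hashes})".encode()).hexdigest()

# Two identical plans over the same scan, plus one different plan:
plan_a = ("Project", [("Filter", [("Scan orders", [])])])
plan_b = ("Project", [("Filter", [("Scan orders", [])])])
plan_c = ("Aggregate", [("Scan orders", [])])

seen = {}  # per-application state keyed by semantic hash
for name, plan in [("plan_a", plan_a), ("plan_b", plan_b), ("plan_c", plan_c)]:
    h = semantic_hash(plan)
    if h in seen:
        print(f"{name} duplicates {seen[h]}")  # fires for plan_b
    else:
        seen[h] = name
```

Hashing the tree bottom-up means two plans match only if every subtree matches, which is what makes the hash a cheap stand-in for full plan comparison.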
-
Future Scope: Uber aims to investigate and roll out new analysers to increase the coverage of anti-patterns in Spark applications.