Big Data Engineer
Mobile Premier League
We are Mobile Premier League (MPL), one of India’s largest esports and mobile gaming platforms, and we have grown and scaled multifold over the last few years.
We hit 5 billion games played on MPL within a year of starting up, and our user base now exceeds 90 million. Our marquee investors include Sequoia Capital, Times Internet, and GoVentures.
We currently offer more than 60 games on our platform, built in partnership with both mainstream and indie game developers. Creating and re-inventing is a key practice here, and at this point we need people who can help us keep pace with the dynamic ecosystem we are all a part of.
About the role
We are seeking a skilled Spark Engineer with a minimum of 4 years of experience to join our dynamic team. As a Spark Engineer, you will play a crucial role in designing, implementing, and optimizing data processing solutions using Apache Spark. Your primary focus will be collaborating with data science teams to ensure efficient and scalable data workflows for advanced analytics and machine learning applications.
Responsibilities:
Collaboration with Data Science Teams:
- Work closely with data scientists to understand their requirements and develop Spark-based solutions to support their analytical and machine learning workflows
- Participate in cross-functional teams to integrate Spark components into end-to-end data processing pipelines
Spark Development and Optimization:
- Design, develop, and maintain Spark applications for large-scale data processing
- Optimize Spark jobs for performance, scalability, and reliability
- Implement best practices for Spark application development and deployment
Data Ingestion and Transformation:
- Create robust data ingestion pipelines to efficiently bring in data from various sources into Spark clusters
- Develop data transformation processes to clean, preprocess, and enrich data for downstream analytics and modeling
Cluster Management:
- Configure and manage Spark clusters to ensure optimal performance and resource utilization
- Troubleshoot and resolve issues related to Spark cluster operations
Monitoring and Logging:
- Implement monitoring solutions to track the health and performance of Spark applications and clusters
- Set up logging mechanisms to facilitate debugging and auditing of Spark jobs
Documentation and Knowledge Sharing:
- Document Spark applications, configurations, and deployment processes
- Share knowledge and best practices with team members to enhance overall technical capabilities
Requirements:
- Bachelor’s or higher degree in Computer Science, Engineering, or a related field.
- Minimum of 4 years of hands-on experience with Apache Spark.
- Proficiency in Python for Spark development.
- Strong understanding of distributed computing principles and Spark architecture.
- Experience with data processing frameworks and technologies (e.g., Hadoop, Hive).
- Familiarity with containerization and orchestration tools (e.g., Docker, Kubernetes) is a plus.
- Excellent problem-solving and debugging skills.
- Strong communication and collaboration skills.