
Yahoo
Big Data Tools Developer
We build, improve, and maintain one of the highest-scale data platforms in the world. Our amazing team of engineers works on next-generation big data platforms that transform how users connect with each other every single day. Yahoo’s Big Data Platform drives some of the most demanding applications in the industry. The system handles billions of requests a day and runs on some of the largest Hadoop clusters ever built: 50,000 nodes strong, with several multi-thousand-node clusters, bringing scalable computing to a whole new level. We work on problems that cover a wide spectrum, from web services to operating systems and networking layers. Our biggest challenge ahead is designing efficient, cloud-native big data platforms.
Responsibilities:
- Job Monitoring: Overseeing the execution of data jobs, ensuring they meet their SLAs, and detecting and resolving failures.
- Data Orchestration: Using tools like Airflow to manage the scheduling, execution, and monitoring of data workflows across cloud platforms such as AWS and GCP (see the sketch after this list).
- Query Execution and Optimization: Designing and optimizing queries to run efficiently on platforms such as BigQuery, Hive, Pig, and Spark, ensuring high performance and scalability.
- Integration and Support: Collaborating with other teams to integrate data flows, supporting query execution, and managing credentials for secure data operations.
- Feature Development: Implementing new features to support advanced query capabilities, including federated queries and lineage tracking.
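To make the orchestration and SLA-monitoring responsibilities above concrete, here is a minimal sketch of the kind of Airflow workflow this role manages, assuming Airflow 2.4+. The DAG id, task commands, and SLA window are hypothetical placeholders for illustration, not actual Yahoo pipelines.

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="example_daily_ingest",  # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # run once per day
    catchup=False,
    default_args={
        "retries": 2,  # retry transient failures automatically
        "retry_delay": timedelta(minutes=5),
        "sla": timedelta(hours=1),  # flag task runs that miss the SLA
    },
) as dag:
    extract = BashOperator(
        task_id="extract",
        bash_command="echo 'pull raw data'",  # stand-in for a real job
    )
    transform = BashOperator(
        task_id="transform",
        bash_command="echo 'run Spark/Hive transform'",  # stand-in command
    )
    extract >> transform  # transform runs only after extract succeeds

The sla entry ties directly to the job-monitoring responsibility: Airflow records an SLA miss for any task instance that runs past the configured window, which is one of the signals an engineer in this role would watch and act on.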
Required Skills and Qualifications:
- Educational Background: A Bachelor’s or Master’s degree in Computer Science or equivalent work experience.
- Programming Languages: Proficiency in Python is essential for scripting and workflow management; experience with Java and C++ is preferred for backend data operations.
- Data Management: Knowledge of data structures, algorithms, and data stores such as relational (SQL) databases, HBase, and BigQuery.
- Cloud Technologies: Experience with cloud services, especially AWS (EMR, Glue, S3) and GCP (Dataproc, BigQuery).
- Agile Methodology: Comfortable working in an Agile environment with regular sprints, planning, and retrospectives.
- System Design: Ability to design large-scale, distributed systems that are highly available and resilient.
- Operating Systems: Some experience working with Linux/Unix operating systems.
Preferred Qualifications:
- Experience with development and deployment on public cloud platforms such as AWS, GCP, or Azure.
- Experience developing containerized applications and working with container orchestration services.
- Experience with Apache Hadoop, Presto, Hive, Oozie, Pig, Storm, Spark, and Jupyter.
- Understanding of data structures and algorithms.
- Knowledge of JVM internals and JVM performance tuning.
- Excellent debugging and testing skills, along with strong analytical and problem-solving skills.
- Experience with continuous integration tools such as Jenkins and Hudson.
- Strong verbal and written communication skills to collaborate effectively with cross-functional teams.