Sr. Data Engineer

Yahoo

Yahoo makes the world’s daily habits inspiring and entertaining. By creating highly personalized experiences for our users, we keep people connected to what matters most to them, across devices and around the world. Yahoo’s vast businesses span Search, Communications, Media, and many other verticals.

Yahoo generates a huge amount of data every day and it is critical to collect, manage and process data at petabyte scale to provide timely and accurate insights to executives, sales, product managers and product developers on all aspects of user interaction.

The Mail Analytics Engineering team at Yahoo is responsible for building mission-critical data systems, pipelines, warehouses, analytics systems, and Machine Learning/AI/data mining programs for the Communications business. We are constantly pushing the envelope of data platforms due to the sheer volume of data we need to harness.

As part of the Mail Analytics Engineering team, you will be working on data engineering pipelines and next generation Machine Learning- and AI-based data infrastructure, supporting new functionalities on existing platforms, and mining data for analytics insights and product features.

Our Big Data footprint is among the largest in the world, at double-digit petabyte scale. Developing this infrastructure presents many technical challenges in the areas of efficient query processing, large-scale stream processing, machine learning and modeling, as well as satisfying complex business rules.

If you are someone who is passionate about harnessing data at massive scale, enjoys working with new technologies, setting up petabyte-scale data infrastructure, and implementing new machine learning solutions and metrics systems, we want to hear from you!

Responsibilities:

  • Improve our existing data infrastructures for machine learning and deep learning using your core expertise
  • Work with other engineers to implement algorithms and systems in an efficient way
  • Take end-to-end ownership of Machine Learning-based distributed data systems, from data pipelines and training to real-time prediction engines
  • Develop complex queries, very large volume data pipelines, and software applications to solve analytics and data mining problems
  • Interact with data analysts, data scientists, product managers, and software engineers to understand business problems and technical requirements, and to deliver data solutions
  • Prototype new metrics or data systems
  • Lead data investigations to troubleshoot data issues that arise along the data pipelines
  • Maintain and improve released systems
  • Provide engineering consulting on large and complex warehouse data

A Lot About You:

  • BS/MS/PhD in Computer Science/Electrical Engineering, or related engineering disciplines, ideally with specialization in Data Engineering or Machine Learning
  • Strong fundamentals: algorithms, distributed computing, data structures, databases
  • Fluency with: Python/Java/Scala/SQL
  • 5+ years of industry experience on very large scale analytics or ML systems development
  • 2+ years of experience with Google Cloud Platform (BigQuery, Dataproc, Composer, Dataflow, Bigtable, etc.)
  • 2+ years of experience with Hadoop technologies (MapReduce, Pig, Hive, HBase, Spark, Kafka, Oozie, etc.)
  • Experience in data modeling, schema design, ETL, and data analysis

Preferred:

  • Experience with machine learning algorithms, NLP, and/or statistical methods
  • Experience in any of: machine learning, analytics, data mining, or data marts and warehouses
  • Experience with Deep Learning platforms (TensorFlow, Keras, Spark MLlib)
