Senior ML Platform Engineer
Sunnyvale, United States
42dot
Full-time

About Us:

42dot is a mobility AI company committed to solving mobility challenges with software and AI. As the Global Software Center of Hyundai Motor Group, 42dot pioneers the future of mobility by advancing the development of software-defined vehicles.

We develop safety-first, user-centric software-defined vehicle technologies that stay up to date through continuous updates, much as smartphones do. By advancing software and AI technology, 42dot envisions a world where everything is connected and moves autonomously through a self-managing urban transportation operating system.

About the Role:

As a Senior ML Platform Engineer, you will play a pivotal role in building the core infrastructure that powers the future of autonomous driving at 42dot. This role is at the intersection of data engineering, machine learning, and autonomous systems, requiring both deep technical expertise and a system-level mindset. You will be responsible for setting the technical strategy and leading the development of our high-performance data platform, designed to process, manage, and serve massive-scale multimodal datasets for ML model training and validation. From building a robust lakehouse architecture for AD scene data to optimizing complex data processing pipelines, you'll work across disciplines to ensure the seamless flow of data that drives our autonomy stack forward. If you're passionate about solving large-scale data challenges in a fast-paced, high-impact environment, this is your opportunity to shape how self-driving vehicles learn and evolve.

Responsibilities

  • Set technical strategy and oversee development of a high-scale, reliable data platform to manage, visualize, and serve large-scale datasets for ML model training and validation.

  • Build the data lakehouse for autonomous driving scene datasets, including sensor data, calibration data, and annotation data.

  • Drive development of the Autonomous Driving Data SDK, including scene data search, dataset preparation, and dataset loading.

  • Dig into performance bottlenecks across the data processing pipelines, from data processing latency and data search latency to Test Procedure (TP) coverage.

  • Bootstrap and maintain infrastructure for Data Platform components—Data Processing Pipeline, Database, Data Lakehouse and Data Serving.

  • Collaborate with cross-functional teams, including ML algorithm, ML application, and Cloud Infra, to align ML Platforms with the overall Autonomous Driving System Architecture.

Qualifications

  • Bachelor's degree or higher in Computer Science, Engineering, Robotics, or a similar technical field.

  • Minimum of 7 years of experience in Data Engineering or ML Platform roles.

  • Expert-level proficiency in Python and solid experience in Python SDK development.

  • Solid working experience with databases (e.g., MongoDB, PostgreSQL).

  • Strong understanding of modern AI frameworks (e.g., PyTorch, TensorFlow), especially the principles of distributed data loading for model training.

  • Hands-on experience orchestrating data pipeline jobs with Databricks Workflows or Apache Airflow, as well as integrating data pipelines with machine learning models.

  • Extensive experience with data technologies and architectures such as Data Warehouse (e.g., Hive) or Lakehouse (e.g., Delta Lake).

  • Experience with Apache Spark or other big data computing engines.

  • Excellent leadership and communication skills, with a demonstrated ability to lead technical projects.