Founding Engineer – Full Stack ML DevTools & Systems

Location: San Francisco, CA
Type: Full-Time
Base Compensation: $150,000 – $250,000
Equity: Competitive Series A Equity Package

Overview

This is a founding-level engineering role within a Series A AI infrastructure company building core developer tools and platform primitives for post-training, evaluation, and reinforcement learning workflows.

The platform enables ML engineers and researchers to:

- Create structured training data
- Run reinforcement fine-tuning workflows
- Evaluate model performance reliably and reproducibly at scale

This is a high-ownership role at the center of the product. You will operate across the Python SDK, backend systems, infrastructure, and developer experience, partnering directly with frontier labs, enterprise AI teams, and AI-native startups.

This is not a narrow feature role. You will shape foundational platform architecture and developer workflows that power advanced model training systems.

Core Responsibilities

Platform & Backend Systems

- Design and implement backend systems supporting post-training workflows, dataset primitives, run tracking, and artifact management
- Build reliable execution and orchestration systems with strong isolation and reproducibility
- Improve observability, debugging capabilities, and performance across job execution and distributed data pipelines
- Contribute to containerized infrastructure and Kubernetes-based deployment patterns

Python SDK & Developer Experience

- Own and evolve the Python SDK with clean APIs, strong documentation, intuitive defaults, and extensibility
- Design developer-friendly abstractions for reinforcement learning, evaluation loops, and training workflows
- Develop evaluation-native workflows connecting capability measurement, data creation, training, and re-evaluation loops
- Improve CLI tools, developer interfaces, and local-to-cloud workflows

Infrastructure & Cloud Systems

- Work across compute, networking, storage, and IAM configurations
- Design systems that are scalable, reproducible, and secure
- Collaborate on distributed systems design and execution infrastructure

Customer & Research Collaboration

- Partner directly with ML engineers and researchers to translate real-world workflows into platform improvements
- Incorporate structured customer feedback into roadmap decisions
- Operate at the intersection of research needs and production reliability

Requirements

- Strong production experience in Python
- Comfort operating across the stack, including APIs, backend systems, data systems, and frontend integration
- Deep understanding of Docker and Linux environments
- Cloud fundamentals: compute, networking, storage, IAM
- Strong product instincts with a bias toward shipping
- Demonstrated end-to-end ownership of production systems

Required Candidate Q&A

- LinkedIn Profile
- GitHub URL
- Publications URL (Google Scholar or similar, if applicable)

Interview Process

- Initial Screen
- Technical Evaluation
- Work Trial
- Final Discussion
- Offer Decision

Place of work

Talent Job Seeker
California
United States

About us

Find the best talent with Talent Job Seeker



Job ID: 10431860 / Ref: 45876a8e134b54515256b31a696e5452
