Remote | PhD Rater — Up to $100/hr

We are sharing a specialised remote opportunity for experienced researchers and technical experts to support a frontier-model evaluation initiative focused on advanced STEM reasoning and agentic workflows. The project centres on designing and validating complex benchmark tasks across domains such as data science, machine learning, finance, and coding. The role involves building real-world evaluation tasks, implementing them in Python-based environments, and analysing how advanced AI systems perform on complex technical problems.

Key Responsibilities

- Design challenging real-world STEM problems for model evaluation
- Implement benchmark tasks inside agentic development environments using Python
- Create reproducible tasks with executable tests and clearly defined specifications
- Analyse model and agent outputs to identify reasoning gaps and failure modes
- Evaluate how AI systems perform on complex data science, machine learning, finance, and coding tasks
- Document benchmark tasks, environments, and evaluation outcomes

Ideal Profile

Strong candidates may have:

- An active or recently completed PhD from a top-tier U.S.-based university
- Deep expertise in data science, machine learning, finance, and/or Python-based programming
- A strong research background in advanced STEM domains
- Experience designing complex technical problems or research benchmarks
- The ability to analyse model reasoning traces and diagnose deeper system behaviour issues
- Strong analytical and research documentation skills

Educational Background

- PhD in Computer Science, Data Science, Machine Learning, Finance, or a related STEM field

Nice to Have

- Experience working with agentic frameworks or LLM tooling ecosystems
- Familiarity with frameworks such as LangChain, AutoGen, MetaGPT, CrewAI, LlamaIndex, BabyAGI, or related systems
- Contributions to open-source software or research projects
- Experience analysing complex model behaviour or agent workflows

Why This Opportunity

- Contribute directly to frontier AI model evaluation and benchmarking efforts
- Work on advanced research challenges in agentic AI systems and STEM reasoning tasks
- Collaborate with leading AI labs and technical researchers
- Help identify and improve limitations in next-generation AI systems

Contract Details

- Independent contractor role
- Fully remote with flexible scheduling
- Part-time research engagement with expected availability of 30+ hours per week
- Competitive rates between $50 and $100/hour depending on expertise
- Weekly payments via Stripe or Wise
- Projects may extend or adjust depending on scope and performance
- No access to confidential or proprietary information from employers or institutions

About the Platform

This opportunity is available through a leading AI-driven work platform.

Place of work

Talent Job Seeker
California
United States

About us

Identify the best Talent with Talent Job Seeker



Job ID: 10473680 / Ref: ad8677608209e0321c731ed88f29b049