Snorkel AI
Research Scientist
Job Summary
This role involves researching and developing data generation, curation, and evaluation techniques to support AI models and Data-as-a-Service offerings. The candidate will design pipelines, build evaluators, and contribute to cross-functional projects using Python and machine learning frameworks. Strong collaboration and communication skills are essential, along with experience in AI model development and data workflows. The position emphasizes innovation in data-centric AI and offers comprehensive employee benefits.
Job Description
We’re on a mission to democratize AI by building the definitive AI data development platform. The AI landscape has changed dramatically from 2016, when Snorkel started as a research project in the Stanford AI Lab, to the generative AI breakthroughs of today. But one thing has remained constant: the data you use to build AI is the key to achieving differentiation, high performance, and production-ready systems. We work with some of the world’s largest organizations to empower scientists, engineers, financial experts, product creators, journalists, and more to build custom AI with their data faster than ever before. Excited to help us redefine how AI is built? Apply to be the newest Snorkeler!
The Expert Data-as-a-Service (DaaS) team delivers high-quality, large-scale datasets that power frontier AI systems. As a researcher working on DaaS, you will focus on developing novel approaches for data generation, curation and evaluation. You will be responsible for designing innovative techniques that combine automated methods with human expertise to achieve best-in-class efficiency and quality. You will collaborate closely with engineering and operations teams, as well as our customers’ research teams, to define the future of data generation workflows that will power frontier AI models.
Main Responsibilities
- Conduct research on data curation and generation to support emerging use cases across domains
- Collaborate with customer research teams to translate their high-level goals into data requirements, annotation guidelines, and workflows
- Design and prototype data generation and curation pipelines that feed directly into Data-as-a-Service offerings
- Build sophisticated evaluators to measure the quality of our data along dimensions such as coverage, bias, and utility
- Write clear, maintainable Python code to support experiments and production pipelines; contribute to internal tooling and shared libraries
- Iterate rapidly on solutions based on customer feedback, emerging research, and evolving DaaS requirements
- Collaborate cross-functionally with delivery managers, vendors, and engineering teams to take research into production
Preferred Qualifications
- PhD in Computer Science or a related field, with a focus on data-centric AI and synthetic data generation
- Strong foundation in large language models, generative AI, or data generation techniques, especially for supervised fine-tuning and reinforcement learning
- Experience developing, experimenting with, and deploying AI models and data pipelines at scale
- Solid programming skills in Python; familiarity with ML frameworks such as PyTorch and HuggingFace, as well as with software engineering best practices and clean code
- Track record of working in fast-paced, iterative environments and handling uncertainty in project requirements
- Bias for action: comfortable rolling up your sleeves, experimenting, and iterating quickly to solve problems
- Strong communication and collaboration skills, especially when working across research, engineering, and delivery teams
Nice to Have
- Past experience in data labeling, annotation, or curation projects
- Publications or contributions related to data curation for LLM fine-tuning
- Knowledge of production workflows for DaaS offerings or data delivery teams
- Familiarity with quality control processes for high-volume data pipelines
Why Join Us?
- Be part of a growing Data as a Service business that powers frontier AI models for top enterprises
- Work at the intersection of research and production, bringing novel data generation and curation techniques into real-world pipelines
- Collaborate with a founding-stage DaaS team, contributing to the development of processes, tooling, and quality standards
- Competitive compensation range of $140,000 – $275,000 plus equity opportunities
- Growth-oriented environment where your work directly impacts product direction and customer success
Learn more about our recent launch in Forbes: Snorkel AI Raises $100 Million to Build Better Evaluators for AI Models
Explore our Expert Data as a Service offering: snorkel.ai/expert-data-as-a-service
This role is ideal for candidates who love both research and building real AI systems in a dynamic, high-impact setting. A PhD in machine learning or a related field with a strong publication record is preferred, but we also welcome applications from those with equivalent expertise gained through industry experience, research labs, or other career paths.
Snorkel AI
Unlock the power of programmatic AI data development to build production AI applications with Snorkel Flow—100x faster!
Safe Remote Job Search Tips
Verify Employer Thoroughly
Research the company's identity thoroughly before applying. Check for a professional website with contacts, active social media, and LinkedIn profiles. Verify details across platforms and look for reviews on Glassdoor or Trustpilot to confirm legitimacy.
Never Pay to Get a Job
Legitimate employers never require payment for applications, training, background checks, or equipment. Always reject upfront payment requests or demands for bank details, even if they claim it's for purchasing necessary work gear on your behalf.
Safeguard Your Personal Information
Protect sensitive data like SSN, bank details, or ID copies. Share this only after accepting a formal, written job offer. Ensure it's submitted via a secure company system or portal, never through insecure channels like standard email attachments.
Scrutinize Communication & Interviews
Watch for communication red flags: poor grammar, messages from generic email addresses (e.g., @gmail.com), vague details, or undue pressure. Be highly suspicious of interviews held only via text or chat apps; legitimate companies typically use video or phone calls.
Beware of Unrealistic Offers
If an offer's salary or benefits seem unrealistically high for the work involved, be cautious. Research standard pay for similar roles. Offers that appear 'too good to be true' are often scams designed to lure you into providing information or payment.
Insist on a Formal Contract
Always secure and review a formal, written job offer or employment contract before starting work or sharing final personal details. Ensure it clearly defines your role, compensation, key terms, and conditions to avoid misunderstandings or scams.