A leading talent connection firm in San Francisco is looking for an AI Model Evaluation Specialist. This role involves evaluating AI-generated responses for accuracy and practical usefulness, along with creating specialized prompts. Candidates should hold a Master’s degree or higher in Health or a r
Sanas is pioneering the future of human communication. Founded by a team of Stanford researchers and entrepreneurs with deep industry experience, Sanas has developed the world's first real-time speech AI platform capable of accent translation, noise cancellation, speech enhancement, cross-language c
Think Different. Build the Future. Our Mission Build everyday AGI. Trustworthy, consumer-grade agents that redefine human-AI collaboration for millions. Software shouldn't wait for commands; it should partner with you, amplifying what you can do every single day. Why AGI, Inc. We're a stealth team o
A leading AI research company is looking for an AI Model Evaluation Specialist in Australia. The role requires a Master's degree and involves evaluating AI-generated content for accuracy, writing effective prompts, and providing justifications for evaluations. The position is contract-based, offerin
A leading AI research firm is looking for an AI Model Evaluator to ensure the accuracy and relevance of AI-generated responses. The role necessitates a Master’s degree and significant experience in finance or advisory roles. You will score model outputs, write justifications for evaluations, and ass
A leading AI consultancy firm is seeking part-time consulting physicists to enhance AI models' understanding of physics. This role involves evaluating model responses for accuracy and rigor while working with leading AI teams. Candidates should possess a PhD in a related field, expertise in various
Overview Fullstack Engineer, GenAI Model Evaluation at Tesla. The GenAI Model Evaluation team is the main line of defense in ensuring customer safety. We are looking for an experienced fullstack developer to own tooling used to determine the best model for release. These tools have high visibility w
Software Engineer (Model Evaluation & Benchmarking) San Francisco, California (Hybrid) $140,000 - $170,000 + Equity + Healthcare + 401(k) + PTO Are you a Software Engineer interested in working on the systems that measure and validate cutting-edge AI models before they reach production, while jo
About xAI xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive
The Role We seek experienced engineers and scientists to develop the evaluation metrics and systems that drive frontier LLM performance. You'll design the frameworks that tell us whether our models are improving and ensure they perform reliably at scale in production. Key Responsibilities Design, de
A leading AI talent connecting company in San Francisco is seeking an AI Model Evaluation Contractor. This role involves writing prompts, evaluating AI-generated responses, and providing detailed feedback on accuracy and reasoning. Ideal candidates will have domain expertise and strong communication
A leading AI research organization in San Francisco is seeking an AI Model Evaluation Specialist on a contract basis. Applicants should hold a Master’s degree in Computer Science or a related field and possess strong written communication skills. The role involves evaluating AI-generated content for
A leading AI company in San Francisco is seeking an ML Engineer to develop and maintain automated evaluation pipelines for machine learning models. The successful candidate will focus on ensuring reliability and safety of AI systems deployed in government environments. Candidates should have strong
A leading AI research company located in San Francisco seeks a Research Lead to develop evaluation methodologies for AI models. You will drive hands-on research, lead a small team, and influence how the industry measures model capabilities. This role requires significant experience in AI evaluations
Location: San Francisco Office. Employment Type: Full time. Location Type: On-site. Department: Engineering. Think Different. Build the Future. 🚀 Our Mission: Build everyday AGI. Trustworthy, consumer-grade agents that redefine human–AI collaboration for millions. Software shouldn’t wait for commands; it sh
About xAI xAI's mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive
A leading autonomous technology company is seeking a Software Engineer/Data Scientist for Large Model Evaluation. This role involves developing metrics and conducting data analysis to improve ML models, specifically for driving performance in autonomous systems. Ideal candidates should have a backgr
About Teamily AI Teamily AI is building the world's first Human-AI Social Network - an AI-native messaging platform where AI agents participate as first-class members in group conversations alongside humans. Agents on Teamily are autonomous, self-evolving, proactive, and personalized. They discover
A leading electric vehicle manufacturer is seeking a talented AI Evaluation Engineer to design metrics and evaluate model performance. The role entails creating visualizations and collaborating with AI researchers to drive projects forward. Candidates should have a Bachelor's in Computer Science, st
What to Expect The GenAI Model Evaluation team is the main line of defense in ensuring customer safety. We are looking for an experienced fullstack developer to take ownership of the tooling we use to determine the best model for release. These tools have high visibility within the organization and
Overview The GenAI Model Evaluation team is the main line of defense in ensuring customer safety. We are looking for an experienced fullstack developer to take ownership of the tooling we use to determine the best model for release. These tools have high visibility within the organization and help d
A leading electric vehicle manufacturer in California seeks an experienced fullstack developer. You will create tools for model evaluation, collaborate with AI researchers, and drive projects from start to finish. The ideal candidate has a Bachelor's degree in Computer Science and strong skills in J
What to Expect The AI Evaluation team is the main line of defense in ensuring customer safety. Working alongside our AI team, you will design metrics that utilize fleet data and run on large inference clusters to help drive key decisions about end‑to‑end model architecture, data integrity, and expo
About xAI xAI's mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and
Frequently asked questions about Model Capability Evaluation in San Francisco
How much do Model Capability Evaluation roles pay in San Francisco?
The estimated salary for Model Capability Evaluation roles in San Francisco ranges from $40,000 to $63,000 USD per year, depending on experience and company.
How many Model Capability Evaluation jobs are available in San Francisco?
There are currently 64 job offers for Model Capability Evaluation in San Francisco listed on BeBee.
How can I find a Model Capability Evaluation job in San Francisco?
Sign up for free on BeBee, search for Model Capability Evaluation jobs in San Francisco, and apply directly with one click.
Which companies are hiring for Model Capability Evaluation roles in San Francisco?
Multiple companies in San Francisco are hiring for Model Capability Evaluation roles. Browse the listings on BeBee to see which companies are actively hiring.