Research Engineer / Scientist, Foundational Multimodal Models

About Fixie

We’re a Seattle-based AI startup (with support for working remotely). We’ve raised $17M in seed funding. Our vision is simple: build artificial intelligences that can communicate as naturally as humans. We’re a small team of researchers and engineers with a deep focus on speech and real-time technologies. Our core model, Ultravox, is open-source. We also build a serving stack that’s optimized for very low-latency interactions.

The Role

As a Research Engineer / Scientist, you will lead the effort to develop next-generation foundational multimodal models that power Ultravox, our open-source speech-to-speech model.

What you’ll do

Lead critical research on architectural design, pre-training, and post-training of foundational multimodal models to develop real-time voice AI.
Collaborate with a team of researchers and engineers to develop foundational multimodal models with comprehensive capabilities in speech understanding, speech generation, and full-duplex real-time communication.
Develop novel models based on public and proprietary data sources.
Build tools to improve our data flywheel and measure model quality.
Drive the optimization and deployment of AI models for real-world applications in partnership with engineering and product teams.

Things we’re looking for

An incredibly strong AI researcher with a track record of contributions to AI research, systems, and products.
Experience with large language models, speech models, and multimodal models.
Strong experience in Python and, ideally, PyTorch.
Ability to roll up your sleeves and get things done.
A great communicator and team player.

Benefits

Generous equity package
Unlimited PTO (take time when you need it)
Top-of-market salary