Manifold Research Group tackles ambitious, high-impact research problems that traditional institutions overlook: those too engineering-intensive for academia and too exploratory for industry. Inspired by coordinated research models such as ARPAs and FROs, we organize focused, cross-functional teams and a large asynchronous pool of research contributors to systematically pursue and deliver paradigm-shifting science and technology.
MultiNet: A Generalist Benchmark for Multimodal Action Models
MultiNet is a comprehensive benchmarking initiative for evaluating generalist models that can process a wide variety of modalities and autonomously take action. Your work will help establish the foundation for truly multimodal AI systems that can perceive, understand, and act in complex environments. More information on this project is available here.
The Role
OS Team members form the core of Manifold Research Group. As an OS Research Fellow, you'll actively drive our ambitious projects—from initial roadmapping and technical implementation to writing impactful papers.
In this project, you'll be working on:
- Designing and implementing comprehensive benchmarks for evaluating multimodal AI systems across diverse tasks and modalities
- Profiling and optimizing Large Language Models (LLMs), Vision Language Models (VLMs), and Vision Language Action Models (VLAs) to understand their multimodal capabilities and limitations
- Collecting, curating, and cleaning diverse multimodal datasets that challenge current AI systems and reveal capability gaps
- Creating large-scale action datasets by training expert RL agents in simulated environments and collaborating with partner organizations to collect novel robotics data from real-world deployments
- Pioneering new types of software action datasets that capture complex GUI interactions, API sequences, and multi-step digital tasks to expand the scope of actionable AI
- Developing evaluation frameworks that measure cross-modal understanding, reasoning, and action generation (a minimal example of this style of harness is sketched after this list)
- Contributing to the MultiNet Evaluation Benchmark and Framework with improvements for scaling to larger, more capable models
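To give a concrete (and deliberately simplified) flavor of the evaluation work, here is a minimal sketch of a harness that scores a model's predicted actions against reference actions. The Episode structure, the predict_action callable, and the exact-match metric are illustrative placeholders, not MultiNet's actual interfaces.

```python
# Illustrative sketch only: a toy harness that scores discrete action predictions
# against reference actions. Data structures and metric are placeholders, not
# MultiNet's real interfaces.
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Episode:
    observation: dict       # e.g. {"image": ..., "instruction": "pick up the red block"}
    reference_action: str   # ground-truth action label


def evaluate_exact_match(
    predict_action: Callable[[dict], str],
    episodes: Sequence[Episode],
) -> float:
    """Return the fraction of episodes where the prediction matches the reference."""
    correct = sum(
        predict_action(ep.observation) == ep.reference_action for ep in episodes
    )
    return correct / max(len(episodes), 1)


if __name__ == "__main__":
    # Trivial stand-in "model" that always answers "noop"; a real run would wrap a VLM or VLA.
    toy_model = lambda obs: "noop"
    data = [
        Episode({"instruction": "wait"}, "noop"),
        Episode({"instruction": "move left"}, "left"),
    ]
    print(f"Exact-match success rate: {evaluate_exact_match(toy_model, data):.2f}")
```

A real benchmark run involves far richer observations, continuous action spaces, and per-task metrics, but the overall control flow (iterate, predict, score, aggregate) is similar.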
Qualifications
Outstanding research emerges from driven, talented minds. For this project, we are looking for the following attributes:
- Demonstrated prior research experience, evidenced by published work in peer-reviewed conferences, journals, or recognized preprint platforms
- Hands-on experience with profiling and running experiments with large language models (LLMs), including performance analysis, ablation studies, and systematic evaluation
- A foundation in probability theory, linear algebra, and optimization methods, plus practical experience with parameter-efficient fine-tuning techniques such as LoRA/QLoRA (see the sketch after this list)
- Strong skills in data collection, curation, and cleaning, particularly for multimodal datasets combining text, images, and action sequences
- Interest in and a preliminary understanding of Vision Language Action Models (VLAs) and their architectures, training procedures, and evaluation methodologies
- Proficiency with Python and familiarity with Git and Linux, including working on cloud-hosted virtual machines
- Proficiency with deep learning frameworks (PyTorch, JAX, or TensorFlow), with experience in distributed computing environments
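As one small illustration of the parameter-efficient fine-tuning experience mentioned above, the sketch below attaches LoRA adapters to a small base language model using the Hugging Face peft library. The base model (facebook/opt-125m), the target module names, and the hyperparameters are placeholder choices for demonstration, not project defaults.

```python
# Illustrative sketch only: wrapping a small base model with LoRA adapters via
# the Hugging Face `peft` library. Model choice and hyperparameters are
# placeholders, not MultiNet defaults.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,                        # scaling applied to the update
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the adapter weights remain trainable
```

Because only the injected low-rank matrices are trained while the base weights stay frozen, memory and compute requirements stay modest; QLoRA goes further by also quantizing the frozen base weights.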
Expectations
There are a few key expectations and clarifications we need to emphasize regarding the OS Research Team:
- Contribute approximately 10 hours per week to ensure meaningful progress and deep engagement with our projects. Flexibility around life commitments is understood; clear, proactive communication helps us support each other.
- Be comfortable navigating the uncertainty of research with a high degree of autonomy.
- Our working language is English; strong proficiency is required to communicate technical concepts clearly and without misunderstanding.
- This is a volunteer effort; none of us receive compensation of any kind—including monetary payment, academic credit, or other formal incentives. Our commitment is driven entirely by shared passion for impactful research.
- More information on OS Research Team expectations is available here.
We look forward to seeing your application!