OS Research Fellow - Evaluation & Infrastructure, MultiNet

Manifold Research Group tackles ambitious, high-impact research problems that traditional institutions overlook: those too engineering-intensive for academia and too exploratory for industry. Inspired by coordinated research models like ARPAs and FROs, we organize focused, cross-functional teams and a large pool of asynchronous research contributors to systematically pursue and deliver paradigm-shifting science and technology.

MultiNet v2.0: Cross-Domain Multimodal Benchmarking

MultiNet v2.0 is a next-generation benchmark designed to test whether multimodal models truly understand tasks across fundamentally different action domains, or simply overfit to specific interfaces.

We construct puzzle environments where agents must perceive, reason about causal structure, plan, and act. Each task is presented across multiple action domains, including discrete grid environments, 3D physics simulations, and natural language interfaces. The task semantics remain identical, while the interface changes.

This allows us to directly measure whether models generalize across action spaces. Success across domains indicates real understanding; failure reveals interface-specific overfitting.

The Role

OS Team members form the core of Manifold Research Group. As an OS Research Fellow, you will contribute to building the evaluation and infrastructure layer for MultiNet v2.0.

In this role, you will be responsible for:

Setting up and managing large-scale experiments across models and action domains
Building pipelines for running inference, logging results, and storing outputs
Designing and maintaining evaluation protocols and metrics for cross-domain performance
Ensuring reproducibility, consistency, and scalability of experiments
Supporting integration across GridWorld, 3D, and natural language environments

Qualifications

Outstanding research emerges from individuals who can execute reliably at scale while maintaining strong systems thinking. For this role, we are looking for:

Strong software engineering skills, particularly in Python and Linux environments
Experience working with cloud infrastructure (e.g., AWS, GCP, Azure)
Familiarity with running and managing large-scale machine learning experiments
Experience with logging, experiment tracking, and data management systems
Understanding of best practices for evaluating LLMs, VLMs, or RL-based agents

Expectations

There are a few key expectations and clarifications regarding the OS Research Team:

Contribute approximately 10 hours per week to ensure meaningful progress and deep engagement with our projects. Flexibility around life commitments is understood; clear, proactive communication helps us support each other.
Our working language is English; strong proficiency is required to communicate technical concepts clearly.
This is a volunteer effort; none of us receive compensation of any kind—including monetary payment, academic credit, or other formal incentives. Our commitment is driven entirely by shared passion for impactful research.

More information on OS Research Team expectations is available here.

We look forward to seeing your application, and hopefully working together soon!
