OS Research Fellow - Evaluation & Infrastructure, MultiNet

Remote - Part-Time

Manifold Research Group tackles ambitious, high-impact research problems that traditional institutions overlook: those too engineering-intensive for academia and too exploratory for industry. Inspired by coordinated research models such as ARPAs and FROs, we organize focused, cross-functional teams and a large pool of asynchronous research contributors to systematically pursue and deliver paradigm-shifting science and technology.

MultiNet v2.0: Cross-Domain Multimodal Benchmarking

MultiNet v2.0 is a next-generation benchmark designed to test whether multimodal models truly understand tasks across fundamentally different action domains, or simply overfit to specific interfaces.

We construct puzzle environments where agents must perceive, reason about causal structure, plan, and act. Each task is presented across multiple action domains, including discrete grid environments, 3D physics simulations, and natural language interfaces. The task semantics remain identical, while the interface changes.

This allows us to directly measure whether models generalize across action spaces. Consistent success across domains indicates that a model has learned the task itself; success in only some domains reveals interface-specific overfitting.

The Role

OS Team members form the core of Manifold Research Group. As an OS Research Fellow, you will contribute to building the evaluation and infrastructure layer for MultiNet v2.0.

In this role, you will be responsible for:

  • Setting up and managing large-scale experiments across models and action domains
  • Building pipelines for running inference, logging results, and storing outputs
  • Designing and maintaining evaluation protocols and metrics for cross-domain performance
  • Ensuring reproducibility, consistency, and scalability of experiments
  • Supporting integration across GridWorld, 3D, and natural language environments

Qualifications

Outstanding research emerges from individuals who can execute reliably at scale while maintaining strong systems thinking. For this role, we are looking for:

  • Strong software engineering skills, particularly in Python and Linux environments
  • Experience working with cloud infrastructure (e.g., AWS, GCP, Azure)
  • Familiarity with running and managing large-scale machine learning experiments
  • Experience with logging, experiment tracking, and data management systems
  • Understanding of best practices for evaluating LLMs, VLMs, or RL-based agents

Expectations

There are a few key expectations and clarifications regarding the OS Research Team:

  • Contribute approximately 10 hours per week to ensure meaningful progress and deep engagement with our projects. Flexibility around life commitments is understood; clear, proactive communication helps us support each other.
  • Our working language is English, and strong proficiency is required to clearly communicate technical concepts without confusion or misunderstanding.
  • This is a volunteer effort; none of us receive compensation of any kind—including monetary payment, academic credit, or other formal incentives. Our commitment is driven entirely by shared passion for impactful research.

More information on OS Research Team expectations is available here.

We look forward to seeing your application, and hopefully working together soon!
