Manifold Research Group tackles ambitious, high-impact research problems that traditional institutions overlook—those too engineering-intensive for academia and too exploratory for industry. Inspired by coordinated research models like ARPAs and FROs, we assemble focused, cross-functional teams to systematically pursue and deliver paradigm-shifting science and technology.

Software Control Agents

The Software Control team is developing the foundations for computer-use agent models that can robustly understand and operate across complex software environments. A central bottleneck in this space is representation: how multi-modal inputs such as visual interfaces, structured states, and action histories are encoded, fused, and reasoned over. This project focuses on advancing multi-modal representation learning for software control, with an emphasis on improving spatial grounding, planning reliability, and generalization across interfaces.

The Role

OS Team members form the core of Manifold Research Group. As an OS Research Fellow, you will work on advancing the representation layer of software control agents, contributing to both theoretical understanding and practical system improvements.

In this project, you will be working on:

Developing and analyzing multi-modal representations for software control agents across vision, language, and action modalities
Designing and evaluating architectural modifications such as early fusion strategies, specialized encoders, and positional encoding variants
Investigating structured reasoning representations and their interaction with learned embeddings for planning and execution
Running targeted experiments and ablations to understand how representation choices impact performance, generalization, and robustness
Contributing to research publications and open-source tools

Qualifications

Outstanding research emerges from individuals who can operate independently and navigate ambiguity. For this role, we are looking for:

Strong foundation in machine learning theory, including probability, statistics, information theory, and optimization
Deep understanding of multi-modal representation learning, particularly in vision-language models, sequence modeling, and related areas
Experience working with deep learning frameworks such as PyTorch, JAX, or TensorFlow
Familiarity with reinforcement learning, POMDPs, or video-language modeling techniques (preferred)
Ability to survey technical approaches and design, execute, and evaluate experiments, including ablations and systematic analyses
Demonstrated prior research experience through publications or substantial technical projects

Expectations

There are a few key expectations and clarifications regarding the OS Research Team:

Contribute approximately 10 hours per week to ensure consistent progress and engagement. Flexibility is understood, but proactive communication is required.
Strong English proficiency is required to clearly communicate technical ideas.
This is a volunteer effort. There is no monetary compensation, academic credit, or other formal incentive. Participation is driven by interest in high-impact research and collaboration.

More information on OS Research Team expectations is available here.

We look forward to seeing your application, and hopefully working together soon!

OS Research Fellow - Multi-Modal Representation Modeling, Software Control Agents
