Manifold Research Group tackles ambitious, high-impact research problems that traditional institutions overlook: those too engineering-intensive for academia and too exploratory for industry. Inspired by coordinated research models such as ARPAs (Advanced Research Projects Agencies) and FROs (Focused Research Organizations), we assemble focused, cross-functional teams to systematically pursue and deliver paradigm-shifting science and technology.
Software Control Agents
The Software Control team is developing the foundations for computer-use agent models that can robustly understand and operate across complex software environments. A central bottleneck in this space is representation: how multi-modal inputs such as visual interfaces, structured states, and action histories are encoded, fused, and reasoned over. This project focuses on advancing multi-modal representation learning for software control, with an emphasis on improving spatial grounding, planning reliability, and generalization across interfaces.
The Role
Members of the OS Research Team form the core of Manifold Research Group. As an OS Research Fellow, you will work on advancing the representation layer of software control agents, contributing to both theoretical understanding and practical system improvements.
In this project, you will be working on:
- Developing and analyzing multi-modal representations for software control agents across vision, language, and action modalities
- Designing and evaluating architectural modifications such as early fusion strategies, specialized encoders, and positional encoding variants
- Investigating structured reasoning representations and their interaction with learned embeddings for planning and execution
- Running targeted experiments and ablations to understand how representation choices impact performance, generalization, and robustness
- Contributing to research publications and open-source tools
Qualifications
Outstanding research emerges from individuals who can operate independently and navigate ambiguity. For this role, we are looking for:
- Strong foundation in machine learning theory, including probability, statistics, information theory, and optimization
- Deep understanding of multi-modal representation learning, particularly in vision-language models, sequence modeling, and related areas
- Experience working with deep learning frameworks such as PyTorch, JAX, or TensorFlow
- Familiarity with reinforcement learning, POMDPs, or video-language modeling techniques is preferred
- Ability to survey existing technical approaches, and to design, execute, and evaluate experiments, including ablations and systematic analyses
- Demonstrated prior research experience through publications or substantial technical projects
Expectations
There are a few key expectations and clarifications regarding the OS Research Team:
- Contribute approximately 10 hours per week to ensure consistent progress and engagement. Some flexibility is expected, but proactive communication about your availability is required.
- Strong English proficiency is required to clearly communicate technical ideas.
- This is a volunteer effort. There is no monetary compensation, academic credit, or other formal incentive. Participation is driven by interest in high-impact research and collaboration.
More information on OS Research Team expectations is available here.
We look forward to seeing your application, and hopefully working together soon!