Manifold Research Group tackles ambitious, high-impact research problems that traditional institutions overlook—those too engineering-intensive for academia and too exploratory for industry. Inspired by coordinated research models like ARPAs and FROs, we assemble focused, cross-functional teams to systematically pursue and deliver paradigm-shifting science and technology.
VLM Calibration Project
Develop novel calibration techniques for Vision-Language Models (VLMs) to understand and improve their confidence estimation and reliability across diverse tasks and domains. Your work will explore the fundamental differences between various confidence elicitation methods, investigate how different fine-tuning paradigms affect calibration, and potentially apply state-of-the-art mechanistic interpretability techniques, such as sparse autoencoders (SAEs), to explain differences in calibration across models. Detailed role requirements are below:
The Role
OS Team members form the core of Manifold Research Group. As an OS Research Fellow, you'll actively drive our ambitious projects—from initial roadmapping and technical implementation to writing impactful papers.
In this project, you'll be working on:
- Investigating calibration properties of modern VLMs across different architectures and scales, including analysis of verbalized confidence versus logprob-based confidence estimation (a minimal sketch contrasting the two follows this list).
- Developing novel, mechanistically grounded approaches to understanding what causes overconfidence and sycophancy in VLMs.
- Conducting systematic comparisons between instruction-tuned and base model variants to understand how RLHF affects calibration properties.
- Contributing to high-impact research publications that advance the field of reliable and interpretable AI systems.
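To make the contrast in the first item concrete, here is a minimal sketch of the two elicitation styles using the Hugging Face `transformers` API. The model name, prompt format, and single-token option handling are illustrative assumptions rather than project code; a VLM would follow the same pattern with image inputs supplied through its processor.

```python
# Illustrative sketch: logprob-based vs. verbalized confidence for a yes/no question.
# Model name and prompts are placeholder assumptions; any causal LM will do.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2-0.5B-Instruct"  # assumption: swap in the model under study
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

question = "Q: Is the sky blue on a clear day? Answer 'Yes' or 'No'.\nA:"
inputs = tokenizer(question, return_tensors="pt")

# Logprob-based confidence: read the next-token distribution and compare the
# probability mass assigned to each answer option (first sub-token as a proxy).
with torch.no_grad():
    next_token_logits = model(**inputs).logits[0, -1]
probs = torch.softmax(next_token_logits, dim=-1)
option_ids = [tokenizer(" Yes", add_special_tokens=False).input_ids[0],
              tokenizer(" No", add_special_tokens=False).input_ids[0]]
option_probs = probs[option_ids]
logprob_confidence = (option_probs / option_probs.sum()).max().item()

# Verbalized confidence: ask the model to state its own confidence in words
# and inspect whatever it says in the generated continuation.
verbal_prompt = question + " Yes.\nHow confident are you in that answer, from 0 to 100?"
verbal_inputs = tokenizer(verbal_prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**verbal_inputs, max_new_tokens=10, do_sample=False)
verbalized = tokenizer.decode(out[0, verbal_inputs.input_ids.shape[1]:],
                              skip_special_tokens=True)

print(f"logprob-based confidence: {logprob_confidence:.2f}")
print(f"verbalized confidence (raw model output): {verbalized!r}")
```

Much of the project asks how, when, and why these two numbers diverge, and what interventions (fine-tuning choices, representation-level edits) bring them closer to the model's true accuracy.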
Qualifications
Outstanding research emerges from driven, talented minds. For this project, we are looking for the following attributes:
- Demonstrated prior research experience, evidenced by published work in peer-reviewed conferences, journals, or recognized preprint platforms in ML/AI, particularly in areas related to model interpretability, calibration, or uncertainty quantification.
- Strong proficiency in Python, with specific expertise in deep learning frameworks and libraries including PyTorch, Transformers, TRL (Transformer Reinforcement Learning), and TransformerLens for model analysis and intervention.
- Solid foundation in probability theory, linear algebra, and optimization methods, with hands-on experience in model fine-tuning techniques including LoRA/QLoRA and other parameter-efficient methods.
- A basic understanding of common architectures and paradigms used in mechanistic interpretability, such as sparse autoencoders (SAEs) and dictionary learning.
- Deep understanding of reinforcement learning from human feedback (RLHF) paradigms and the distinctions between various fine-tuning approaches (SFT vs DPO vs GRPO vs PPO), including when and why each method is most appropriate.
- Deep understanding of the end-to-end training process of LLMs and VLMs, sufficient to analyze each stage for potential sources of miscalibration or overconfidence.
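If calibration metrics are new to you, the core quantity the project cares about can be illustrated with a short expected calibration error (ECE) computation; the binning scheme and toy data below are illustrative assumptions, not a prescribed evaluation protocol.

```python
# Illustrative sketch of expected calibration error (ECE): bin predictions by
# confidence, then take the bin-weighted average gap between accuracy and confidence.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap  # weight the gap by the bin's share of samples
    return ece

# Toy example: predictions made at 90% confidence that are only 60% accurate
# contribute a large gap, so this model is overconfident.
conf = [0.9, 0.9, 0.9, 0.9, 0.9, 0.6, 0.6, 0.6, 0.6, 0.6]
hit  = [1,   0,   1,   0,   1,   1,   1,   0,   1,   0]
print(f"ECE = {expected_calibration_error(conf, hit):.3f}")  # 0.150 on this toy data
```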
Expectations
There are a few key expectations and clarifications we need to emphasize regarding the OS Research Team:
- Contribute approximately 10 hours per week to ensure meaningful progress and deep engagement with our projects. Flexibility around life commitments is understood; clear, proactive communication helps us support each other.
- Our working language is English, and strong proficiency is required to communicate technical concepts clearly.
- This is a volunteer effort; none of us receive compensation of any kind—including monetary payment, academic credit, or other formal incentives. Our commitment is driven entirely by shared passion for impactful research.
- More information on OS Research Team expectations is available here.
We look forward to seeing your application!