OpenAdapt is the open source software adapter between Large Multimodal Models (LMMs) and traditional desktop and web GUIs.
Record GUI demonstrations, train ML models, and evaluate agents - all from a unified CLI.
Join us on Discord | Documentation | OpenAdapt.ai
OpenAdapt v1.0+ uses a modular meta-package architecture. The main openadapt package provides a unified CLI and depends on focused sub-packages via PyPI:
| Package | Description | Repository |
|---|---|---|
| openadapt | Meta-package with unified CLI | This repo |
| openadapt-capture | Event recording and storage | openadapt-capture |
| openadapt-ml | ML engine, training, inference | openadapt-ml |
| openadapt-evals | Benchmark evaluation | openadapt-evals |
| openadapt-viewer | HTML visualization | openadapt-viewer |
| openadapt-grounding | UI element localization | openadapt-grounding |
| openadapt-retrieval | Multimodal demo retrieval | openadapt-retrieval |
| openadapt-privacy | PII/PHI scrubbing | openadapt-privacy |
| openadapt-wright | Dev automation | openadapt-wright |
| openadapt-herald | Social media from git history | openadapt-herald |
| openadapt-crier | Telegram approval bot | openadapt-crier |
| openadapt-consilium | Multi-model consensus | openadapt-consilium |
| openadapt-desktop | Desktop GUI application | openadapt-desktop |
| openadapt-tray | System tray app | openadapt-tray |
| openadapt-agent | Production execution engine | openadapt-agent |
| openadapt-telemetry | Error tracking | openadapt-telemetry |
Install what you need:

```shell
pip install openadapt            # Minimal CLI only
pip install openadapt[capture]   # GUI capture/recording
pip install openadapt[ml]        # ML training and inference
pip install openadapt[evals]     # Benchmark evaluation
pip install openadapt[privacy]   # PII/PHI scrubbing
pip install openadapt[all]       # Everything
```

Requirements: Python 3.10+
```shell
# Record a demonstration
openadapt capture start --name my-task
# Perform actions in your GUI, then press Ctrl+C to stop

# Train a model on the capture
openadapt train start --capture my-task --model qwen3vl-2b

# Evaluate the trained model
openadapt eval run --checkpoint training_output/model.pt --benchmark waa

# View the capture
openadapt capture view my-task
```
| Command | Description |
|---|---|
| `openadapt capture start --name <name>` | Start recording |
| `openadapt capture stop` | Stop recording |
| `openadapt capture list` | List captures |
| `openadapt capture view <name>` | Open capture viewer |
| `openadapt train start --capture <name>` | Train model on capture |
| `openadapt train status` | Check training progress |
| `openadapt train stop` | Stop training |
| `openadapt eval run --checkpoint <path>` | Evaluate trained model |
| `openadapt eval run --agent api-claude` | Evaluate API agent |
| `openadapt eval mock --tasks 10` | Run mock evaluation |
| `openadapt serve --port 8080` | Start dashboard server |
| `openadapt version` | Show installed versions |
| `openadapt doctor` | Check system requirements |
See the full Architecture Evolution for detailed documentation.
OpenAdapt follows a streamlined Demonstrate → Learn → Execute pipeline:
1. DEMONSTRATE (Observation Collection)
   - Capture: Record user actions and screenshots with openadapt-capture
   - Privacy: Scrub PII/PHI from recordings with openadapt-privacy
   - Store: Build a searchable demonstration library
2. LEARN (Policy Acquisition)
   - Retrieval Path: Embed demonstrations, index them, and enable semantic search
   - Training Path: Load demonstrations and fine-tune Vision-Language Models (VLMs)
   - Abstraction: Progress from literal replay to template-based automation
3. EXECUTE (Agent Deployment)
   - Observe: Take screenshots and gather accessibility information
   - Policy: Use demonstration context to decide actions via VLMs (Claude, GPT-4o, Qwen3-VL)
   - Ground: Map intentions to specific UI coordinates with openadapt-grounding
   - Act: Execute validated actions with safety gates
   - Evaluate: Measure success with openadapt-evals and feed results back for improvement
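The Execute loop above can be sketched in a few lines of Python. Everything here is an illustrative stub, not the real openadapt API; the actual Policy and Grounding components live in openadapt-ml and openadapt-grounding.

```python
# Illustrative sketch of one Observe -> Policy -> Ground -> Act step.
# All functions below are placeholder stubs, NOT real openadapt APIs.

def observe() -> dict:
    """Observe: screenshot plus accessibility info (stubbed)."""
    return {"screenshot": "step.png", "a11y": {"buttons": ["General", "Wi-Fi"]}}

def policy(goal: str, observation: dict, demos: list[str]) -> dict:
    """Policy: a VLM conditioned on retrieved demonstrations (stubbed)."""
    return {"action": "click", "target": "General"}

def ground(intent: dict, observation: dict) -> tuple[int, int]:
    """Ground: map the intended target to screen coordinates (stubbed)."""
    return (120, 240)

def execute_step(goal: str, demos: list[str]) -> dict:
    obs = observe()
    intent = policy(goal, obs, demos)
    x, y = ground(intent, obs)
    # A real agent would pass the action through a safety gate before acting.
    return {"action": intent["action"], "x": x, "y": y}
```

The stubs make the Policy/Grounding split concrete: `policy` never sees coordinates, and `ground` never sees the goal.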
Zero-shot VLMs fail on GUI tasks not due to lack of capability, but due to ambiguity in UI affordances. OpenAdapt resolves this by conditioning agents on human demonstrations — "show, don't tell."
| | No Retrieval | With Retrieval |
|---|---|---|
| No Fine-tuning | 46.7% (zero-shot baseline) | 100% first-action (n=45, shared entry point) |
| Fine-tuning | Standard SFT (baseline) | Demo-conditioned FT (planned) |
The bottom-right cell is OpenAdapt's unique value: training models to use demonstrations they haven't seen before, combining retrieval with fine-tuning for maximum accuracy. Phase 2 (retrieval-only prompting) is validated; Phase 3 (demo-conditioned fine-tuning) is in progress.
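Demo-conditioned prompting can be sketched as follows, assuming a plain-text prompt format (the actual format used in openadapt-ml may differ):

```python
def build_demo_conditioned_prompt(task: str, demo_steps: list[str]) -> str:
    """Prepend a retrieved demonstration to the task instruction (illustrative)."""
    demo_block = "\n".join(f"{i}. {step}" for i, step in enumerate(demo_steps, 1))
    return (
        "A human performed a similar task with these steps:\n"
        f"{demo_block}\n\n"
        f"Your task: {task}\n"
        "Respond with the next GUI action."
    )

prompt = build_demo_conditioned_prompt(
    "Enable Night Shift",
    ["Click 'System Settings'", "Click 'Displays'", "Click 'Night Shift...'"],
)
```

The point is "show, don't tell": the model sees a concrete trajectory, not just the goal.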
Validated result: On a controlled macOS benchmark (45 System Settings tasks sharing a common navigation entry point), demo-conditioned prompting improved first-action accuracy from 46.7% to 100%. A length-matched control (only +11.1 pp) confirms the benefit is semantic rather than an artifact of prompt length. See the research thesis for methodology and the publication roadmap for limitations.
Industry validation: OpenCUA (NeurIPS 2025 Spotlight, XLANG Lab) reused OpenAdapt's macOS accessibility capture code in their AgentNetTool, but uses demos only for model training — not runtime conditioning. No open-source CUA framework currently does demo-conditioned inference, which remains OpenAdapt's architectural differentiator.
- Policy/Grounding Separation: The Policy decides what to do; Grounding determines where to do it
- Safety Gate: Runtime validation layer before action execution (confirm mode for high-risk actions)
- Abstraction Ladder: Progressive generalization from literal replay to goal-level automation
- Evaluation-Driven Feedback: Success traces become new training data
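The Safety Gate can be illustrated in a few lines; the risk categories and the `confirmed` flag below are hypothetical, not the real openadapt-agent interface.

```python
# Hypothetical safety gate: block high-risk actions unless confirmed.
HIGH_RISK_KINDS = {"delete", "submit", "purchase"}  # assumed categories

def safety_gate(action: dict, confirmed: bool = False) -> bool:
    """Return True if the action may be executed."""
    if action.get("kind") in HIGH_RISK_KINDS and not confirmed:
        return False  # held for user confirmation ("confirm mode")
    return True
```

A gate like this sits between Policy output and Act, so a mis-grounded or destructive action is stopped before it touches the GUI.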
| Term | Description |
|---|---|
| Observation | What the agent perceives (screenshot, accessibility tree) |
| Action | What the agent does (click, type, scroll, etc.) |
| Trajectory | Sequence of observation-action pairs |
| Demonstration | Human-provided example trajectory |
| Policy | Decision-making component that maps observations to actions |
| Grounding | Mapping intent to specific UI elements (coordinates) |
Legacy Version (v0.46.0) Examples:
- Twitter Demo - Early OpenAdapt demonstration
- Loom Video - Process automation walkthrough
Note: These demos show the legacy monolithic version. For current v1.0+ modular architecture examples, see the documentation.
macOS: Grant Accessibility, Screen Recording, and Input Monitoring permissions to your terminal. See permissions guide.
Windows: Run as Administrator if needed for input capture.
The monolithic OpenAdapt codebase (v0.46.0) is preserved in the legacy/ directory.
To use the legacy version:

```shell
pip install openadapt==0.46.0
```

See docs/LEGACY_FREEZE.md for the migration guide and details.
- Join Discord
- Pick an issue from the relevant sub-package repository
- Submit a PR
For sub-package development:

```shell
git clone https://github.com/OpenAdaptAI/openadapt-ml  # or another sub-package
cd openadapt-ml
pip install -e ".[dev]"
```

- OpenAdaptAI/SoM - Set-of-Mark prompting
- OpenAdaptAI/pynput - Input monitoring fork
- OpenAdaptAI/atomacos - macOS accessibility
- Discord: https://discord.gg/yF527cQbDG
- Issues: Use the relevant sub-package repository
- Architecture docs: GitHub Wiki
MIT License - see LICENSE for details.