GitPedia

Open Interface

Control Any Computer Using LLMs.

From AmberSahdevยทUpdated June 14, 2026ยทView on GitHubยท

Open Interface - Self-drives your computer by sending your requests to an LLM backend (GPT-4o, Gemini, etc) to figure out the required steps. - Automatically executes these steps by simulating keyboard and mouse input. - Course-corrects by sending the LLM backend updated screenshots of the progress as needed. The project is written primarily in Python, distributed under the GNU General Public License v3.0 license, first published in 2024. It has gained significant community traction with 2,688 stars and 273 forks on GitHub. Key topics include: assistant, assistant-computer-control, automation, gpt, gpt4.

Latest release: v0.9.0โ€” 0.9.0

Open Interface

<picture> <img src="assets/icon.png" align="right" alt="Open Interface Logo" width="120" height="120"> </picture>

Control Your Computer Using LLMs

Open Interface

  • Self-drives your computer by sending your requests to an LLM backend (GPT-4o, Gemini, etc) to figure out the required steps.
  • Automatically executes these steps by simulating keyboard and mouse input.
  • Course-corrects by sending the LLM backend updated screenshots of the progress as needed.
<div align="center"> <h4>Full Autopilot for All Computers Using LLMs</h4>

macOS
Linux
Windows
<br>
Github All Releases
GitHub code size in bytes
GitHub Repo stars
GitHub
GitHub Latest Release)

</div>

<ins>Demo</ins> ๐Ÿ’ป

"Solve Today's Wordle"<br>
Solve Today's Wordle<br>
clipped, 2x

<details> <summary><a href="https://github.com/AmberSahdev/Open-Interface/blob/main/MEDIA.md#demos">More Demos</a></summary> <ul> <li> "Make me a meal plan in Google Docs" <img src="assets/meal_plan_demo_2x.gif" style="margin: 5px; border-radius: 10px;"> </li> <li> "Write a Web App" <img src="assets/code_web_app_demo_2x.gif" style="margin: 5px; border-radius: 10px;"> </li> </ul> </details> <hr>

<ins>Install</ins> ๐Ÿ’ฝ

<details> <summary><img src="https://upload.wikimedia.org/wikipedia/commons/thumb/8/84/Apple_Computer_Logo_rainbow.svg/960px-Apple_Computer_Logo_rainbow.svg.png?20250629104313" alt="MacOS Logo" width="13" height="15"> <b>MacOS</b></summary> <ul> <li>Download the MacOS binary from the latest <a href="https://github.com/AmberSahdev/Open-Interface/releases/latest">release</a>.</li> <li>Unzip the file and move Open Interface to the Applications Folder.<br><br> <img src="assets/macos_unzip_move_to_applications.png" width="350" style="border-radius: 10px; border: 3px solid black;"> </li> </ul> <details> <summary><b>Apple Silicon M-Series Macs</b></summary> <ul> <li> Open Interface will ask you for Accessibility access to operate your keyboard and mouse for you, and Screen Recording access to take screenshots to assess its progress.<br> </li> <li> In case it doesn't, manually add these permission via <b>System Settings</b> -> <b>Privacy and Security</b> <br> <img src="assets/mac_m3_accessibility.png" width="400" style="margin: 5px; border-radius: 10px; border: 3px solid black;"><br> <img src="assets/mac_m3_screenrecording.png" width="400" style="margin: 5px; border-radius: 10px; border: 3px solid black;"> </li> </ul> </details> <details> <summary><b>Intel Macs</b></summary> <ul> <li> Launch the app from the Applications folder.<br> You might face the standard Mac <i>"Open Interface cannot be opened" error</i>.<br><br> <img src="assets/macos_unverified_developer.png" width="200" style="border-radius: 10px; border: 3px solid black;"><br> In that case, press <b><i><ins>"Cancel"</ins></i></b>.<br> Then go to <b>System Preferences -> Security and Privacy -> Open Anyway.</b><br><br> <img src="assets/macos_system_preferences.png" width="100" style="border-radius: 10px; border: 3px solid black;"> &nbsp; <img src="assets/macos_security.png" width="100" style="border-radius: 10px; border: 3px solid black;"> &nbsp; <img src="assets/macos_open_anyway.png" width="400" style="border-radius: 10px; border: 3px solid black;"> </li> <br> <li> Open Interface will also need Accessibility access to operate your keyboard and mouse for you, and Screen Recording access to take screenshots to assess its progress.<br><br> <img src="assets/macos_accessibility.png" width="400" style="margin: 5px; border-radius: 10px; border: 3px solid black;"><br> <img src="assets/macos_screen_recording.png" width="400" style="margin: 5px; border-radius: 10px; border: 3px solid black;"> </li> </ul> </details> <ul> <li>Lastly, checkout the <a href="#setup">Setup</a> section to connect Open Interface to LLMs (OpenAI GPT-4V)</li> </ul> </details> <details> <summary><img src="https://upload.wikimedia.org/wikipedia/commons/a/af/Tux.png" alt="Linux Logo" width="18" height="18"> <b>Linux</b></summary> <ul> <li>Linux binary has been tested on Ubuntu 20.04 so far.</li> <li>Download the Linux zip file from the latest <a href="https://github.com/AmberSahdev/Open-Interface/releases/latest">release</a>.</li> <li> Extract the executable and checkout the <a href="https://github.com/AmberSahdev/Open-Interface?tab=readme-ov-file#setup">Setup</a> section to connect Open Interface to LLMs, such as OpenAI GPT-4V.</li> </ul> </details> <details> <summary><img src="https://upload.wikimedia.org/wikipedia/commons/5/5f/Windows_logo_-_2012.svg" alt="Linux Logo" width="15" height="15"> <b>Windows</b></summary> <ul> <li>Windows binary has been tested on Windows 10.</li> <li>Download the Windows zip file from the latest <a href="https://github.com/AmberSahdev/Open-Interface/releases/latest">release</a>.</li> <li>Unzip the folder, move the exe to the desired location, double click to open, and voila.</li> <li>Checkout the <a href="https://github.com/AmberSahdev/Open-Interface?tab=readme-ov-file#setup">Setup</a> section to connect Open Interface to LLMs (OpenAI GPT-4V)</li> </ul> </details> <details> <summary><img src="https://upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Python-logo-notext.svg/120px-Python-logo-notext.svg.png?20250701090410" alt="Python Logo" width="15" height="15"> <b>Run as a Script</b></summary> <ul> <li>Clone the repo <code>git clone https://github.com/AmberSahdev/Open-Interface.git</code></li> <li>Enter the directory <code>cd Open-Interface</code></li> <li><b>Optionally</b> use a Python virtual environment <ul> <li>Note: pyenv handles tkinter installation weirdly so you may have to debug for your own system yourself.</li> <li><code>pyenv local 3.12.2</code></li> <li><code>python -m venv .venv</code></li> <li><code>source .venv/bin/activate</code></li> </ul> </li> <li>Install dependencies <code>pip install -r requirements.txt</code></li> <li>Run the app using <code>python app/app.py</code></li> </ul> </details>

<ins id="setup">Setup</ins> ๐Ÿ› ๏ธ

<details> <summary><b>Set up the OpenAI API key</b></summary>
  • Get your OpenAI API key

  • Save the API key in Open Interface settings

    • In Open Interface, go to the Settings menu on the top right and enter the key you received from OpenAI into the text field like so: <br> <br>
    <picture> <img src="assets/set_openai_api_key.png" align="middle" alt="Set API key in settings" width="400"> </picture><br> <br>
  • After setting the API key for the first time you'll need to <b>restart the app</b>.

</details> <details> <summary><b>Set up the Google Gemini API key</b></summary>
  • Go to Settings -> Advanced Settings and select the Gemini model you wish to use.
  • Get your Google Gemini API key from https://aistudio.google.com/app/apikey.
  • Save the API key in Open Interface settings.
  • Save the settings and <b>restart the app</b>.
</details> <details> <summary><b>Optional: Setup a Custom LLM</b></summary>
  • Open Interface supports using other OpenAI API style LLMs (such as Llava) as a backend and can be configured easily in the Advanced Settings window.
  • Enter the custom base url and model name in the Advanced Settings window and the API key in the Settings window as needed.
  • NOTE - If you're using Llama:
    • You may need to enter a random string like "xxx" in the API key input box.
    • You may need to append /v1/ to the base URL.
      <br>
      <picture>
      <img src="assets/advanced_settings.png" align="middle" alt="Set API key in settings" width="400">
      </picture><br>
      <br>
  • If your LLM does not support an OpenAI style API, you can use a library like this to convert it to one.
  • You will need to restart the app after these changes.
</details> <hr>

<ins>Stuff Itโ€™s Error-Prone At, For Now</ins> ๐Ÿ˜ฌ

  • Accurate spatial-reasoning and hence clicking buttons.
  • Keeping track of itself in tabular contexts, like Excel and Google Sheets, for similar reasons as stated above.
  • Navigating complex GUI-rich applications like Counter-Strike, Spotify, Garage Band, etc due to heavy reliance on cursor actions.

<ins>The Future</ins> ๐Ÿ”ฎ

(with better models trained on video walkthroughs like Youtube tutorials)

  • "Create a couple of bass samples for me in Garage Band for my latest project."
  • "Read this design document for a new feature, edit the code on Github, and submit it for review."
  • "Find my friends' music taste from Spotify and create a party playlist for tonight's event."
  • "Take the pictures from my Tahoe trip and make a White Lotus type montage in iMovie."

<ins>Notes</ins> ๐Ÿ“

  • Cost Estimation: $0.0005 - $0.002 per LLM request depending on the model used.<br>
    (User requests can require between two to a few dozen LLM backend calls depending on the request's complexity.)
  • You can interrupt the app anytime by pressing the Stop button, or by dragging your cursor to any of the screen corners.
  • Open Interface can only see your primary display when using multiple monitors. Therefore, if the cursor/focus is on a secondary screen, it might keep retrying the same actions as it is unable to see its progress.
<hr>

<ins>System Diagram</ins> ๐Ÿ–ผ๏ธ

+----------------------------------------------------+
| App                                                |
|                                                    |
|    +-------+                                       |
|    |  GUI  |                                       |
|    +-------+                                       |
|        ^                                           |
|        |                                           |
|        v                                           |
|  +-----------+  (Screenshot + Goal)  +-----------+ |
|  |           | --------------------> |           | |
|  |    Core   |                       |    LLM    | |
|  |           | <-------------------- |  (GPT-4o) | |
|  +-----------+    (Instructions)     +-----------+ |
|        |                                           |
|        v                                           |
|  +-------------+                                   |
|  | Interpreter |                                   |
|  +-------------+                                   |
|        |                                           |
|        v                                           |
|  +-------------+                                   |
|  |   Executer  |                                   |
|  +-------------+                                   |
+----------------------------------------------------+

<ins>Star History</ins> โญ๏ธ

<picture> <img src="https://api.star-history.com/svg?repos=AmberSahdev/Open-Interface&type=Date" alt="Star History" width="720"> </picture>
  • Check out more of my projects at AmberSah.dev.
  • Other demos and press kit can be found at MEDIA.md.
<div align="center"> <img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/AmberSahdev/Open-Interface"> <a href="https://github.com/AmberSahdev"> <img alt="GitHub followers" src="https://img.shields.io/github/followers/AmberSahdev"> </a> </div>

Contributors

Showing top 7 contributors by commit count.

View all contributors on GitHub โ†’

This article is auto-generated from AmberSahdev/Open-Interface via the GitHub API.Last fetched: 6/16/2026