*November 5, 2023*

# Controlling Vim With ChatGPT

Vim is a text-based text editor. Every piece of information you can get from it is represented as text, and every piece of information you can send to it is also text. ChatGPT et al. are quite knowledgeable about vim and have taught me a lot of tricks. Since vim is all text, maybe we can cut out the middleman?

[code available here](https://github.com/LachlanGray/vim-agent)

## A Match Made in Heaven

Vim is very weird. Among other things, it is designed so that anything you know how to do with the editor, you *also* know how to automate with the editor. In a nutshell, every interaction you have with the editor can be turned into a vimscript program. A sequence involving any combination of typing, altering, deleting, editor commands, shell commands, opening files, etc., can be done by a program. Whether this is good for humans can be debated, but this feature is priceless if you happen to be a language model.

Vimscript has existed for a really long time now, and the internet is full of vim scripts to accomplish all sorts of boring things. Vimscript is basically a domain-specific programming language that kicks ass at manipulating text. It takes some flak due to its age and rigidity, but it's truly excellent at the one thing it's meant to do. What this means for us is that ChatGPT is fluent in it and, in theory, can do anything I can do with the editor.

## Neovim API

How will ChatGPT actually control the editor? Conveniently, neovim has a [python client](https://pynvim.readthedocs.io/en/latest/) that allows you to control the editor with a Python API. This is lovely. To start, you can launch neovim with a listening address via

```sh
nvim --listen 127.0.0.1:7777
```

which starts a regular old neovim instance, only it will listen for connections. Now, from the Python API, we can tell neovim to do things.
The Python API connects to neovim using TCP (transmission control protocol), which is a network protocol designed to ensure data is received in full and arrives in the same order it was sent. We create a `vim` object to manage the state of neovim:

```python
from pynvim import attach

vim = attach('tcp', address='127.0.0.1', port=7777)
```

The `vim` object gives us full access to neovim's API, which offers programmatic access to the editor's state and functionality. Among many nice wrappers, it offers `vim.request()`, which is an all-in-one hammer that works with everything in the API docs (linked [here](https://neovim.io/doc/user/api.html), or `:h api` in neovim). It lets us do pretty much anything, for example run a command:

```python
vim.request('nvim_command', "q!")
```

Here's the game plan: wrap all of the functionality we care about, and simplify it so that it is easy to interface with the language model. We'll call the wrapper `VimInstance`. It holds the `vim` object, and has simple properties to access information and methods to control the vim instance:

```python
from pynvim import attach

class VimInstance:
    def __init__(self):
        self.vim = attach('tcp', address='127.0.0.1', port=7777)

    @property
    def current_buffer_content(self):
        # Lines 0 to -1 (the whole buffer), with strict indexing
        return self.vim.request(
            'nvim_buf_get_lines', self.vim.current.buffer, 0, -1, True)

    # ...

    def input(self, keys):
        # Send keys as if they were typed
        self.vim.input(keys)

    # ...
```

Then we can do things like

```python
vim = VimInstance()

# list containing each line of current file
text = vim.current_buffer_content

# type "hello" at the start of the file and save
vim.input("ggIhello<Esc>:w<CR>")
```

## Give Me The CODE!

One of life's great certainties is that ChatGPT will tell you what it's doing. Like, a lot. Even if you ask it not to, it usually gets something in. We are sure to run into this problem instantly. Instead of cajoling the model, it can be easier to just listen until there's code, and hang up when the code is done.
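
As a rough sketch of that listen-then-hang-up idea, assuming the reply arrives as an iterable of text chunks: the generator below buffers chunks into complete lines, starts yielding after the first fence line, and stops at the next one. The name `code_only` and the line-based fence check are illustrative assumptions, not the code from the linked repository.

```python
from typing import Iterable, Iterator

FENCE = "`" * 3  # the three-backtick marker that opens and closes a code block

def code_only(chunks: Iterable[str]) -> Iterator[str]:
    """Yield the lines inside the first fenced code block of a chunked reply."""
    pending = ""      # text received so far that is not yet a complete line
    in_code = False   # True once the opening fence has been seen
    for chunk in chunks:
        pending += chunk
        # Only complete lines are inspected, so a fence that is split
        # across two chunks is still recognised once its line is whole.
        while "\n" in pending:
            line, pending = pending.split("\n", 1)
            if line.lstrip().startswith(FENCE):
                if in_code:
                    return        # closing fence: hang up
                in_code = True    # opening fence; its language tag is dropped
            elif in_code:
                yield line
    # The reply ended inside a block: pass on the last unterminated line,
    # unless it is the closing fence itself.
    if in_code and pending and not pending.lstrip().startswith(FENCE):
        yield pending
```

Because it is a generator, the caller can stop reading the model's reply as soon as the closing fence shows up.
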
The streaming feature of OpenAI's chat completion endpoint is useful to this end. It allows us to monitor chunks of text as they arrive and decide in real time what to do with them. We can iterate over completions as they arrive by `yield`ing chunks like this:

```python
import openai

def chat_3(messages: list[dict]):
    # Written against the pre-1.0 `openai` package (openai.ChatCompletion)
    completion = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=messages,
        temperature=0.9,
        stream=True,
    )
    # Each streamed chunk carries a small delta; skip the ones with no text
    for chunk in completion:
        if "content" in chunk.choices[0].delta:
            yield chunk.choices[0].delta["content"]
```

Now we can process the model's response without waiting for it to finish:

```python
def filtered_chat(request: str, keep=lambda chunk: True):
    # `keep` is a placeholder for whatever condition decides a chunk is worth passing on
    messages = [{"role": "user", "content": request}]
    for chunk in chat_3(messages):
        if keep(chunk):
            yield chunk
```

To collect the code, we can monitor the response as it rolls in, and wait for the opening code block pattern (` ```.*?(\n.*)`) to appear. Once there is a match, we know it's time to listen. We can then keep listening until we hit more backticks. If all went well, we should only have code.

## GPT Take The Wheel

Now we have everything we need to plug ChatGPT into the editor. For the basic setup, let's tell ChatGPT what to do. A basic prompting strategy works like this:

- Give an instruction, and provide the screen content as context
- Ask *specifically* for a vimscript to accomplish the task
- Execute each line in neovim

Here's the result. The left side is neovim with a file open, and I am typing instructions on the right. It... kind of works:

![[vim agent demo.mp4]]

It's pretty good with basic things like deleting and rearranging stuff on the screen, but it's unreliable with more advanced requests, like converting comments to Pig Latin. The good news is that there are like a million things that could be done better, so things look promising for automatic text editing!

The next step is to work out a better prompting strategy, instead of this one-size-fits-all "make a script" thing.
Once we have a good set of primitives, I want to build a [voyager](https://github.com/MineDojo/Voyager)-like agent to see how far it can go.
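
For reference, the three-step strategy from the demo (instruction plus screen content, ask for a vimscript, execute each line) could be sketched roughly like this. `build_prompt`, `run_script`, and the prompt wording are all hypothetical; the actual prompt and helpers in the linked repository may look quite different.

```python
from typing import Callable, Iterable

def build_prompt(instruction: str, screen_lines: list[str]) -> str:
    """Combine the user's instruction with the current screen content."""
    screen = "\n".join(screen_lines)
    # The wording below is an assumption, not the prompt used in the demo.
    return (
        "Here is the current buffer:\n"
        f"{screen}\n\n"
        f"Task: {instruction}\n"
        "Reply with a vimscript, in a single code block, that accomplishes the task."
    )

def run_script(send_command: Callable[[str], object],
               script_lines: Iterable[str]) -> None:
    """Send each non-empty, non-comment vimscript line to the editor."""
    for line in script_lines:
        line = line.strip()
        # In vimscript, a line that starts with a double quote is a comment.
        if not line or line.startswith('"'):
            continue
        send_command(line)
```

With pynvim, `send_command` could be something like `lambda cmd: vim.request('nvim_command', cmd)`, and `script_lines` would be whatever code was pulled out of the model's reply.
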