How to Run ChatGPT-like LLMs Locally on Your Computer in 3 Easy Steps
A Step-by-Step Tutorial for Running LLaVA 1.5 and Mistral 7B on Your Mac or Windows PC
Running large language models (LLMs) similar to ChatGPT locally on your computer, without an Internet connection, is now more straightforward thanks to llamafile, a tool developed by Justine Tunney of the Mozilla Internet Ecosystem program (MIECO) and Mozilla's innovation group. Llamafile makes running these models locally genuinely easy.
In this post, I’ll show you how to use llamafile to run two models locally: LLaVA 1.5, an open-source multimodal LLM that handles both text and image inputs, and Mistral 7B, an open-source LLM known for its strong natural language processing and efficient text generation.
What is llamafile?
Llamafile turns LLM weights into a single executable binary: it packages the model weights together with all the code needed to run the model into one multi-gigabyte file, and the server variants also bundle a local web server with a web UI for interaction. Because the file is compiled with Cosmopolitan Libc, the same binary runs across multiple operating systems and hardware architectures, which makes distributing and running LLMs locally dramatically simpler and more accessible.
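Once you have downloaded a llamafile (the steps follow below), it behaves like any other command-line program. The binaries embed the llama.cpp runtime and pass command-line flags through to it, so a quick sanity check is to print the available options; this is a minimal sketch, assuming the standard server build used in this post:
./llava-v1.5-7b-q4-server.llamafile --help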
What is LLaVA 1.5?
LLaVA 1.5 is an open-source large multimodal model that supports text and image inputs, similar to GPT-4 Vision. It is an auto-regressive language model based on the transformer architecture, trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data.
What is Mistral 7B?
Mistral 7B is an open-source large language model with 7.3 billion parameters developed by Mistral AI. It excels at generating coherent text and performing various NLP tasks. Its sliding window attention mechanism allows for faster inference and handling of longer text sequences. Notable for its fine-tuning capabilities, Mistral 7B can be adapted to specific tasks, and it has shown impressive benchmark performance, outperforming larger models such as Llama 2 13B on many evaluations.
Here’s how to start using LLaVA 1.5 or Mistral 7B on your own computer with llamafile. Don’t be intimidated; the setup process is very straightforward!
Setting Up LLaVA 1.5
One Time Setup
Open Terminal: Before beginning, you need to open the Terminal application on your computer. On a Mac, you can find it in the Utilities folder within the Applications folder, or you can use Spotlight (Cmd + Space) to search for "Terminal."
Download the LLaVA 1.5 llamafile: Pick your preferred option to download the llamafile for LLaVA 1.5 (around 4.26 GB; a quick sanity check for the download follows these steps). Either go to Justine's repository of LLaVA 1.5 on Hugging Face and click download, or use this command in the Terminal:
curl -LO https://huggingface.co/jartine/llava-v1.5-7B-GGUF/resolve/main/llava-v1.5-7b-q4-server.llamafile
Make the Binary Executable: Once downloaded, use the Terminal to navigate to the folder where the file was downloaded, e.g. Downloads, and make the binary executable:
cd ~/Downloads
chmod 755 llava-v1.5-7b-q4-server.llamafile
For Windows, simply add .exe at the end of the file name.
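Before running it, you can sanity-check that the download completed: the file should be roughly the 4.26 GB noted above, and a much smaller size usually means the transfer was interrupted:
ls -lh llava-v1.5-7b-q4-server.llamafile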
Using LLaVA 1.5
Every time you want to use LLaVA 1.5 on your computer, follow these steps:
Run the Executable: Start the web server by executing the binary¹:
./llava-v1.5-7b-q4-server.llamafile
This command will launch a web server on port 8080.
Access the Web UI: To start using the model, open your web browser and navigate to http://127.0.0.1:8080/ (or click the link to open directly).
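Beyond the web UI, the server also exposes llama.cpp's HTTP API, so you can query the model from scripts. Here is a minimal sketch, assuming the default /completion endpoint that the llama.cpp server provides:
curl -s http://127.0.0.1:8080/completion -H 'Content-Type: application/json' -d '{"prompt": "What is a llamafile?", "n_predict": 64}'
The response is a JSON object whose content field holds the generated text.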
Terminating the process
Once you're done using the LLaVA 1.5 model, you can terminate the process. To do this, return to the Terminal where the server is running and simply press Ctrl + C. This key combination sends an interrupt signal to the running server, effectively stopping it.
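If you'd rather not keep a Terminal window occupied, a standard shell pattern (nothing llamafile-specific) is to run the server in the background, logging its output to a file:
./llava-v1.5-7b-q4-server.llamafile > llava.log 2>&1 &
You can stop it later with kill %1, which sends a termination signal much like Ctrl + C does.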
Setting Up Mistral 7B
One Time Setup
Open Terminal: Open the Terminal application, as described in the LLaVA 1.5 setup above.
Download the Mistral 7B llamafile: Pick your preferred option to download the llamafile for Mistral 7B (around 4.37 GB). Either go to Justine's repository of Mistral 7B on Hugging Face and click download, or use this command in the Terminal:
curl -LO https://huggingface.co/jartine/mistral-7b.llamafile/resolve/main/mistral-7b-instruct-v0.1-Q4_K_M-server.llamafile
Make the Binary Executable: Once downloaded, use the Terminal to navigate to the folder where the file was downloaded, e.g. Downloads, and make the binary executable:
cd ~/Downloads
chmod 755 mistral-7b-instruct-v0.1-Q4_K_M-server.llamafile
For Windows, simply add .exe at the end of the file name.
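For example, on Windows the rename and launch look like this from the Command Prompt (a sketch that assumes the file is in your Downloads folder):
cd %USERPROFILE%\Downloads
ren mistral-7b-instruct-v0.1-Q4_K_M-server.llamafile mistral-7b-instruct-v0.1-Q4_K_M-server.llamafile.exe
mistral-7b-instruct-v0.1-Q4_K_M-server.llamafile.exe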
Using Mistral 7B
Every time you want to use Mistral 7B on your computer, follow these steps:
Run the Executable: Start the web server by executing the binary:
./mistral-7b-instruct-v0.1-Q4_K_M-server.llamafile
This command will launch a web server on port 8080.
Access the Web UI: To start using the model, open your web browser and navigate to http://127.0.0.1:8080/ (or click the link to open directly).
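Both server llamafiles listen on port 8080 by default, so to run LLaVA 1.5 and Mistral 7B side by side, one of them needs a different port. The embedded llama.cpp server accepts a --port flag, and llamafile passes flags through to it, so a sketch like this should work:
./mistral-7b-instruct-v0.1-Q4_K_M-server.llamafile --port 8081
Then browse to http://127.0.0.1:8081/ instead.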
Terminating the process
Once you're done using the Mistral 7B model, you can terminate the process. To do this, return to the Terminal where the server is running and simply press Ctrl + C. This key combination sends an interrupt signal to the running server, effectively stopping it.
Conclusion
The introduction of llamafile significantly simplifies the deployment and use of advanced LLMs like LLaVA 1.5 or Mistral 7B for personal, development, or research purposes. This tool opens up new possibilities in the realm of AI and machine learning, making it more accessible for a wider range of users.
¹ The first time only, you might be asked to install the command line developer tools; just click on Install.