Running large language models (LLMs) locally can be super helpful—whether you’d like to play around with LLMs or build more powerful apps using them. But configuring your working environment and getting LLMs to run on your machine is not trivial.
How do you run LLMs locally without any of the hassles?
Enter Ollama, a platform that makes local development with open-source large language models a breeze. With Ollama, everything you need to run an LLM, model weights and all of the config, is packaged into a single Modelfile. Think Docker for LLMs.
In this tutorial, we’ll look at how to get started with Ollama to run large language models locally. So let’s get right into the steps!
Step 1: Download Ollama to Get Started
As a first step, you should download Ollama to your machine. Ollama is supported on all major platforms: macOS, Windows, and Linux.
To download Ollama, you can visit the official GitHub repo and follow the download links from there, or head to the official website and download the installer if you are on a Mac or a Windows machine.
I’m on Linux (an Ubuntu distro), so if you’re a Linux user like me, you can run the following command to run the installer script:
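$ curl -fsSL https://ollama.com/install.sh | sh

This is the one-line install script documented on the Ollama site at the time of writing; it’s worth double-checking the exact command on the official download page before running it.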
Step 2: Get the Model
Next, you can visit the model library to check the list of all model families currently supported. The default model downloaded is the one with the latest tag. On the page for each model, you can get more info such as the size and quantization used.
You can search through the list of tags to locate the model that you want to run. For each model family, there are typically foundational models of different sizes and instruction-tuned variants. I’m interested in running the Gemma 2B model from the Gemma family of lightweight models from Google DeepMind.
You can use the ollama run command to pull the model and start interacting with it directly. However, you can also pull the model onto your machine first and then run it. This is very similar to how you work with Docker images.
For Gemma 2B, running the following pull command downloads the model onto your machine:
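$ ollama pull gemma:2b

Here, gemma:2b is the tag for the 2B variant in the Ollama model library; check the model’s page for the exact tag if it has changed.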
Step 3: Run the Model
Run the model using the ollama run command as shown:
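$ ollama run gemma:2b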
Doing so will start an Ollama REPL where you can interact with the Gemma 2B model. Here’s an example:
For a simple question about the Python standard library, the response seems pretty okay and includes the most frequently used modules.
You can customize LLMs by setting system prompts for a specific desired behavior like so:
- Set system prompt for desired behavior.
- Save the model by giving it a name.
- Exit the REPL and run the model you just created.
Say you want the model to always explain concepts or answer questions in plain English with as little technical jargon as possible. Here’s how you can go about doing it:
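Here’s a minimal sketch of that workflow at the Ollama REPL, using the /set system, /save, and /bye commands; the saved model name plain-gemma is just an example:

>>> /set system Answer in plain English and keep technical jargon to a minimum.
>>> /save plain-gemma
>>> /bye

$ ollama run plain-gemma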
Step 4: Use Ollama with Python
Running the Ollama command-line client and interacting with LLMs locally at the Ollama REPL is a good start. But often you would want to use LLMs in your applications. You can run Ollama as a server on your machine and send cURL requests to it.
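For example, with the Ollama server running (it listens on localhost:11434 by default), you can hit the generate endpoint like so; the prompt here is just an illustration:

$ curl http://localhost:11434/api/generate -d '{"model": "gemma:2b", "prompt": "Why is the sky blue?", "stream": false}'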
But there are simpler ways. If you like using Python to build LLM apps, here are a couple of ways you can do it:
- Using the official Ollama Python library
- Using Ollama with LangChain
Pull the models you need to use before you run the snippets in the following sections.
Using the Ollama Python Library
To use the Ollama Python library, you can install it using pip like so:
$ pip install ollama
There is an official JavaScript library too, which you can use if you prefer developing with JS.
Once you install the Ollama Python library, you can import it into your Python application and work with large language models.
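Here’s a minimal sketch of a simple language generation task, assuming the gemma:2b model has already been pulled (swap in whichever model you’re running; the prompt is just an illustration):

```python
import ollama

# Ask the locally running model for a completion.
response = ollama.generate(
    model="gemma:2b",
    prompt="Explain the difference between a list and a tuple in Python.",
)

# The generated text is available under the "response" key
# (recent versions of the library return a response object that
# also supports this dict-style access).
print(response["response"])
```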
Using LangChain
Another way to use Ollama with Python is through LangChain. If you have existing projects using LangChain, it’s easy to integrate or switch to Ollama.
Make sure you have LangChain installed. If not, install it using pip:
$ pip install langchain
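Here’s a minimal sketch of using Ollama through LangChain, again assuming gemma:2b has been pulled and with an illustrative prompt. Note that depending on your LangChain version, the Ollama wrapper lives in langchain_community (which may need a separate pip install langchain-community) or in the newer langchain-ollama package:

```python
from langchain_community.llms import Ollama

# Point LangChain at the local Ollama server and the model you pulled.
llm = Ollama(model="gemma:2b")

# invoke() sends the prompt to the model and returns the generated text.
print(llm.invoke("Summarize what the Python standard library is in two sentences."))
```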
Wrapping Up
With Ollama you can run large language models locally and build LLM-powered apps with just a few lines of Python code. Here we explored how to interact with LLMs at the Ollama REPL as well as from within Python applications.
Next, we’ll try building an app using Ollama and Python. Until then, if you’re looking to dive deep into LLMs, check out 7 Steps to Mastering Large Language Models (LLMs).
Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she’s working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.