Run Large Language Models on Your Computer with Ease: A Look at Ollama

Running large language models (LLMs) locally can be super helpful—whether you’d like to play around with LLMs or build more powerful apps using them. But configuring your working environment and getting LLMs to run on your machine is not trivial.

Contents

How do you run LLMs locally without any of the hassles?

Step 1: Download Ollama to Get Started

Step 2: Get the Model

Step 3: Run the Model

Step 4: Use Ollama with Python

Using the Ollama Python Library

Using LangChain

Wrapping Up

How do you run LLMs locally without any of the hassles?

Enter Ollama, a platform that makes local development with open-source large language models a breeze. With Ollama, everything you need to run an LLM (model weights and all of the config) is bundled into a single package defined by a Modelfile. Think Docker for LLMs.

In this tutorial, we’ll look at how to get started with Ollama to run large language models locally. So let’s get right into the steps!

Step 1: Download Ollama to Get Started

As a first step, download Ollama to your machine. Ollama is supported on all major platforms: macOS, Windows, and Linux.

To download Ollama, you can either visit the official GitHub repo and follow the download links from there, or visit the official website and download the installer if you are on a Mac or a Windows machine.

I’m on Linux (Ubuntu). If you’re a Linux user like me, you can run the following command to download and run the installer script:

$ curl -fsSL https://ollama.com/install.sh | sh

Step 2: Get the Model

Next, you can visit the model library to check the list of all model families currently supported. The default model downloaded is the one with the latest tag. On the page for each model, you can get more info such as the size and quantization used.
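
To make the tag convention concrete, here is a small sketch of how a model reference resolves when no tag is given. The helper name `resolve_model_ref` is hypothetical and not part of Ollama; it only mirrors the `name:tag` convention described above, where a missing tag means `latest`.

```python
def resolve_model_ref(ref: str) -> str:
    """Return a model reference with an explicit tag.

    Hypothetical helper for illustration only: it mirrors the convention
    that a reference without a tag refers to the `latest` tag.
    """
    ref = ref.strip()
    if not ref:
        raise ValueError("model reference must not be empty")
    # Only look for a tag in the last path segment, so a host with a port
    # (for example "localhost:5000/gemma") is not mistaken for a tag.
    last_segment = ref.rsplit("/", 1)[-1]
    if ":" in last_segment:
        return ref
    return f"{ref}:latest"


if __name__ == "__main__":
    for ref in ("gemma", "gemma:2b"):
        print(ref, "->", resolve_model_ref(ref))
```

So asking for `gemma` alone is treated the same as asking for `gemma:latest`, while `gemma:2b` picks a specific variant.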

You can search through the list of tags to locate the model that you want to run. For each model family, there are typically foundational models of different sizes and instruction-tuned variants. I’m interested in running the Gemma 2B model from the Gemma family of lightweight models from Google DeepMind.

You can run the model using the ollama run command to pull and start interacting with the model directly. However, you can also pull the model onto your machine first and then run it. This is very similar to how you work with Docker images.

For Gemma 2B, running the following pull command downloads the model onto your machine:

$ ollama pull gemma:2b

Step 3: Run the Model

Run the model using the ollama run command as shown:

$ ollama run gemma:2b

Doing so starts an Ollama REPL where you can interact with the Gemma 2B model.

For a simple question about the Python standard library, the response seems pretty okay and includes the most frequently used modules.

You can customize LLMs by setting system prompts for a specific desired behavior like so:

Set system prompt for desired behavior.
Save the model by giving it a name.
Exit the REPL and run the model you just created.

Say you want the model to always explain concepts or answer questions in plain English with as minimal technical jargon as possible. Here’s how you can go about doing it:
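
The same customization can also be captured in a Modelfile rather than typed at the REPL. Below is a minimal sketch that writes one from Python. `FROM` and `SYSTEM` are Modelfile instructions; the helper name, the model name `plain-english`, and the prompt wording are assumptions made for illustration.

```python
from pathlib import Path


def build_modelfile(base_model: str, system_prompt: str) -> str:
    """Return Modelfile text that sets a system prompt on top of a base model."""
    if '"""' in system_prompt:
        raise ValueError("system prompt must not contain triple quotes")
    return f'FROM {base_model}\nSYSTEM """{system_prompt}"""\n'


if __name__ == "__main__":
    text = build_modelfile(
        "gemma:2b",
        "Answer every question in plain English and avoid technical jargon.",
    )
    Path("Modelfile").write_text(text, encoding="utf-8")
    print(text)
    # Then, from a shell (the model name is an illustrative assumption):
    #   ollama create plain-english -f Modelfile
    #   ollama run plain-english
```

Keeping the system prompt in a file makes the customized model easy to recreate on another machine.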

Step 4: Use Ollama with Python

Running the Ollama command-line client and interacting with LLMs locally at the Ollama REPL is a good start. But you’ll often want to use LLMs in your applications. You can run Ollama as a server on your machine and send it requests with cURL.
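
For reference, the same kind of request can be sent from Python using only the standard library. This is a minimal sketch, assuming the Ollama server is listening on its default address (`localhost:11434`) and that `gemma:2b` has already been pulled; the function names are illustrative.

```python
import json
import urllib.request

# Default address of a local Ollama server (assumes the default port).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt: str, model: str = "gemma:2b") -> urllib.request.Request:
    """Build a POST request for Ollama's /api/generate endpoint."""
    # "stream": False asks for a single JSON object instead of a stream of chunks.
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "gemma:2b") -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_generate_request(prompt, model)) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]


if __name__ == "__main__":
    # Requires a running Ollama server with the model already pulled.
    print(generate("What is a qubit?"))
```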

But there are simpler ways. If you prefer Python, here are a couple of ways you can build LLM apps:

Using the official Ollama Python library
Using Ollama with LangChain

Pull the models you need to use before you run the snippets in the following sections.

Using the Ollama Python Library

To use the Ollama Python library, install it using pip like so:

$ pip install ollama

There is an official JavaScript library too, which you can use if you prefer developing with JS.

Once you install the Ollama Python library, you can import it into your Python application and work with large language models. Here’s the snippet for a simple language generation task:

import ollama
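
A generation call with the library can be sketched as follows. This is a hedged example rather than the definitive usage: it assumes the `ollama` package is installed, the Ollama server is running, and `gemma:2b` has been pulled. The wrapper `generate_text` and its `client` parameter are hypothetical, added only so the helper can be exercised without a live server.

```python
def generate_text(prompt: str, model: str = "gemma:2b", client=None) -> str:
    """Return the model's completion for `prompt`.

    `client` is anything exposing `generate(model=..., prompt=...)`;
    by default the `ollama` module itself is used.
    """
    if client is None:
        import ollama  # imported lazily so the helper can be tested without it
        client = ollama
    response = client.generate(model=model, prompt=prompt)
    # The generated text is returned under the "response" key.
    return response["response"]


if __name__ == "__main__":
    # Requires `pip install ollama`, a running Ollama server, and a pulled model.
    print(generate_text("What is a qubit?"))
```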

Using LangChain

Another way to use Ollama with Python is through LangChain. If you have existing projects using LangChain, it’s easy to integrate or switch to Ollama.

Make sure you have LangChain installed. If not, install it using pip:

$ pip install langchain
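
With LangChain installed, a call to a local model might look like the sketch below. It assumes the `Ollama` class from the `langchain-community` package (newer LangChain releases also provide `OllamaLLM` in a separate `langchain-ollama` package) and a pulled `gemma:2b` model. The helper names `make_llm` and `explain` are illustrative.

```python
def make_llm(model: str = "gemma:2b"):
    """Create a LangChain LLM backed by a local Ollama model."""
    # Imported lazily; assumes the `langchain-community` package is available.
    from langchain_community.llms import Ollama
    return Ollama(model=model)


def explain(topic: str, llm) -> str:
    """Ask the model for a short explanation of `topic`."""
    prompt = f"Explain {topic} in two or three sentences."
    return llm.invoke(prompt)


if __name__ == "__main__":
    # Requires a running Ollama server with the model already pulled.
    print(explain("partial functions in Python", make_llm()))
```

Because `explain` only relies on `invoke`, an existing LangChain project can swap in the Ollama-backed LLM without changing the calling code.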

Wrapping Up

With Ollama you can run large language models locally and build LLM-powered apps with just a few lines of Python code. Here we explored how to interact with LLMs at the Ollama REPL as well as from within Python applications.

Next, we’ll try building an app using Ollama and Python. Until then, if you’re looking to dive deep into LLMs, check out 7 Steps to Mastering Large Language Models (LLMs).

Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she’s working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.