Arun Pandian M

Android Dev | Full-Stack & AI Learner

Jun 7, 2026


Understanding Ollama: Installing, Managing, and Running Local AI Models

A common misconception among beginners in AI engineering is that an AI model and Ollama are the same thing. They are not. Consider how Java development works:

Java Code
    ↓
    JVM
    ↓
 Execution

The JVM runs Java applications.

Similarly:

AI Model
    ↓
 Ollama
    ↓
 Execution

Ollama is a runtime that allows us to download, manage, and run Large Language Models (LLMs) locally on our machine.

This means we can build AI applications without relying on cloud APIs or paying per request.


Installing Ollama

The first step is installing Ollama.

For macOS:

brew install ollama

Verify the installation:

ollama --version

Example:

Warning: could not connect to a running Ollama instance
Warning: client version is 0.18.3

This simply means Ollama is installed, but the Ollama server is not currently running.
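The same two conditions (CLI installed, server running) can also be checked from a script. The following is a minimal sketch, assuming the server uses its default address 127.0.0.1:11434; the function names are hypothetical helpers, not part of Ollama.

```python
import shutil
import socket


def cli_installed() -> bool:
    """Return True if an `ollama` executable is found on PATH."""
    return shutil.which("ollama") is not None


def server_listening(host: str = "127.0.0.1", port: int = 11434,
                     timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    This only checks that the port is open; it does not prove that the
    listener is actually Ollama.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def describe(installed: bool, listening: bool) -> str:
    """Map the two checks to a short status message."""
    if not installed:
        return "ollama CLI not found on PATH"
    if not listening:
        return "installed, but the server is not running"
    return "installed and a server is listening"
```

Calling `describe(cli_installed(), server_listening())` right after installation should report the second case, which matches the warning shown above.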

Starting the Ollama Server

Before we can use any model, we must start the Ollama server.

ollama serve

The startup log should include a line containing:

Listening on 127.0.0.1:11434

This means Ollama is now listening for requests from applications.

Conceptually:

Python App
    ↓
localhost:11434
    ↓
Ollama Server

Keep this terminal open. The server must remain running while we use models.
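To make the "Python App → localhost:11434 → Ollama Server" path concrete, here is a small sketch that asks the server for its version over HTTP. It assumes the default address and Ollama's documented `GET /api/version` endpoint; `server_version` is a name chosen for this example.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "http://127.0.0.1:11434"  # default address, assumed unchanged


def server_version(base_url: str = BASE_URL, timeout: float = 2.0):
    """Return the server's version string, or None if it cannot be reached."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/version",
                                    timeout=timeout) as resp:
            data = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON reply.
        return None
    return data.get("version") if isinstance(data, dict) else None
```

If `server_version()` returns `None`, the most likely cause is that the terminal running `ollama serve` was closed.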

Installing a Model

A fresh Ollama installation contains no models. We need to download one.

For example:

ollama pull qwen3:4b

What happens?

Ollama Registry
        ↓
 Download Model
        ↓
 Store On Disk

After the download completes, the model is available locally. A common mistake is assuming that downloading a model means it is running. It is not.

Think:

Movie Downloaded
≠
Movie Playing

Installing and running are different operations.
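Applications can trigger the same download through the HTTP API instead of the CLI. The sketch below only parses progress events. It assumes that `POST /api/pull` streams one JSON object per line with a `status` field and, while a layer is downloading, `total` and `completed` byte counts, as described in Ollama's API documentation. The sample lines are hand-written for illustration, not captured from a real download.

```python
import json


def pull_progress(ndjson_lines):
    """Yield (status, percent_or_None) for each line of a pull stream."""
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        total = event.get("total")
        done = event.get("completed", 0)
        percent = round(100 * done / total, 1) if total else None
        yield event.get("status", ""), percent


# Illustrative sample only.
sample = [
    '{"status": "pulling manifest"}',
    '{"status": "pulling layer", "total": 2000, "completed": 500}',
    '{"status": "pulling layer", "total": 2000, "completed": 2000}',
    '{"status": "success"}',
]
for status, percent in pull_progress(sample):
    print(status, "" if percent is None else f"{percent}%")
```

Note that the final `success` event only means the files are on disk. Nothing has been loaded into memory yet.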

Viewing Installed Models

To see all downloaded models:

ollama list

Example (only the NAME column is shown):

NAME
qwen3:4b
phi3:mini
nomic-embed-text

These models exist on your SSD. They are not necessarily consuming RAM.
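The HTTP counterpart of `ollama list` is `GET /api/tags`. The sketch below assumes the documented response shape, a `models` array whose entries carry `name` and `size` (in bytes); the helper names are made up for this example.

```python
import json
import urllib.request


def fetch_tags(base_url: str = "http://127.0.0.1:11434",
               timeout: float = 5.0) -> dict:
    """GET /api/tags; raises URLError if the server is not running."""
    with urllib.request.urlopen(f"{base_url}/api/tags",
                                timeout=timeout) as resp:
        return json.load(resp)


def installed_models(tags_payload: dict) -> list:
    """Return (name, size_in_GB) pairs sorted by name."""
    rows = []
    for model in tags_payload.get("models", []):
        size_gb = round(model.get("size", 0) / 1e9, 2)
        rows.append((model["name"], size_gb))
    return sorted(rows)
```

With the server running, `installed_models(fetch_tags())` gives a quick view of how much disk space each downloaded model occupies.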

Running a Model

To start a model:

ollama run qwen3:4b

What happens internally?

SSD
 ↓
Load Model Into RAM
 ↓
Start Inference
 ↓
Ready For Questions

You can now ask:

What is Kotlin?

The model generates an answer. This process is called:

Inference

Inference is the act of using a trained model. Most AI application engineers perform inference rather than training models.
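From application code, a single inference call is just an HTTP request. This sketch assumes the documented `POST /api/generate` endpoint with `stream` set to `false`, so the whole answer arrives in one JSON object under the `response` key. `build_generate_request` and `ask` are illustrative names.

```python
import json
import urllib.request


def build_generate_request(model: str, prompt: str,
                           base_url: str = "http://127.0.0.1:11434"):
    """Build a non-streaming POST /api/generate request."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text.

    Requires a running server and an already downloaded model.
    """
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["response"]
```

For example, `ask("qwen3:4b", "What is Kotlin?")` performs the same inference as typing the question into `ollama run`.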

Viewing Running Models

To see which models are currently loaded into memory:

ollama ps

Example (only the NAME column is shown):

NAME
qwen3:4b

This command is different from:

ollama list

Remember:

ollama list
=
Installed Models

ollama ps
=
Running Models

This distinction is important.
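The difference between the two commands can be expressed as a comparison of two name lists. This sketch assumes the names come from `GET /api/tags` (installed) and `GET /api/ps` (loaded), which both return a `models` array. The payloads here are hand-written to mirror the examples above, not real server output.

```python
def names_from(payload: dict) -> list:
    """Extract model names from an /api/tags or /api/ps style payload."""
    return [model["name"] for model in payload.get("models", [])]


def model_states(installed, loaded) -> dict:
    """Label each installed model as 'loaded' or 'on disk only'."""
    loaded_set = set(loaded)
    return {
        name: ("loaded" if name in loaded_set else "on disk only")
        for name in installed
    }


# Hand-written payloads for illustration.
tags = {"models": [{"name": "qwen3:4b"}, {"name": "phi3:mini"},
                   {"name": "nomic-embed-text"}]}
ps = {"models": [{"name": "qwen3:4b"}]}

for name, state in model_states(names_from(tags), names_from(ps)).items():
    print(f"{name}: {state}")
```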

Stopping a Model

When you’re finished using a model:

ollama stop qwen3:4b

What happens?

RAM
 ↓
Unload Model
 ↓
Free Memory

The model remains installed and can be started again later.
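An application can request the same unload over HTTP. Ollama's documentation describes sending a request to `/api/generate` with no prompt and `keep_alive` set to `0`. The sketch below follows that description, and the helper names are my own. The documentation also describes a keep-alive period (five minutes by default) after which the server unloads idle models on its own, so an explicit stop mainly matters when memory is needed right away.

```python
import json
import urllib.request


def build_unload_request(model: str,
                         base_url: str = "http://127.0.0.1:11434"):
    """Build a POST /api/generate request that asks the server to unload `model`.

    No prompt is sent; keep_alive=0 tells the server not to keep the
    model in memory after this request.
    """
    payload = {"model": model, "keep_alive": 0}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def unload(model: str, timeout: float = 30.0) -> None:
    """Send the unload request; the model files stay on disk."""
    with urllib.request.urlopen(build_unload_request(model), timeout=timeout):
        pass
```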

Removing a Model

If you no longer need a model:

ollama rm qwen3:4b

What happens?

Delete Model Files
       ↓
 Free Disk Space

The model must be downloaded again before it can be used.

A Typical AI Engineering Workflow

A common workflow looks like this:

Start Ollama:

ollama serve

Download a model:

ollama pull qwen3:4b

Verify installation:

ollama list

Run the model:

ollama run qwen3:4b

Check running models:

ollama ps

Stop the model:

ollama stop qwen3:4b

Remove the model:

ollama rm qwen3:4b
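The same sequence can be scripted. This is a sketch under two assumptions: `ollama serve` is already running in another terminal (it blocks, so it is left out), and `ollama run` is given the prompt as an argument so that it answers once and exits instead of opening an interactive session. The `runner` parameter is a hypothetical hook that makes the function easy to dry-run.

```python
import subprocess


def workflow_commands(model: str, prompt: str) -> list:
    """Return the workflow as argument lists, minus `ollama serve`."""
    return [
        ["ollama", "pull", model],
        ["ollama", "list"],
        ["ollama", "run", model, prompt],  # one-shot prompt, then exit
        ["ollama", "ps"],
        ["ollama", "stop", model],
        ["ollama", "rm", model],
    ]


def run_workflow(model: str, prompt: str, runner=subprocess.run) -> None:
    """Run each step in order; check=True stops at the first failure."""
    for command in workflow_commands(model, prompt):
        runner(command, check=True)
```

Passing `runner=lambda cmd, check: print(" ".join(cmd))` prints the commands without executing them, which is a safe way to preview the script before it downloads or deletes anything.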

Mental Model

Whenever you work with Ollama, think about three layers:

Storage Layer
│
├── qwen3:4b
├── phi3:mini
└── nomic-embed-text

        ↓

Memory Layer
│
└── Running Models

        ↓

Application Layer
│
├── Python
├── CLI Tools
├── Agents
└── AI Applications

Understanding these layers removes much of the mystery around local AI.
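One way to internalize the layers is to treat each model as a small state machine driven by the four commands. The sketch below is deliberately simplified and is my own illustration, not how Ollama is implemented. Real Ollama is more forgiving; for example, `ollama run` downloads a missing model first.

```python
from enum import Enum


class State(Enum):
    NOT_INSTALLED = "not installed"
    ON_DISK = "on disk"        # storage layer
    IN_MEMORY = "in memory"    # memory layer


# (current state, command) -> next state
TRANSITIONS = {
    (State.NOT_INSTALLED, "pull"): State.ON_DISK,
    (State.ON_DISK, "run"): State.IN_MEMORY,
    (State.IN_MEMORY, "stop"): State.ON_DISK,
    (State.ON_DISK, "rm"): State.NOT_INSTALLED,
}


def apply(state: State, command: str) -> State:
    """Return the next state, or raise ValueError for an invalid step."""
    try:
        return TRANSITIONS[(state, command)]
    except KeyError:
        raise ValueError(
            f"cannot '{command}' a model that is {state.value}"
        ) from None


state = State.NOT_INSTALLED
for command in ["pull", "run", "stop", "rm"]:
    state = apply(state, command)
    print(f"{command:>4} -> {state.value}")
```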

Once you know how to install, manage, run, and stop models, you’re ready to start building real AI applications on top of them.
