Arun Pandian M

Android Dev | Full-Stack & AI Learner

Written by: Arun Pandian M
Published on: Jun 5, 2026

Understanding LLMs, Ollama, and Inference

Before building AI applications, we need to understand three fundamental concepts:

LLM
↓
Ollama
↓
Inference

What is an LLM?

LLM stands for Large Language Model.

Examples:

  • Llama
  • Phi
  • Mistral

A language model predicts the next piece of text (a token) from the text it has seen so far.

    Example:

    Input:

    The capital of France is

    Prediction:

    Paris

    Every response from an LLM is generated one token at a time.
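
To make "one token at a time" concrete, here is a minimal sketch of a generation loop. The NEXT_TOKEN table is a hypothetical stand-in for a real model: it maps the last word to a single next word, whereas a real LLM scores every token in its vocabulary using the whole context. Splitting on spaces is also a simplification of real tokenization.

```python
# Hypothetical toy "model": maps the last token to its most likely successor.
# A real LLM computes this from billions of learned weights, not a lookup table.
NEXT_TOKEN = {
    "The": "capital",
    "capital": "of",
    "of": "France",
    "France": "is",
    "is": "Paris",
    "Paris": "<end>",
}


def generate(prompt, max_tokens=10):
    """Extend the prompt one token per loop iteration until <end> or the limit."""
    tokens = prompt.split()  # naive whitespace "tokenizer" (an assumption)
    for _ in range(max_tokens):
        next_token = NEXT_TOKEN.get(tokens[-1], "<end>")
        if next_token == "<end>":
            break
        tokens.append(next_token)  # the new token becomes part of the input
    return " ".join(tokens)


print(generate("The capital of France is"))
```

Each pass through the loop appends exactly one token and feeds the longer sequence back in, which is the same shape a real model follows when it produces a response.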

    Training vs Inference

    Two terms you’ll hear frequently:

    Training

    The model learns patterns from large amounts of text by adjusting its internal parameters (weights).

    Books
    Code
    Articles
    ↓
    Training
    ↓
    Model

    Inference

    The trained model produces output for a new input. Its weights do not change during inference.

    Question
    ↓
    Model
    ↓
    Answer

    As AI application engineers, we mostly perform inference.
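
The split between training and inference can be sketched with a tiny word-pair counter. This is an illustration under strong assumptions, not how LLMs are actually trained: train() only counts which word follows which, and infer() only reads those counts. Both function names and the corpus are hypothetical.

```python
from collections import Counter, defaultdict


def train(corpus):
    """Training: build the model by counting which word follows which."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts


def infer(model, word):
    """Inference: read the trained model without modifying it."""
    followers = model.get(word)
    if not followers:
        return None  # the model never saw this word during training
    return followers.most_common(1)[0][0]


corpus = [
    "the capital of France is Paris",
    "the capital of Italy is Rome",
    "Paris is the capital of France",
]
model = train(corpus)          # done once, up front
print(infer(model, "capital"))  # done every time someone asks
print(infer(model, "of"))
```

Training happens once and is expensive at real scale. Inference reuses the finished model for every request, which is the part application code deals with.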

    What is Ollama?

    Think of Ollama as a runtime.

    Java
    ↓
    JVM
    
    Python
    ↓
    Interpreter
    
    LLM
    ↓
    Ollama

    Ollama downloads, loads, and runs models on your machine, and serves them through a local HTTP API (by default at http://localhost:11434).

    Example:

    ollama run phi3:mini

    Calling a Model

    Once Ollama is running and the Python client is installed (pip install ollama), you can call the model from Python:

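    import ollama  # Python client for the local Ollama server
    
    # Send a single-turn chat request to the model.
    response = ollama.chat(
        model="phi3:mini",  # the model must already be available locally
        messages=[
            {
                "role": "user",
                "content": "What is Kotlin?"
            }
        ]
    )
    
    # The reply text is stored under message -> content.
    print(response["message"]["content"])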

    Flow:

    Python
    ↓
    Ollama
    ↓
    Model
    ↓
    Response
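
Under the hood, the Python → Ollama step in the flow above is an HTTP request to the local server. The sketch below builds that request by hand with the standard library. It assumes Ollama's default address (localhost:11434), the /api/chat endpoint, and a non-streaming reply. The helper names build_request and chat are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local address and port.
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"


def build_request(prompt, model="phi3:mini"):
    """Build (but do not send) the POST request for a one-message chat."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one complete JSON reply instead of chunks
    }
    return urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, model="phi3:mini"):
    """Send the request; this only works while the Ollama server is running."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["message"]["content"]
```

With the server running, chat("What is Kotlin?") should return the same kind of text as the ollama.chat call, since the client library is a wrapper around this API.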

    Experiment

    Try:

    What is Android?

    Then:

    Explain Android to a beginner.

    Notice how the model changes its answer based on the input.
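
One way to run this experiment is to send both prompts from a single script and print the answers side by side. This is a minimal sketch: ask, compare, and main are hypothetical helper names, and the chat function is passed in as a parameter so the helpers can be tried without a running server.

```python
PROMPTS = ["What is Android?", "Explain Android to a beginner."]


def ask(prompt, chat_fn, model="phi3:mini"):
    """Send one user message through chat_fn and return the reply text."""
    response = chat_fn(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response["message"]["content"]


def compare(prompts, chat_fn):
    """Return a dict mapping each prompt to the model's answer."""
    return {prompt: ask(prompt, chat_fn) for prompt in prompts}


def main():
    # Imported here so the helpers above can be used without the package.
    import ollama

    for prompt, answer in compare(PROMPTS, ollama.chat).items():
        print(f"--- {prompt}\n{answer}\n")
```

Call main() while Ollama is running. Expect the second answer to be simpler and more explanatory than the first, and expect the wording to vary from run to run, because generation involves sampling.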
