This guide walks you through every command and concept needed to build, from scratch and in plain English, a fully offline AI assistant based on the Mistral 7B model: production-ready, sandboxed on your own hardware, and tailored entirely to your company’s internal knowledge.



You’ll:

  1. install Python, 
  2. create an isolated environment, 
  3. set up the Ollama runtime, 
  4. download the Mistral model under its Apache 2.0 license, 
  5. ingest your documents into text chunks, 
  6. generate embeddings with a sentence-transformer, 
  7. store them in a vector database (FAISS, Chroma, or Qdrant), 
  8. wire everything together in a Retrieval‑Augmented Generation (RAG) pipeline using LangChain,
  9. build a simple Streamlit web interface, 
  10. containerize the whole app with Docker, and 
  11. deploy it securely behind your firewall.



1. Install Python

First, you need the Python programming language on your computer:

  1. Download Python:
    Go to the official downloads page at python.org and grab the installer for your operating system (Windows, macOS, or Linux).

  2. Run the installer:

    • Windows/macOS: Launch the downloaded installer and follow the prompts.

    • Linux: You can install Python from your distribution’s package manager or compile from source.

  3. Verify installation:
    Open a terminal (Command Prompt on Windows, Terminal on macOS/Linux) and type:

    bash
    python --version

    You should see something like Python 3.13.3; any recent 3.x release is fine.


2. Create & Activate a Virtual Environment

Keeping dependencies isolated prevents conflicts with other software:

  1. Create a project folder:

    bash
    mkdir my_ai_project
    cd my_ai_project
  2. Make a virtual environment using Python’s built‑in venv module:

    bash
    python -m venv venv

    This creates a new folder venv/ containing its own Python interpreter and libraries.

  3. Activate the environment:

    • Windows:

      bash
      venv\Scripts\activate
    • macOS/Linux:

      bash
      source venv/bin/activate

    After activation, your prompt will show (venv) to indicate that you’re working inside this isolated environment.


3. Install & Run Ollama (Sandboxed LLM Runtime)

Ollama provides a local CLI to host and interact with open‑source models offline:

  1. Install Ollama by running their installer script:

    bash
    curl -fsSL https://ollama.com/install.sh | sh

    This script detects your operating system and architecture, then installs the correct Ollama binary.

  2. Start the Ollama service (runs in the background):

    bash
    ollama serve

    This command launches a local server that can load and run models without internet access.

  3. Check your installation:

    bash
    ollama -v

    You should see the Ollama version printed, confirming it’s ready.


4. Pull & Test the Mistral Model

Mistral 7B is released under the Apache 2.0 license, which imposes no field-of-use restrictions on commercial deployment:

  1. Download Mistral 7B via Ollama:

    bash
    ollama pull mistral

    Mistral 7B is a 7.3 billion-parameter model released under Apache 2.0, freely usable without restrictions.

  2. Run a quick test:

    bash
    ollama run mistral "Hello, Mistral!"

    You should see the model generate a completion for your prompt, confirming it works locally; the same call can also be made over Ollama’s local HTTP API, as sketched below.
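
If you want to confirm the server is reachable outside the CLI, here is a minimal sketch against Ollama’s local HTTP API (default port 11434), using only Python’s standard library; the prompt text is just an example.

    python
    import json
    from urllib import request

    # Requires `ollama serve` to be running and the mistral model to be pulled.
    payload = json.dumps({
        "model": "mistral",
        "prompt": "Hello, Mistral!",
        "stream": False,   # return a single JSON object instead of a stream
    }).encode("utf-8")

    req = request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"])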


5. Ingest Documents & Generate Embeddings

To teach the AI your private data, you’ll convert documents into searchable vectors:

  1. Install required Python libraries:

    bash
    pip install sentence-transformers langchain
  2. Load the embedding model in a Python script:

    python
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

    The all-MiniLM-L6-v2 model maps text to 384-dimensional vectors for semantic search.

  3. Chunk your documents (e.g., split PDFs or text files into 500‑token pieces) and run:

    python
    embeddings = model.encode(text_chunks)

    This produces one vector per chunk, ready for indexing.
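
If you need a starting point for the chunking step itself, here is a minimal word-based sketch that reuses the model loaded above; the chunk_text helper and the handbook.txt filename are illustrative assumptions, and libraries such as LangChain also ship ready-made text splitters.

    python
    # Split raw text into overlapping ~500-word pieces before embedding.
    # (Word counts only approximate token counts; adjust sizes to taste.)
    def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
        words = text.split()
        step = chunk_size - overlap
        return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]

    with open("handbook.txt", encoding="utf-8") as f:
        text_chunks = chunk_text(f.read())
    embeddings = model.encode(text_chunks)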


6. Set Up a Local Vector Database

Store your embeddings so you can quickly find relevant text at query time:

  1. FAISS (Facebook AI Similarity Search)
    Install the CPU‑only package:

    bash
    pip install faiss-cpu

    FAISS can handle up to billions of vectors efficiently on a single machine; see the indexing sketch after this list.

  2. Chroma (Apache 2.0 licensed)
    A lightweight embedding database with a simple Python client:

    bash
    pip install chromadb

    Chroma makes it easy to spin up an embedding store in minutes.

  3. Qdrant (Rust‑based, Docker‑friendly)
    Pull and run the Docker container:

    bash
    docker pull qdrant/qdrant
    docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant

    Qdrant offers filtering and payload storage alongside vector search.
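
To make the FAISS option concrete, here is a minimal indexing sketch that reuses the model, text_chunks, and embeddings from step 5; the query string is only an example, and LangChain (step 7) can manage this index for you.

    python
    import numpy as np
    import faiss

    # Index the chunk embeddings from step 5 with exact L2 search.
    vectors = np.asarray(embeddings, dtype="float32")   # shape: (n_chunks, 384)
    index = faiss.IndexFlatL2(vectors.shape[1])
    index.add(vectors)

    # Embed a query and fetch the three closest chunks.
    query = model.encode(["What is our travel policy?"]).astype("float32")
    distances, ids = index.search(query, 3)
    for i in ids[0]:
        print(text_chunks[i][:80])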


7. Build the Retrieval‑Augmented Generation (RAG) Pipeline

Combine embeddings search with the Mistral model to answer queries:

  1. Install LangChain:

    bash
    pip install langchain

    LangChain provides abstractions for embeddings, vectorstores, and LLM chaining.

  2. Wire it together (example with FAISS):

    python
    from langchain.embeddings import SentenceTransformerEmbeddings
    from langchain.vectorstores import FAISS
    from langchain.llms import Ollama
    from langchain.prompts import PromptTemplate
    from langchain.chains import LLMChain

    # Embed the chunks from step 5 and index them in FAISS
    embedder = SentenceTransformerEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
    db = FAISS.from_texts(text_chunks, embedder)
    retriever = db.as_retriever()

    # Point LangChain at the local Mistral model served by Ollama
    llm = Ollama(model="mistral")

    template = PromptTemplate.from_template(
        "Use the context below to answer the question.\n\n"
        "Context:\n{context}\n\nQuestion: {question}"
    )
    chain = LLMChain(llm=llm, prompt=template)

    def answer(question: str) -> str:
        # Retrieve the most relevant chunks and pass them to Mistral as context
        docs = retriever.get_relevant_documents(question)
        context = "\n\n".join(d.page_content for d in docs)
        return chain.run(context=context, question=question)

    This function retrieves your top-k chunks, feeds them as context, and returns Mistral’s answer.
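
Saved as, say, your_rag_module.py (the module name step 8 imports from), it can be called directly; the example question is an assumption:

    python
    from your_rag_module import answer

    print(answer("What is our remote-work policy?"))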


8. Create a Simple Streamlit Web Interface

Let non-technical users ask questions through a browser page:

  1. Install Streamlit:

    bash
    pip install streamlit

    Streamlit turns Python scripts into interactive web apps with minimal effort.

  2. Write app.py:

    python
    import streamlit as st
    from your_rag_module import answer

    st.title("Company AI Assistant")
    question = st.text_input("Ask a question about our documents:")
    if st.button("Submit"):
        response = answer(question)
        st.write(response)
  3. Launch the app:

    bash
    streamlit run app.py

    Your browser will open at http://localhost:8501, showing the chat interface.


9. Containerize & Deploy with Docker

Package your entire setup so it runs reliably anywhere:

  1. Install Docker on Ubuntu (example):

    bash
    # Set up Docker’s official apt repository
    sudo apt-get update
    sudo apt-get install ca-certificates curl gnupg lsb-release
    curl -fsSL https://download.docker.com/linux/ubuntu/gpg | \
      sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
    echo \
      "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] \
      https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | \
      sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
    sudo apt-get update
    sudo apt-get install docker-ce docker-ce-cli containerd.io


  2. Enable non‑root Docker usage:

    bash
    sudo usermod -aG docker $USER
    sudo reboot

    After reboot, you can run Docker commands without sudo.

  3. Create a Dockerfile in your project (first export your dependencies with pip freeze > requirements.txt so they can be installed inside the image):

    dockerfile
    FROM python:3.13-slim
    WORKDIR /app

    # Install dependencies inside the image; don't copy the host venv/,
    # since its interpreter paths won't match the container.
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt

    COPY . .
    EXPOSE 8501
    CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]
  4. Build & run:

    bash
    docker build -t private-ai .
    docker run -d -p 8501:8501 private-ai

    Your app is now reachable at http://<server-ip>:8501 in any browser. Since Ollama runs outside this container, make sure the container can still reach it, for example by pointing the LangChain Ollama wrapper’s base_url at the host machine.


10. Secure & Maintain Your Deployment

  • Run everything air‑gapped behind your corporate VPN or firewall.

  • Implement role‑based access control (RBAC) or basic auth in front of Streamlit (a minimal password-gate sketch follows this list).

  • Log queries and responses for auditing and to improve your data ingestion pipeline.

  • Automate updates: schedule re-ingestion of new documents and re-indexing of embeddings.
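
As a minimal sketch of the access-control point, here is a single shared password gate at the top of app.py, read from an environment variable; real deployments should prefer a reverse proxy, SSO, or proper RBAC, and the APP_PASSWORD variable name is an assumption.

    python
    import os
    import streamlit as st

    def check_password() -> bool:
        # Compare the entered value against a password supplied via the environment.
        expected = os.environ.get("APP_PASSWORD", "")
        entered = st.text_input("Password", type="password")
        return bool(expected) and entered == expected

    if not check_password():
        st.stop()  # halt rendering here until the correct password is entered

    # ...the rest of app.py (title, question box, answer) follows as before.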

