This guide walks you through every command and concept needed to build, from scratch and in plain English, a fully offline AI assistant using the Mistral 7B model: one that is production-ready, runs in a sandbox, and is tailored entirely to your company’s internal knowledge.
You’ll:
- install Python,
- create an isolated environment,
- set up the Ollama runtime,
- download the Mistral model under its Apache 2.0 license,
- ingest your documents into text chunks,
- generate embeddings with a sentence-transformer,
- store them in a vector database (FAISS, Chroma, or Qdrant),
- wire everything together in a Retrieval‑Augmented Generation (RAG) pipeline using LangChain,
- build a simple Streamlit web interface,
- containerize the whole app with Docker, and
- deploy it securely behind your firewall.
1. Install Python
First, you need the Python programming language on your computer:
- Download Python: go to the official downloads page at python.org and grab the installer for your operating system (Windows, macOS, or Linux).
- Run the installer:
  - Windows/macOS: launch the downloaded installer and follow the prompts.
  - Linux: install Python from your distribution’s package manager or compile it from source.
- Verify the installation: open a terminal (Command Prompt on Windows, Terminal on macOS/Linux) and check the Python version (see the command below). You should see something like `Python 3.13.3`, the current stable release.
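A quick way to run that check from the terminal (the `python3` name applies to macOS/Linux; on Windows the command is usually `python` or `py`):

```bash
# Print the installed Python version; expect output like "Python 3.13.3"
python3 --version
```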
2. Create & Activate a Virtual Environment
Keeping dependencies isolated prevents conflicts with other software:
- Create a project folder.
- Make a virtual environment using Python’s built-in venv module. This creates a new folder `venv/` containing its own Python interpreter and libraries.
- Activate the environment (the command differs between Windows and macOS/Linux; see the examples below). After activation, your prompt shows `(venv)` to indicate that you’re working inside this isolated environment.
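A minimal sketch of those steps, assuming a project folder named `offline-assistant` (pick any name you like):

```bash
# Create and enter the project folder
mkdir offline-assistant && cd offline-assistant

# Create the virtual environment in ./venv
python3 -m venv venv

# Activate it
source venv/bin/activate        # macOS/Linux
# venv\Scripts\activate         # Windows
```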
3. Install & Run Ollama (Sandboxed LLM Runtime)
Ollama provides a local CLI to host and interact with open‑source models offline:
- Install Ollama by running their installer script. The script detects your operating system and architecture, then installs the correct Ollama binary.
- Start the Ollama service (it runs in the background). This launches a local server that can load and run models without internet access.
- Check your installation. You should see the Ollama version printed, confirming it’s ready. (Example commands below.)
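On Linux or macOS, those three steps typically look like this (Windows users install via the desktop installer from ollama.com instead):

```bash
# Download and run Ollama's official install script
curl -fsSL https://ollama.com/install.sh | sh

# Start the local Ollama server (leave this running, or let the installed service handle it)
ollama serve

# Confirm the install by printing the version
ollama --version
```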
4. Pull & Test the Mistral Model
Mistral 7B is an Apache 2.0‑licensed model—no restrictions on military or commercial use:
- Download Mistral 7B via Ollama. Mistral 7B is a 7.3 billion-parameter model released under Apache 2.0, freely usable without restrictions.
- Run a quick test. You should see the model generate a completion for your prompt, confirming it works locally. (Commands below.)
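For example (the prompt text is just a placeholder; the first pull downloads several gigabytes, so do it while you still have internet access):

```bash
# Fetch the Mistral 7B weights into Ollama's local model store
ollama pull mistral

# Run a one-off prompt against the local model
ollama run mistral "Explain retrieval-augmented generation in two sentences."
```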
5. Ingest Documents & Generate Embeddings
To teach the AI your private data, you’ll convert documents into searchable vectors:
- Install the required Python libraries (sentence-transformers for embeddings, plus whatever loaders you need for PDFs or other formats).
- Load the embedding model in a Python script. The `all-MiniLM-L6-v2` model maps text to 384-dimensional vectors for semantic search.
- Chunk your documents (e.g., split PDFs or text files into roughly 500-token pieces) and encode each chunk. This produces one vector per chunk, ready for indexing. (See the sketch below.)
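A minimal sketch (install with `pip install sentence-transformers`; the chunk strings here are placeholders for the pieces produced by your own document splitter):

```python
from sentence_transformers import SentenceTransformer

# all-MiniLM-L6-v2 maps each text to a 384-dimensional vector
model = SentenceTransformer("all-MiniLM-L6-v2")

# Placeholder chunks; in practice these come from splitting your PDFs/text files
chunks = [
    "Remote-work requests must be approved by your line manager.",
    "The VPN client is available from the internal software portal.",
]

# One vector per chunk, shape (len(chunks), 384)
embeddings = model.encode(chunks, show_progress_bar=True)
print(embeddings.shape)
```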
6. Set Up a Local Vector Database
Store your embeddings so you can quickly find relevant text at query time:
- FAISS (Facebook AI Similarity Search): install the CPU-only package. FAISS can handle up to billions of vectors efficiently on a single machine.
- Chroma (Apache 2.0 licensed): a lightweight embedding database with a simple Python client. Chroma makes it easy to spin up an embedding store in minutes.
- Qdrant (Rust-based, Docker-friendly): pull and run the Docker container. Qdrant offers filtering and payload storage alongside vector search.
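Typical install commands for each option (pick one; the next step uses FAISS in its example):

```bash
# FAISS, CPU-only build
pip install faiss-cpu

# Chroma
pip install chromadb

# Qdrant, served locally via Docker on port 6333
docker pull qdrant/qdrant
docker run -d -p 6333:6333 qdrant/qdrant
```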
7. Build the Retrieval‑Augmented Generation (RAG) Pipeline
Combine embeddings search with the Mistral model to answer queries:
- Install LangChain. LangChain provides abstractions for embeddings, vector stores, and LLM chaining.
- Wire it together (example with FAISS below). The function retrieves your top-k chunks, feeds them to Mistral as context, and returns its answer.
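A sketch of that wiring. LangChain’s import paths shift between releases; this assumes the split packages (`pip install langchain langchain-community langchain-huggingface langchain-ollama faiss-cpu`) and reuses placeholder chunks like those from step 5:

```python
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_ollama import OllamaLLM

# Embedding model (same as step 5) and the local Mistral model served by Ollama
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
llm = OllamaLLM(model="mistral")

# Build the FAISS index from your document chunks (placeholder strings here)
chunks = [
    "Remote-work requests must be approved by your line manager.",
    "The VPN client is available from the internal software portal.",
]
vectorstore = FAISS.from_texts(chunks, embeddings)

def answer(question: str, k: int = 4) -> str:
    # Retrieve the k most relevant chunks and hand them to Mistral as context
    docs = vectorstore.similarity_search(question, k=k)
    context = "\n\n".join(doc.page_content for doc in docs)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return llm.invoke(prompt)

print(answer("Who approves remote-work requests?"))
```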
8. Create a Simple Streamlit Web Interface
Let non-technical users ask questions through a browser page:
- Install Streamlit. Streamlit turns Python scripts into interactive web apps with minimal effort.
- Write `app.py` (a sketch follows below).
- Launch the app. Your browser will open at http://localhost:8501, showing the chat interface.
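A minimal `app.py` sketch. It assumes the `answer()` function from step 7 lives in a module named `rag_pipeline.py`; adjust the import to match your project layout:

```python
import streamlit as st

from rag_pipeline import answer  # hypothetical module wrapping the RAG pipeline from step 7

st.title("Internal Knowledge Assistant")

question = st.text_input("Ask a question about our internal documents:")
if question:
    with st.spinner("Searching documents and generating an answer..."):
        st.write(answer(question))
```

Install and launch with `pip install streamlit` followed by `streamlit run app.py`.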
9. Containerize & Deploy with Docker
Package your entire setup so it runs reliably anywhere:
- Install Docker on Ubuntu (example commands below).
- Enable non-root Docker usage. After you log out and back in (or reboot), you can run Docker commands without `sudo`.
- Create a `Dockerfile` in your project.
- Build and run the image. Your app is now reachable at http://<server-ip>:8501 in any browser.
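One way those steps can look on Ubuntu. The Dockerfile is a sketch for the Streamlit front end only; it assumes Ollama keeps running on the host (or in its own container) and that your Python dependencies are listed in `requirements.txt`:

```bash
# Install Docker from Ubuntu's repositories and start it
sudo apt update && sudo apt install -y docker.io
sudo systemctl enable --now docker

# Let your user run Docker without sudo (takes effect after re-login)
sudo usermod -aG docker $USER
```

```dockerfile
# Dockerfile (sketch)
FROM python:3.13-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8501
CMD ["streamlit", "run", "app.py", "--server.address=0.0.0.0", "--server.port=8501"]
```

Then build the image and run it, exposing the Streamlit port:

```bash
docker build -t offline-assistant .
docker run -d -p 8501:8501 offline-assistant
```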
10. Secure & Maintain Your Deployment
- Run everything air-gapped behind your corporate VPN or firewall.
- Implement role-based access control (RBAC) or basic auth in front of Streamlit.
- Log queries and responses for auditing and to improve your data ingestion pipeline.
- Automate updates: schedule re-ingestion of new documents and re-indexing of embeddings (for example, with a nightly cron job; see below).
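For the last point, a cron entry is often enough. A sketch, assuming a hypothetical `ingest.py` script that wraps the chunking, embedding, and indexing from steps 5 through 7:

```bash
# crontab -e: rebuild the document index every night at 02:00
0 2 * * * /app/venv/bin/python /app/ingest.py >> /var/log/assistant-ingest.log 2>&1
```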