Building a chatbot with public open-source LLMs on a simple PC

Step 1: Set Up Your Development Environment

  1. System Requirements:

    • A modern multi-core CPU.
    • At least 16GB of RAM (32GB is recommended for better performance).
    • A dedicated GPU with at least 6GB of VRAM (NVIDIA GPU with CUDA support is preferred for faster inference).
  2. Install Python: Download and install the latest version of Python from python.org.

  3. Install Required Libraries: Open a terminal or command prompt and install the necessary Python libraries using pip:

    bash
    pip install transformers torch
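To confirm the installation succeeded, a quick standard-library check can report whether each required package is importable (a small sketch; the package names are the ones installed above):

```python
import importlib.util

def check_deps(names):
    # Map each package name to whether it can be imported in this environment.
    return {name: importlib.util.find_spec(name) is not None for name in names}

print(check_deps(["transformers", "torch"]))
```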

Step 2: Choose and Download an Open-Source LLM

For this guide, we'll use GPT-Neo from EleutherAI for its balance of performance and accessibility. Note that a 1.3B-parameter model will not rival large commercial models, but it runs comfortably on consumer hardware.

  1. Create a Python Script: Open your favorite text editor or IDE (e.g., VSCode, PyCharm) and create a new Python script, say chatbot.py.

  2. Load the Model and Tokenizer: Use the transformers library to load GPT-Neo:

    python
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "EleutherAI/gpt-neo-1.3B"  # Adjust this based on your system's capability
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

Step 3: Create a Chatbot Interface

  1. Define a Function to Generate Responses: This function takes a user prompt and generates a response using the model.

    python
    import torch

    def generate_response(prompt, model, tokenizer, max_length=100):
        inputs = tokenizer(prompt, return_tensors="pt")
        outputs = model.generate(
            inputs.input_ids,
            max_length=max_length,
            pad_token_id=tokenizer.eos_token_id,
        )
        response = tokenizer.decode(outputs[0], skip_special_tokens=True)
        return response
  2. Implement a Chat Loop: This will allow you to interact with the chatbot in a command-line interface.

    python
    def chat():
        print("Welcome to the GPT-Neo Chatbot! Type 'exit' to quit.")
        while True:
            user_input = input("You: ")
            if user_input.lower() == 'exit':
                break
            response = generate_response(user_input, model, tokenizer)
            print(f"Bot: {response}")

    if __name__ == "__main__":
        chat()

Step 4: Optimize Performance

  1. Model Quantization: Reducing numeric precision cuts memory usage and speeds up inference. The simplest option is float16 (half) precision on a CUDA GPU:

    python
    if torch.cuda.is_available():
        model = model.half()        # Cast weights to float16
        model = model.to("cuda")    # Move the model to the GPU
  2. Batch Inference: Grouping multiple inputs together can speed up processing if you plan to handle multiple conversations simultaneously. However, for a single-user chatbot, this might not be necessary.
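The grouping step behind batch inference can be sketched with plain Python; in a real pipeline each chunk would be passed to the tokenizer with `padding=True` and then to `model.generate` (the prompts below are illustrative):

```python
def batches(items, batch_size):
    """Yield successive fixed-size chunks of a list of prompts."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

prompts = ["Hello", "What is AI?", "Tell me a joke", "Goodbye"]
for batch in batches(prompts, 2):
    # In a real pipeline: inputs = tokenizer(batch, return_tensors="pt", padding=True)
    # then outputs = model.generate(**inputs, ...)
    print(batch)
```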

Step 5: Fine-Tuning the Model (Optional)

If you want the chatbot to perform better in specific domains, you can fine-tune it on a custom dataset. This process requires more computational resources and expertise.

  1. Prepare the Dataset: Collect a dataset of conversations relevant to your domain. The dataset should be in a format compatible with the transformers library.

  2. Fine-Tuning Script: Use the transformers library to fine-tune the model. Here’s a simplified example:

    python
    from transformers import (
        Trainer,
        TrainingArguments,
        TextDataset,
        DataCollatorForLanguageModeling,
    )

    def load_dataset(file_path, tokenizer):
        return TextDataset(
            tokenizer=tokenizer,
            file_path=file_path,
            block_size=128,
        )

    def fine_tune(model, tokenizer, dataset_path):
        train_dataset = load_dataset(dataset_path, tokenizer)
        data_collator = DataCollatorForLanguageModeling(
            tokenizer=tokenizer,
            mlm=False,
        )
        training_args = TrainingArguments(
            output_dir="./results",
            overwrite_output_dir=True,
            num_train_epochs=1,
            per_device_train_batch_size=4,
            save_steps=10_000,
            save_total_limit=2,
        )
        trainer = Trainer(
            model=model,
            args=training_args,
            data_collator=data_collator,
            train_dataset=train_dataset,
        )
        trainer.train()

    dataset_path = "path/to/your/dataset.txt"
    fine_tune(model, tokenizer, dataset_path)
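TextDataset expects a plain-text file, so preparing a dataset can be as simple as writing conversation turns to disk. A minimal sketch (the filename and the conversation lines are placeholders; substitute your own domain data):

```python
# Placeholder conversation turns; replace with your own domain data.
sample_turns = [
    "User: How do I reset my password?",
    "Bot: Click 'Forgot password' on the login page.",
    "User: Thanks!",
    "Bot: You're welcome.",
]

# Write one turn per line, as plain UTF-8 text.
with open("dataset.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(sample_turns) + "\n")
```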

Step 6: Set Up the Backend with Flask

  1. Install Flask: Open a terminal or command prompt and install Flask:

    bash
    pip install Flask
  2. Create Required Files: Ensure your project directory has the following files:

    • app.py (your Flask application)
    • requirements.txt (Python dependencies)
    • Procfile (process type declaration for Heroku)

    Here’s what these files should contain:

    app.py:

    python
    import os

    from flask import Flask, request, jsonify, send_from_directory
    from transformers import AutoModelForCausalLM, AutoTokenizer

    app = Flask(__name__)

    model_name = "EleutherAI/gpt-neo-1.3B"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    def generate_response(prompt, model, tokenizer, max_length=100):
        inputs = tokenizer(prompt, return_tensors="pt")
        outputs = model.generate(
            inputs.input_ids,
            max_length=max_length,
            pad_token_id=tokenizer.eos_token_id,
        )
        return tokenizer.decode(outputs[0], skip_special_tokens=True)

    @app.route('/chat', methods=['POST'])
    def chat():
        data = request.json
        user_input = data.get("message")
        response = generate_response(user_input, model, tokenizer)
        return jsonify({"response": response})

    @app.route('/')
    def index():
        return send_from_directory('.', 'index.html')

    if __name__ == '__main__':
        # Heroku sets PORT; fall back to 5000 for local runs.
        port = int(os.environ.get("PORT", 5000))
        app.run(host='0.0.0.0', port=port)

    Set Up the Frontend with HTML and JavaScript

    index.html:

    html
    <!DOCTYPE html>
    <html lang="en">
    <head>
      <meta charset="UTF-8">
      <meta name="viewport" content="width=device-width, initial-scale=1.0">
      <title>Chatbot</title>
      <style>
        body { font-family: Arial, sans-serif; }
        .chat-container { width: 500px; margin: 0 auto; }
        .chat-box { border: 1px solid #ccc; padding: 10px; margin-bottom: 10px; }
        .user-input { width: 100%; padding: 10px; margin-bottom: 10px; }
        .send-btn { padding: 10px 20px; }
      </style>
    </head>
    <body>
      <div class="chat-container">
        <div class="chat-box" id="chat-box"></div>
        <input type="text" id="user-input" class="user-input" placeholder="Type your message...">
        <button id="send-btn" class="send-btn">Send</button>
      </div>
      <script>
        const sendBtn = document.getElementById('send-btn');
        const userInput = document.getElementById('user-input');
        const chatBox = document.getElementById('chat-box');

        sendBtn.addEventListener('click', async () => {
          const userMessage = userInput.value;
          if (!userMessage) return;
          chatBox.innerHTML += `<p><strong>You:</strong> ${userMessage}</p>`;
          userInput.value = '';
          const response = await fetch('/chat', {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify({ message: userMessage })
          });
          const data = await response.json();
          chatBox.innerHTML += `<p><strong>Bot:</strong> ${data.response}</p>`;
        });
      </script>
    </body>
    </html>
  3. Serve the HTML File with Flask: No extra code is needed; the `/` route in app.py already returns index.html via send_from_directory. Just keep index.html in the same directory as app.py.

  4. requirements.txt:

    plaintext
    Flask==2.0.3
    transformers==4.6.1
    torch==1.8.1

    Procfile:

    plaintext
    web: python app.py
  5. Run the Flask App Locally: Run your Flask app to ensure it works locally:

    bash
    python app.py

    You should be able to send POST requests to http://localhost:5000/chat with a JSON payload containing the message.
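Such a request can be built with Python's standard library alone; a sketch (the URL assumes the local Flask app above, and actually sending the request requires the server to be running):

```python
import json
from urllib import request

def chat_request(url, message):
    # Build a JSON POST request matching the /chat route's expected payload.
    payload = json.dumps({"message": message}).encode("utf-8")
    return request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
    )

req = chat_request("http://localhost:5000/chat", "Hello!")
# To actually send it: response = request.urlopen(req); print(response.read())
print(req.get_method())  # POST
```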

Step 7: Host Your Application

Option 1: Using Heroku

Heroku is a cloud platform that simplifies deploying, managing, and scaling applications. Follow these detailed steps to deploy your chatbot on Heroku.


  1. Set Up a Git Repository: Initialize a git repository in your project directory:

    bash
    git init
    git add .
    git commit -m "Initial commit"
  2. Create a Heroku Account:

    • Sign up for a Heroku account at heroku.com.
    • Install the Heroku CLI by following the instructions in the Heroku Dev Center.
  3. Deploy to Heroku:

    • Log in to Heroku via the CLI:
      bash
      heroku login
    • Create a new Heroku app:
      bash
      heroku create
    • Deploy your application to Heroku (use the branch name that matches your local default, typically main or master):
      bash
      git push heroku main
  4. Access Your Chatbot: Open your deployed application in the browser:

    bash
    heroku open

Option 2: Using a VPS (e.g., DigitalOcean, AWS EC2)

  1. Set Up a VPS:

    • Choose a VPS provider (e.g., DigitalOcean, AWS EC2, Linode) and set up a server.
    • Follow the provider’s instructions to create and configure your VPS instance. Choose an OS (e.g., Ubuntu).
  2. SSH into Your Server: Use SSH to connect to your VPS. Replace your_ip_address with your server's IP address:

    bash
    ssh root@your_ip_address
  3. Install Dependencies: Update your package list and install Python and pip:

    bash
    sudo apt update
    sudo apt install python3 python3-pip

    Install Flask, transformers, and torch:

    bash
    pip3 install Flask transformers torch
  4. Transfer Your Project Files: Use scp to transfer files from your local machine to the VPS. Replace your_ip_address with your server's IP address and adjust the paths as needed:

    bash
    scp -r /path/to/your/project root@your_ip_address:/path/on/server
  5. Run Your Flask App: Navigate to your project directory on the server and run your Flask app:

    bash
    python3 app.py

    To keep your app running in the background, consider using screen or tmux.

  6. Set Up a Reverse Proxy with Nginx:

    • Install Nginx:
      bash
      sudo apt install nginx
    • Configure Nginx to forward requests to your Flask app. Create or edit the configuration file in /etc/nginx/sites-available/default:
      nginx
      server {
          listen 80;
          server_name your_domain_or_ip;

          location / {
              proxy_pass http://127.0.0.1:5000;
              proxy_set_header Host $host;
              proxy_set_header X-Real-IP $remote_addr;
              proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
              proxy_set_header X-Forwarded-Proto $scheme;
          }
      }
    • Test the configuration and restart Nginx:
      bash
      sudo nginx -t
      sudo systemctl restart nginx
  7. (Optional) Secure Your Website with SSL: Use Certbot to obtain a free SSL certificate from Let’s Encrypt:

    bash
    sudo apt install certbot python3-certbot-nginx
    sudo certbot --nginx -d your_domain

Step 8: Access Your Chatbot

Once your application is hosted and Nginx is configured, you can access your chatbot by visiting your domain or server's IP address in a web browser.

Step 9: Monitoring and Scaling

  1. Set Up Monitoring: Use tools like Prometheus and Grafana to monitor your application’s performance.

    • Prometheus:
      bash
      sudo apt-get install -y prometheus
    • Grafana (not in Ubuntu's default repositories; add Grafana's official APT repository first, then):
      bash
      sudo apt-get install -y grafana
    • Configure Prometheus and Grafana to monitor your Flask app.
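For orientation, Prometheus scrapes a plain-text /metrics endpoint; a hand-rolled sketch of that exposition format is below (real deployments would use the prometheus_client library instead, and the counter name is illustrative):

```python
def render_metrics(counters):
    # Emit one "# TYPE" header and one sample line per counter,
    # following Prometheus' plain-text exposition format.
    lines = []
    for name, value in counters.items():
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"

print(render_metrics({"chat_requests_total": 3}))
```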
