How To Train your AI Part 2

7/6/2025

by: Trey Clark

Missed part one? Here ya go!

A Deeper Dive into RAG

So, what is RAG? In short: Retrieval-Augmented Generation is the cheat code for making LLMs feel smart without blowing through compute or fine-tuning budgets.

Where traditional LLMs generate answers from internal patterns learned during training, RAG models dynamically fetch relevant documents from an external source (think PDFs, knowledge bases, or even raw text files), then use that context to generate more accurate and relevant responses. This approach not only makes your model feel more intelligent, it also lets it stay current without massive retraining cycles.

It’s like handing your AI a reference sheet before a test. Suddenly, it's not guessing anymore! It’s actually citing. For my project, this meant feeding it hand-picked JSON files that I had scraped and formatted myself, and watching the bot evolve from generic responses to “Fire Blast cannot be learned by a Squirtle!”.
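
To make the retrieve step concrete, here's a minimal sketch of it in Python, assuming a small sentence-transformers embedding model and a made-up two-document corpus. WebUI's actual pipeline is more involved, but the shape is the same: embed the question, grab the closest chunks, and staple them onto the prompt.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Toy corpus standing in for the scraped JSON files (contents are made up).
docs = [
    "Squirtle is a Water-type Pokemon. It cannot learn Fire Blast.",
    "Charizard can learn Fire Blast via TM.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

def retrieve(question: str, k: int = 1) -> list[str]:
    """Return the k docs most similar to the question."""
    q_vec = embedder.encode([question], normalize_embeddings=True)[0]
    scores = doc_vecs @ q_vec  # dot product == cosine on normalized vectors
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

# The retrieved context gets stapled onto the prompt before generation.
question = "Can Squirtle learn Fire Blast?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```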

Using WebUI as a Basis

I didn't have the time or resources ($$$) to start training an LLM from scratch. Enter Open WebUI, an open-source, self-hosted ChatGPT-style interface that provides a clean way to interact with LLMs like Gemma, LLaMA, and others via the Ollama backend. This gave me GUI and API support, chat history, and markdown rendering: everything I needed to build a usable frontend without spinning up my own React app.
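
Under the hood, WebUI's chats go through Ollama's local REST API, which you can also hit directly. Here's the basic one-shot request (the model name is just an example, and `ollama serve` needs to be running with that model pulled):

```python
import requests

# One-shot generation against the local Ollama server.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Why is Squirtle so cool?", "stream": False},
    timeout=120,
)
print(resp.json()["response"])
```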

WebUI also supports RAG out of the box through file ingestion and embedding configs. Upload a directory of Markdown, PDF, or JSON files, and the vector store gets updated behind the scenes.

My workflow looked like this:

- Spin up WebUI with RAG enabled. This included custom configuration, traffic management, and local network redirects.

- Upload curated datasets I scraped using BeautifulSoup. (Over 1 million lines of data!!)

- Ask Pokémon-related questions, since that's what the indexed data covers. (Sample questions were also passed in as datasets.)

- Iterate on embedding strategies within the datasets (e.g., chunk size, overlap, etc.); a chunking sketch follows this list.
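
Chunking is the knob I fiddled with most, so here's a rough character-based version just to show what size and overlap actually mean. The file name is hypothetical, and real pipelines usually chunk on tokens or sentences instead:

```python
def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks, overlapping neighbors."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

# Smaller chunks retrieve more precisely; overlap keeps facts that straddle
# a boundary from getting cut in half. Tune both against your own questions.
pages = chunk(open("squirtle_dex.txt").read(), size=500, overlap=50)
```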

Within minutes, I had a fully functioning chat interface that could answer contextually informed questions about Pokémon.

Things to Consider

First, your data matters. If you're feeding the model unclear or messy info, don’t expect the results you want. RAG is smart, but it’s only as good as what it retrieves. Clean, structured, and relevant content will always give better answers.
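
As a hypothetical example (this isn't Serebii's schema, just the shape I aimed for), a retrieval-friendly record keeps every fact explicit and self-contained, so a chunk still makes sense without its neighbors:

```python
import json

# One curated record: flat keys, explicit facts, no context needed elsewhere.
record = {
    "pokemon": "Squirtle",
    "types": ["Water"],
    "learnable_moves": ["Water Gun", "Bite", "Hydro Pump"],
    "notes": "Cannot learn Fire Blast.",
}
print(json.dumps(record, indent=2))
```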

Second, watch where you get your data. Just because something is online doesn’t mean it’s free to use. If you’re scraping a site, make sure you’re not breaking any terms of service or overloading their servers. For this project, I was respectful to Serebii. One crawler and one request per page.
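
My scraper wasn't fancy, but the polite pattern looks roughly like this (the User-Agent string, delay, and URL here are my own illustrative choices, not requirements):

```python
import time
import requests
from bs4 import BeautifulSoup

HEADERS = {"User-Agent": "pokedex-rag-scraper (personal project)"}  # identify yourself

def fetch(url: str) -> BeautifulSoup:
    """Fetch one page, then pause so we never hammer the server."""
    resp = requests.get(url, headers=HEADERS, timeout=30)
    resp.raise_for_status()
    time.sleep(2)  # one request at a time, with breathing room between pages
    return BeautifulSoup(resp.text, "html.parser")

# Check a site's robots.txt and terms of service before pointing this anywhere.
soup = fetch("https://www.serebii.net/pokedex/")
```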

Third, compute isn’t free. Embedding hundreds of documents and running a model (especially on a GPU) adds up. For small-scale stuff, you’re probably fine. But if you’re planning to go big, expect to manage resource limits, costs, and performance tuning (unless you're running everything locally).

And finally, hallucinations still happen. Even with the right info in the index, sometimes the model just… makes things up. It helps to return sources when possible or design your UI in a way that encourages users to fact-check.
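
One cheap mitigation: return the sources with every answer. Here's a sketch of what that could look like, assuming retrieval hits shaped like `{"text": ..., "source": ...}` and the same local Ollama endpoint as before:

```python
import requests

def answer_with_sources(question: str, hits: list[dict]) -> dict:
    """Answer from retrieved chunks and hand back the sources for fact-checking."""
    context = "\n\n".join(f'[{h["source"]}] {h["text"]}' for h in hits)
    prompt = (
        "Answer only from the context below, citing the bracketed sources.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3", "prompt": prompt, "stream": False},
        timeout=120,
    )
    return {"answer": resp.json()["response"], "sources": [h["source"] for h in hits]}

hits = [{"text": "Squirtle cannot learn Fire Blast.", "source": "moves.json"}]
print(answer_with_sources("Can Squirtle learn Fire Blast?", hits))
```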

This was my first serious hands-on project using RAG, and it didn’t disappoint. There’s something powerful about seeing an LLM answer a question with data you just gave it. It transforms the AI from a black box to a useful collaborator.

The key takeaway:

This project may have started as an experiment, but it's opened the door to more domain-specific AI tools that are grounded, contextual, and useful. Whether you’re building a chatbot for Pokémon trivia, legal documents, or internal company policy, RAG gives your LLM a memory boost without breaking the bank.