Building Your First RAG Application with LangChain and ChromaDB in 2026
Retrieval-Augmented Generation (RAG) is one of the most practical AI patterns in 2026. It lets you build AI chatbots that can answer questions using your own data. Here's a step-by-step guide.
What is RAG?
RAG combines a retrieval system (a vector database) with a language model. Instead of relying solely on the LLM's training data, the application retrieves the documents most relevant to each question and includes them in the prompt, so answers stay grounded in your own data.
Tech stack:
- Python 3.11+
- LangChain (orchestration)
- ChromaDB (vector database)
- OpenAI or Ollama (LLM)
- Sentence Transformers (embeddings)
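Setup: install the stack first. The packages below should cover everything; since LangChain split its integrations out, the document loaders, vector stores, and LLM wrappers live in langchain-community, and pypdf is the parser PyPDFLoader uses under the hood:

pip install langchain langchain-community langchain-text-splitters chromadb pypdf sentence-transformers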
Step-by-step process:
1. Load your documents (PDF, text, web pages)
2. Split them into smaller chunks
3. Generate embeddings for each chunk
4. Store embeddings in ChromaDB
5. When user asks a question, find similar chunks
6. Pass the retrieved chunks + question to the LLM
7. LLM generates an answer based on that context (steps 5-7 are sketched by hand just below)
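Before reaching for the chain abstraction, it helps to see steps 5-7 done by hand. This is a minimal sketch, assuming the db (Chroma store) and llm (Ollama) objects built in the full example that follows; RetrievalQA automates essentially this loop:

# Step 5: retrieve the chunks most similar to the question
question = "What is this document about?"
hits = db.similarity_search(question, k=4)

# Step 6: stuff the retrieved chunks plus the question into one prompt
context = "\n\n".join(doc.page_content for doc in hits)
prompt = f"Answer using only this context:\n\n{context}\n\nQuestion: {question}"

# Step 7: the LLM answers from the retrieved context
print(llm.invoke(prompt))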
Full code example:
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.llms import Ollama
from langchain.chains import RetrievalQA

# Load the PDF and split it into overlapping 500-character chunks
loader = PyPDFLoader("your_document.pdf")
docs = loader.load()
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)

# Embed each chunk and store the vectors in ChromaDB
embeddings = HuggingFaceEmbeddings()  # defaults to sentence-transformers/all-MiniLM-L6-v2
db = Chroma.from_documents(chunks, embeddings)

# Wire the retriever and the LLM into a question-answering chain
llm = Ollama(model="llama3")
qa = RetrievalQA.from_chain_type(llm=llm, retriever=db.as_retriever())

result = qa.invoke({"query": "What is this document about?"})
print(result["result"])
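One practical note: as written, the script re-embeds the whole document on every run. Passing a persist_directory keeps the index on disk so later runs can reload it. A minimal sketch, assuming the same chunks and embeddings objects as above (the "./chroma_db" path is just an example):

# Build once, writing the index to disk
db = Chroma.from_documents(chunks, embeddings, persist_directory="./chroma_db")

# In later runs, reload without re-embedding
db = Chroma(persist_directory="./chroma_db", embedding_function=embeddings)

With chromadb 0.4+ writes are persisted automatically; on older versions you also had to call db.persist().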