Overview of DocuMancer AI
DocuMancer AI is an open-source AI chatbot that reads your Markdown (.md) files and generates responses using the RAG (Retrieval-Augmented Generation) technique, with the Kubernetes documentation from GitHub as its knowledge base.
Whether you're building an internal support tool, a documentation assistant, or a smart chatbot for your team, this project shows you how to build one from scratch using your own docs and modern AI tools.
How DocuMancer AI Works using RAG
RAG combines two powerful techniques:
- Retrieval: Searches for the most relevant documents from your Markdown (.md) files.
- Generation: Passes that information to an LLM (like GPT) to generate an accurate answer.
Instead of training a new model, you plug the Kubernetes GitHub docs into GPT, so it can answer more accurately.
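The two steps above can be sketched in a few lines of plain Python. This is a toy stand-in (keyword-overlap retrieval and a prompt builder, with hypothetical sample chunks), not the project's actual LangChain pipeline, which uses embeddings and an LLM call instead:

```python
# Minimal RAG sketch: retrieve the best-matching doc chunk, then build
# the prompt the LLM would receive. The real project replaces both the
# ranker (embeddings + FAISS) and the generator (Azure OpenAI GPT-4o).

DOCS = {  # hypothetical chunks loaded from your Markdown files
    "pods.md": "A Pod is the smallest deployable unit in Kubernetes.",
    "services.md": "A Service exposes a set of Pods behind a stable IP.",
}

def retrieve(query: str) -> str:
    # Retrieval step: score each chunk by word overlap with the query
    # (a toy ranker; real retrieval uses embedding similarity).
    def score(text: str) -> int:
        return len(set(query.lower().split()) & set(text.lower().split()))
    return max(DOCS.values(), key=score)

def build_prompt(query: str) -> str:
    # Generation step: the retrieved context is prepended so the LLM
    # answers from your docs instead of its training data alone.
    context = retrieve(query)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

Passing `build_prompt("what is a pod")` would select the `pods.md` chunk and produce a grounded prompt for the model.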
Tools and Technologies
- LangChain to orchestrate the RAG workflow
- FastAPI for building the backend API
- React to create a friendly frontend UI
- Azure OpenAI GPT-4o model (or any LLM provider) to answer questions
- Azure OpenAI text-embedding-3-small for text embeddings
- FAISS vector database to store and search your document chunks efficiently
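To make the embedding-plus-vector-search pairing concrete, here is a tiny stand-in for what FAISS does: rank stored vectors by similarity to a query vector. The three-dimensional "embeddings" and file names are invented for illustration; real embeddings from text-embedding-3-small have 1536 dimensions, and FAISS does this search far faster at scale:

```python
import math

def cosine(a, b):
    # Cosine similarity: how closely two embedding vectors point
    # in the same direction (1.0 = identical direction).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical stored embeddings for three document chunks.
index = {
    "pods.md":     [0.9, 0.1, 0.0],
    "services.md": [0.1, 0.8, 0.1],
    "volumes.md":  [0.0, 0.2, 0.9],
}

def search(query_vec, k=2):
    # Return the k chunk names most similar to the query embedding,
    # which is what a FAISS index lookup gives you.
    ranked = sorted(index, key=lambda name: cosine(query_vec, index[name]),
                    reverse=True)
    return ranked[:k]
```

A query embedding pointing mostly along the first axis, e.g. `search([1.0, 0.0, 0.0])`, would rank `pods.md` first.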
Developing DocuMancer AI
Clone the repository; the rag_chatbot_k8 folder is your DocuMancer AI project.
The folder structure of this project:
rag_chatbot_k8
|__ frontend
|__ main_backend
|__ sync_backend
|__ vector_store
|__ k8s/  # for helm charts
|__ Readme.md
There are three services:
- main_backend holds the query-processing logic: it accepts the user's query from the frontend and forwards it to the vector store.
- vector_store is the vector database service: it converts the query into an embedding and finds the most relevant documents via similarity search. FAISS is the vector database used in this project.
- sync_backend is a cron-job service rather than an API service. It is a batch process, scheduled to run weekly or monthly, that clones the Kubernetes GitHub docs, computes text embeddings for those documents, and saves the embeddings in the vector store.
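The main_backend flow described above can be outlined as a single handler function. This is a sketch only: the real service exposes the handler through a FastAPI endpoint and talks to vector_store and Azure OpenAI over the network, both of which are replaced here by hypothetical stubs:

```python
# Sketch of the main_backend request flow. In the real project,
# handle_query would be a FastAPI route, vector_store_search an HTTP
# call to the vector_store service, and llm_answer an Azure OpenAI call.

def vector_store_search(query: str) -> list[str]:
    # Stub for the vector_store service: embed the query and run a
    # FAISS similarity search, returning the top matching chunks.
    return ["A Pod is the smallest deployable unit in Kubernetes."]

def llm_answer(prompt: str) -> str:
    # Stub for the GPT-4o call; echoes the question back.
    return "Answering: " + prompt.splitlines()[-1]

def handle_query(query: str) -> dict:
    # 1. Fetch relevant chunks from the vector store.
    chunks = vector_store_search(query)
    # 2. Build a grounded prompt from those chunks.
    prompt = "Context:\n" + "\n".join(chunks) + "\nQuestion: " + query
    # 3. Return the answer plus its sources for the frontend to render.
    return {"answer": llm_answer(prompt), "sources": chunks}
```

Keeping the three steps in one handler makes it clear that main_backend never touches the raw docs itself; it only orchestrates the vector store and the LLM.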
The frontend is the UI, which is built in React.