CodebaseQA
Open-source AI platform for codebase onboarding that indexes GitHub repositories, answers natural-language questions with source-cited responses, generates persona-based learning tracks, and visualizes full-workspace dependency graphs.
Problem
Understanding an unfamiliar repository is still a slow, fragmented workflow. Developers bounce between READMEs, grep, docs, architecture guesses, and tribal knowledge just to answer basic questions about system flow, ownership boundaries, and where to start contributing.
Approach
Built CodebaseQA as a pnpm/Turbo monorepo with a Next.js 16 frontend and a FastAPI backend. The platform clones and indexes repositories, parses code with Tree-sitter across 9 languages, stores embeddings in Chroma, and serves chat over SSE with hybrid retrieval, reranking, and source citations. On top of Q&A, I shipped two further features: a persona-based learning engine with AI-generated lessons, quizzes, and challenges, and a deterministic dependency graph explorer built on React Flow, with ELK as the primary layout engine and Dagre as the fallback. The system runs in production with Docker, GitHub Actions CI, Vercel frontend hosting, Render backend deployment, and Redis-backed caching and rate-limit fallbacks.
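To illustrate what "hybrid retrieval" can look like in practice, here is a minimal sketch that merges a vector-search ranking with a keyword-search ranking using reciprocal rank fusion. This is an assumption for illustration only: the fusion method, the function name `reciprocal_rank_fusion`, the constant `k=60`, and the chunk IDs are all hypothetical and are not taken from the CodebaseQA codebase.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into a single ranking.

    Each chunk earns 1 / (k + rank) from every list it appears in,
    so chunks ranked well by more than one retriever rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda chunk_id: (-scores[chunk_id], chunk_id))


# Hypothetical results from two retrievers for the same question.
vector_hits = ["auth.py:10", "db.py:42", "routes.py:7"]
keyword_hits = ["db.py:42", "utils.py:3", "auth.py:10"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

In a pipeline like the one described, a fused list of this kind would typically be passed to the reranker, and the surviving chunks would carry their file and line references through to the cited answer.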
Impact
Turned codebase onboarding into an interactive product instead of a documentation hunt. CodebaseQA combines chat, search, guided learning, graph exploration, gamification, and CLI workflows so developers can move from 'What does this repo do?' to hands-on understanding much faster on large, multi-language repositories.
Key Metrics
Technologies
Links
My Role
Sole developer and architect: designed the full monorepo, built the repository indexing and RAG pipeline, implemented the learning and gamification systems, shipped the dependency graph explorer, wired up Docker and CI/CD, and deployed the split frontend and backend to production.
Team Size: 1 person