Repository Explainer uses a multi-layered RAG framework with an AI agent to address the limitations of large language models when processing entire codebases. It features a dual-granularity indexing system that works at both the file level and the logical-unit level, preserving context as well as precision. Logical units are extracted with tree-sitter-based utilities to enable fine-grained semantic analysis of the codebase. The system can generate high-level summaries of a repository’s architecture, data flow and content, and can answer developer queries in natural language by retrieving and integrating relevant code context. An integrated AI agent can also fetch specific files or functions on demand.
Technology Stack: LangChain for orchestration, Google Gemini API for the cloud LLM, Nomic Embed & MiniLM-L12-v2 for local embeddings, ChromaDB for vector storage, Tree-sitter for logical unit extraction, GitHub API for code fetching, Django for the backend and Gunicorn/Nginx for deployment.
View Repository - Project Report - Live Site
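The dual-granularity idea can be illustrated with a minimal sketch. It assumes Python source only and swaps in the standard-library `ast` module for tree-sitter so that it runs without extra dependencies; `IndexEntry`, `index_source` and the sample file are hypothetical names, not taken from the project.

```python
import ast
from dataclasses import dataclass


@dataclass
class IndexEntry:
    granularity: str  # "file" or "unit"
    path: str
    name: str
    text: str


def index_source(path: str, source: str) -> list[IndexEntry]:
    """Return one file-level entry plus one entry per top-level function or class."""
    # File-level entry: keeps the whole file for broad context.
    entries = [IndexEntry("file", path, path, source)]
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Unit-level entry: the exact source text of one logical unit.
            segment = ast.get_source_segment(source, node) or ""
            entries.append(IndexEntry("unit", path, node.name, segment))
    return entries


SAMPLE = '''import os

def load(path):
    return open(path).read()

class Repo:
    def files(self):
        return os.listdir(".")
'''

entries = index_source("repo.py", SAMPLE)
for entry in entries:
    print(entry.granularity, entry.name)
```

In a retrieval setting, both kinds of entries would be embedded and stored, so a query can match either a whole file or a single function.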
Dormitory is a community-driven app designed to connect passionate students, educators, enthusiasts and geeks, fostering the advancement of a collective knowledge base. I designed the core pipeline for semantic search across the platform using content embeddings and a vector store, and created the dormitory-kitten bot to automatically scrape and post opportunities from external sources. To provide personalized guidance, I built a student-aware, RAG-oriented chatbot that pulls insights from the platform’s knowledge base and can remember past conversations.
Technology Stack: Django, Django REST API, LangChain, ChromaDB, Gemini API.
View Repository - Backend API Endpoints for Testing
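As a rough illustration of embedding-based search, the sketch below ranks posts by cosine similarity to a query. It is not the platform's pipeline: the `embed` function is a toy word-count stand-in for a real embedding model, an in-memory dictionary replaces the vector store, and all names and sample posts are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query: str, posts: dict[str, str], k: int = 2) -> list[str]:
    """Return the ids of the k posts most similar to the query."""
    q = embed(query)
    ranked = sorted(posts, key=lambda pid: cosine(q, embed(posts[pid])), reverse=True)
    return ranked[:k]


POSTS = {
    "p1": "summer internship for machine learning students",
    "p2": "notes on linear algebra and eigenvalues",
    "p3": "scholarship opportunities for undergraduate students",
}

results = search("machine learning internship", POSTS, k=1)
print(results)
```

A real embedding model would also match posts that share meaning but not exact words, which is the main reason to prefer it over word counts.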
We researched and developed a feasible end-user solution for static malware analysis and detection from Portable Executable (PE) header data using multiple machine learning algorithms. We also compared and contrasted our approach with existing literature and end-user solutions.
The project covers the entire pipeline, from data preprocessing, feature extraction and feature selection to model training, evaluation and visualization.
Technology Stack: Python, Jupyter Notebook, Scikit-learn, Pandas, NumPy, Matplotlib, Seaborn, VirusTotal API, Google Cloud.
The repository is currently private and will be made public soon. However, I am open to sharing it now upon request.
This is a collaborative time-tracking web application designed to support efficient time management and productivity. It allows users to track the time they spend on different tasks, get detailed reports of their time usage and see their peers’ online status in real time using WebSockets.
Technology Stack: Python, Django, Django REST API, Django-Channels, React, Shadcn, Tailwind, SQLite, Redis, WebSockets and Daphne async server. View Repository
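The online-status feature can be pictured with a small in-memory sketch of the bookkeeping behind it. It assumes one entry per open connection (for example, one per browser tab) and leaves out the WebSocket layer entirely; `PresenceTracker` and its methods are hypothetical and not taken from the application.

```python
from collections import defaultdict


class PresenceTracker:
    """Tracks open connections per user; a user is online while any connection is open."""

    def __init__(self):
        self._connections = defaultdict(set)

    def connect(self, user, connection_id):
        """Register a connection; return True if the user just came online."""
        came_online = not self._connections[user]
        self._connections[user].add(connection_id)
        return came_online

    def disconnect(self, user, connection_id):
        """Drop a connection; return True if the user just went offline."""
        conns = self._connections.get(user)
        if not conns or connection_id not in conns:
            return False
        conns.remove(connection_id)
        if not conns:
            # Last connection closed: forget the user entirely.
            del self._connections[user]
            return True
        return False

    def online_users(self):
        """Return the currently online users in sorted order."""
        return sorted(self._connections)


tracker = PresenceTracker()
first = tracker.connect("alice", "tab-1")      # first connection
second = tracker.connect("alice", "tab-2")     # second tab, already online
still_online = tracker.disconnect("alice", "tab-1")
print(tracker.online_users())
```

The boolean return values mark the moments when a status change would be worth broadcasting to peers, so opening a second tab does not trigger a duplicate notification.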
A Python-backed web application that solves non-homogeneous ordinary differential equations (ODEs) using Python libraries such as Pandas & NumPy.
Technology Stack: Python, Django, Pandas, NumPy, HTML & CSS. Other Contributors: Kefaet Ullah.
View Repository
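For readers unfamiliar with the problem class, here is a minimal worked example of a non-homogeneous ODE, y' + y = x with y(0) = 1, whose exact solution is y = x - 1 + 2e^(-x). The application's own solution method is not described here, so the classical fourth-order Runge-Kutta scheme below is an assumption chosen for illustration, written in plain Python with hypothetical function names.

```python
import math


def rk4(f, x0, y0, x_end, steps):
    """Integrate y' = f(x, y) from x0 to x_end with classical fourth-order Runge-Kutta."""
    h = (x_end - x0) / steps
    y = y0
    for i in range(steps):
        x = x0 + i * h
        k1 = f(x, y)
        k2 = f(x + h / 2, y + h * k1 / 2)
        k3 = f(x + h / 2, y + h * k2 / 2)
        k4 = f(x + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return y


def rhs(x, y):
    """Right-hand side of y' + y = x, rewritten as y' = x - y."""
    return x - y


def exact(x):
    """Closed-form solution for the initial condition y(0) = 1."""
    return x - 1 + 2 * math.exp(-x)


approx = rk4(rhs, 0.0, 1.0, 1.0, steps=100)
print(approx, exact(1.0))
```

Comparing the numerical value against the closed form is a quick sanity check on both the scheme and the step size.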
A personal portfolio website showcasing my projects, skills and write-ups.
Technology Stack: Hugo, GitHub Pages, Twitter API. View Repository