About Me
I'm an undergraduate at IIT Roorkee (Class of '27) with a deep passion for AI/ML and Generative AI. I love turning challenging problems—whether in vision-language, diffusion models, or full-stack web systems—into clean, efficient, and user-centric solutions.
Over the past few years, I've contributed to Hugging Face's nanoVLM repository and built Stable Diffusion and LLaMA 2 entirely from scratch. On the web side, I designed and developed a social-style platform using Next.js and MongoDB, complete with threaded posts and comments, likes, follows, and in-app editing.
In the ML space, I've architected end-to-end pipelines: a FastAPI-powered vehicle insurance predictor (OOP design, MongoDB, DVC/MLflow tracking, Dagshub, Docker, AWS S3 model storage) and a water-potability model framework using Cookiecutter scaffolding, DVC/MLflow, Dagshub, CI pipeline, and automated experiment tracking.
My goal is to blend solid engineering rigor with intuitive design—making every project not just functional and scalable, but also a delight to use.
Achievement
JEE Advanced 2023
Secured All India Rank (AIR) 6453 out of 1.5 million aspirants
Education
Indian Institute of Technology, Roorkee
Bachelor of Technology in Civil Engineering
August 2023 – May 2027
AI/ML Developer with hands-on expertise in generative AI and web systems. Contributed to Hugging Face's nanoVLM and independently built Stable Diffusion and LLaMA 2 from the ground up. Design and deploy end-to-end ML pipelines leveraging FastAPI, DVC, MLflow, Docker, and AWS S3, and craft user-centric web applications using Next.js and MongoDB.
Experience
AI Research Internship
Trinity College Dublin, Ireland (Remote)
October 2025 – Present
- Presented research proposals on multimodality, defining directions for cross-modal representation and fusion
- Integrating Mixture-of-Experts (MoE) into Diffusion Transformer (DiT) blocks
- Replacing the dense feed-forward sublayer to increase model capacity with sparse computation
- Implementing expert routing and gating mechanisms for stable MoE fine-tuning in diffusion models
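The routing idea above can be illustrated with a minimal, framework-free sketch. This is not the internship codebase: the layer sizes, the two-layer ReLU experts, and top-2 gate renormalization are illustrative assumptions; real DiT-MoE blocks would add load-balancing losses and batched dispatch.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

class MoELayer:
    """Sparse replacement for a dense FFN: each token is routed to
    its top-k experts and outputs are gate-weighted. Hypothetical sizes."""
    def __init__(self, d_model=16, d_ff=32, n_experts=4, top_k=2):
        self.top_k = top_k
        self.Wg = rng.normal(0, 0.02, (d_model, n_experts))   # router weights
        self.experts = [
            (rng.normal(0, 0.02, (d_model, d_ff)),
             rng.normal(0, 0.02, (d_ff, d_model)))
            for _ in range(n_experts)
        ]

    def __call__(self, x):
        gates = softmax(x @ self.Wg)               # (tokens, n_experts)
        out = np.zeros_like(x)
        topk = np.argsort(gates, axis=-1)[:, -self.top_k:]
        for t in range(x.shape[0]):
            sel = topk[t]
            w = gates[t, sel] / gates[t, sel].sum()  # renormalize top-k gates
            for e, wi in zip(sel, w):
                W1, W2 = self.experts[e]
                h = np.maximum(x[t] @ W1, 0.0)       # two-layer ReLU expert
                out[t] += wi * (h @ W2)
        return out

moe = MoELayer()
x = rng.normal(size=(5, 16))
y = moe(x)
```

Only top-k expert FFNs run per token, which is how capacity grows while per-token compute stays roughly constant.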
Generative AI Internship
Predis.ai (Remote)
August 2025
- Surveyed recent research on controllable image generation to drive design choices for ad-creative systems
- Evaluated and prototyped Qwen-Image to improve prompt + image conditioning for targeted outputs
- Built a conditioned ad-creative pipeline with FLUX & OmniControl for catalog-to-image + copy generation
- Applied LoRA and staged prompts to improve controllability and brand alignment
Projects
Stable Diffusion from Scratch
Complete implementation of Stable Diffusion with VAE encoder/decoder, CLIP text encoder, UNet with cross-attention, and classifier-free guidance. Generated 512×512 images from text prompts using DDPM denoising.
LLaMA 2 Implementation
Built LLaMA 2 from scratch with KV-Cache, rotary embeddings, and top-p inference strategy. Implemented BPE tokenizer, attention mechanisms, and ran zero-shot text generation on custom prompts.
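The top-p (nucleus) inference strategy named above can be sketched as follows; the logits and cutoff here are toy values, not from the actual implementation:

```python
import numpy as np

def top_p_sample(logits, p=0.9, rng=None):
    """Nucleus sampling: keep the smallest set of tokens whose cumulative
    probability reaches p, renormalize, then sample from that set."""
    rng = rng or np.random.default_rng()
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]          # tokens by descending probability
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, p) + 1     # first prefix with mass >= p
    keep = order[:cutoff]
    kept = probs[keep] / probs[keep].sum()   # renormalize surviving tokens
    return rng.choice(keep, p=kept)

logits = np.array([3.0, 1.0, 0.2, -2.0])
token = top_p_sample(logits, p=0.9, rng=np.random.default_rng(0))
```

Unlike top-k, the candidate set size adapts to the shape of the distribution: a confident model samples from few tokens, an uncertain one from many.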
GPT from Scratch
Implemented a transformer-based language model with multi-head self-attention, feed-forward layers, and autoregressive generation. Trained with the AdamW optimizer, achieving a validation loss of ~1.89.
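The self-attention at the heart of such a model can be sketched in a few lines. This single-head, single-sequence version (dimensions are illustrative) shows the causal masking that makes autoregressive generation possible; the multi-head variant simply runs several of these in parallel and concatenates:

```python
import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    """Single-head causal self-attention: each position attends only to
    itself and earlier positions."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    mask = np.triu(np.ones_like(scores), k=1).astype(bool)
    scores[mask] = -1e9                       # block attention to the future
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)        # softmax over allowed positions
    return w @ v

rng = np.random.default_rng(0)
T, d = 6, 8
x = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(0, 0.1, (d, d)) for _ in range(3))
out = causal_self_attention(x, Wq, Wk, Wv)
```

Because of the mask, perturbing a later token leaves the outputs at earlier positions unchanged, which is exactly the property autoregressive training relies on.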
Vehicle Insurance Prediction
End-to-end MLOps pipeline with binary classifier for customer interest prediction. Integrated MongoDB Atlas, Docker containerization, AWS S3/ECR, and automated CI/CD via GitHub Actions.
Transformer Implementation
Built complete Transformer architecture with token/position embeddings, multi-head attention, and training pipeline. Integrated TensorBoard logging and automated checkpointing.
Hugging Face Contribution
Open-source contribution to the nanoVLM repository, improving model architecture and configuration management for vision-language models.
Technical Skills
Languages & Frameworks
MLOps & Tools
Data Science
Deep Learning
NLP
Computer Vision
Databases
Advanced Concepts
Get In Touch
Feel free to reach out for collaborations, research opportunities, or just to discuss AI/ML!