Shaurya Omar - AI/ML Developer

About Me

I'm an undergraduate at IIT Roorkee (Class of '27) with a deep passion for AI/ML and Generative AI. I love turning challenging problems—whether in vision-language, diffusion models, or full-stack web systems—into clean, efficient, and user-centric solutions.

Over the past few years, I've contributed to Hugging Face's nanoVLM repo and built Stable Diffusion and LLaMA 2 completely from scratch. On the web side, I designed and developed a social-style platform using Next.js and MongoDB, complete with threaded posts, threaded comments, likes, follows, and in-app editing.

In the ML space, I've architected end-to-end pipelines: a FastAPI-powered vehicle insurance predictor (OOP design, MongoDB, DVC/MLflow tracking, Dagshub, Docker, AWS S3 model storage) and a water-potability model framework using Cookiecutter scaffolding, DVC/MLflow, Dagshub, CI pipeline, and automated experiment tracking.

My goal is to blend solid engineering rigor with intuitive design—making every project not just functional and scalable, but also a delight to use.

Achievement

JEE Advanced 2023

Secured All India Rank (AIR) 6453 out of 1.5 million aspirants

Education

Indian Institute of Technology, Roorkee

Bachelor of Technology in Civil Engineering

August 2023 – May 2027

AI/ML Developer with hands-on expertise in generative AI and web systems. Contributed to Hugging Face's nanoVLM and independently built Stable Diffusion and LLaMA 2 from the ground up. Designs and deploys end-to-end ML pipelines using FastAPI, DVC, MLflow, Docker, and AWS S3, and crafts user-centric web applications with Next.js and MongoDB.

Experience

AI Research Internship

Trinity College Dublin, Ireland (Remote)

October 2025 – Present

Presented research proposals on multimodality, defining directions for cross-modal representation and fusion
Integrating Mixture-of-Experts (MoE) into Diffusion Transformer (DiT) blocks
Replacing the dense feed-forward sublayer to increase model capacity with sparse computation
Implementing expert routing and gating mechanisms for stable MoE fine-tuning in diffusion models

Generative AI Internship

Predis.ai (Remote)

August 2025

Surveyed recent research on controllable image generation to drive design choices for ad-creative systems
Evaluated and prototyped Qwen-Image to improve combined prompt and image conditioning for targeted outputs
Built a conditioned ad-creative pipeline with FLUX and OmniControl for catalog-to-image and ad-copy generation
Applied LoRA and staged prompts to improve controllability and brand alignment

Projects

Stable Diffusion from Scratch

Complete implementation of Stable Diffusion with VAE encoder/decoder, CLIP text encoder, UNet with cross-attention, and classifier-free guidance. Generated 512×512 images from text prompts using DDPM denoising.

PyTorch CLIP VAE UNet DDPM

LLaMA 2 Implementation

Built LLaMA 2 from scratch with a KV-cache, rotary position embeddings, and top-p (nucleus) sampling at inference. Implemented a BPE tokenizer and the attention mechanisms, and ran zero-shot text generation on custom prompts.

PyTorch Transformers BPE KV-Cache

GPT from Scratch

Implemented a transformer-based language model with multi-head self-attention, feed-forward layers, and autoregressive generation. Trained with the AdamW optimizer, reaching a validation loss of ~1.89.

PyTorch Transformers AdamW Self-Attention

Vehicle Insurance Prediction

End-to-end MLOps pipeline with a binary classifier that predicts customer interest in vehicle insurance. Integrated MongoDB Atlas, Docker containerization, AWS S3/ECR, and automated CI/CD via GitHub Actions.

FastAPI MongoDB Docker AWS MLflow

Transformer Implementation

Built a complete Transformer architecture with token and position embeddings, multi-head attention, and a training pipeline. Integrated TensorBoard logging and automated checkpointing.

PyTorch Attention TensorBoard

Hugging Face Contribution

Open-source contribution to the nanoVLM repository that enhanced its configuration. Improved model architecture and configuration management for vision-language models.

Hugging Face Vision-Language Open Source

Technical Skills

Languages & Frameworks

Python C++ SQL PyTorch Hugging Face

MLOps & Tools

AWS S3/EC2 Docker MLflow DVC Git/GitHub CI/CD

Data Science

NumPy Pandas Scikit-Learn XGBoost LightGBM

Deep Learning

Neural Networks CNNs RNNs Transformers

NLP

Transformers BERT LoRA/PEFT LLMs RAG

Computer Vision

CLIP Vision Transformers Stable Diffusion Multimodal Models FLUX

Databases

MongoDB MySQL PostgreSQL

Advanced Concepts

Chain of Thought Tool Calling Agentic Flow MoE

Get In Touch

Feel free to reach out for collaborations, research opportunities, or just to discuss AI/ML!