AI Engineer Portfolio — Prince Singh | LLM, RAG, Full Stack, Cloud, DevOps

AI/ML Engineer | LLM Engineer | RAG Developer | Full-Stack Engineer | Founding Engineer | Cloud & DevOps Engineer

Prince Singh is an AI Engineer, Full-Stack Developer, and Founding Engineer with expertise in modern AI systems, Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), LangChain, vector databases, multi-agent systems, cloud computing, DevOps, scalable architectures, and end-to-end product engineering. His portfolio represents real-world engineering experience across AI, ML, full-stack development, and high-performance web applications.

Large Language Model (LLM) Engineering

Prince builds advanced LLM workflows including custom prompts, embeddings, hybrid search, token optimization, context building, and production-grade inference systems. He works with OpenAI, GPT models, LangChain, and vector stores to create intelligent and scalable AI applications.

RAG Pipeline Engineering

Expertise includes document chunking, embeddings generation, semantic search, ChromaDB, Pinecone, context ranking, vector search optimization, and end-to-end RAG pipelines used in production environments with low latency and high accuracy.
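
The end-to-end flow (chunk, embed, retrieve, augment) can be sketched in a few lines. This is a toy illustration only: the bag-of-words `embed` stands in for a real embedding model such as text-embedding-ada-002, and an in-memory list stands in for a vector store like ChromaDB or Pinecone.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real pipeline would call an embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def chunk(doc: str, size: int = 8) -> list[str]:
    # Fixed-size word chunking; production systems use semantic chunking.
    words = doc.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:top_k]

doc = ("vector databases store embeddings for semantic search retrieval "
       "augmented generation injects retrieved context into the prompt "
       "low latency matters for production inference")
context = retrieve("how does semantic search use embeddings", chunk(doc))
prompt = "Answer using this context:\n" + "\n".join(context)
```

The retrieved chunks are then injected into the LLM prompt, which is the "augmentation" step of RAG.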

Agentic AI & Multi-Agent Systems

Designs autonomous agents capable of tool calling, reasoning, planning, workflow execution, code generation, debugging, research automation, and contextual problem solving powered by multi-step LLM reasoning and memory components.

AI Product Engineering

Prince has built AI-powered platforms like RoadmapAI, AskAI, CodeLLM, contextual AI code editors, peer-to-peer AI tools, and intelligent coding assistants that combine full-stack engineering with advanced LLM capabilities.

Full-Stack Engineering (React, Next.js, Node.js, TypeScript)

Skilled in Next.js, React.js, Node.js, Express.js, TypeScript, MongoDB, PostgreSQL, Redis, REST APIs, GraphQL, WebSockets, authentication systems, SSR/ISR, and building responsive and scalable frontend and backend applications.

Cloud Engineering & DevOps

Hands-on experience with AWS, Docker, Kubernetes, CI/CD pipelines, GitHub Actions, EC2, S3, load balancing, scaling APIs, containerization, microservices, observability, and high-performance deployments optimized for millions of requests.

System Design & Architecture

Expertise in designing scalable distributed systems, real-time architectures, caching layers, pub/sub messaging, event-driven systems, serverless functions, and fault-tolerant engineering used in modern SaaS products.

Competitive Programming & DSA

Solved 5000+ DSA problems across LeetCode, GFG, CodeStudio, InterviewBit, and HackerEarth, with a strong foundation in algorithms, data structures, problem solving, and coding interviews. Ranked in top competitive programming brackets, including multiple global top-100 ranks.

Founding Engineer Experience

Experienced as a Founding Engineer owning end-to-end product development, architecture, feature planning, user-facing engineering, backend optimization, LLM integrations, cloud deployments, reliability engineering, and building products at startup speed.

Remote AI Engineer | Global Collaboration

Proven track record working with international teams, remote-first startups, and cross-timezone engineering environments. Experienced in delivering scalable and clean engineering solutions in distributed teams.

Software Engineer Portfolio

This portfolio reflects expertise in frontend engineering, backend API development, AI systems, cloud pipelines, microservices, scalable infrastructures, and high-quality modern applications designed with user-first engineering.

Developer Portfolio

Explore work spanning AI engineering, machine learning projects, full-stack applications, intelligent tools, design systems, SaaS products, open source contributions, and real-world production code used by thousands of users.

Hello 👋

Prince Singh | Founding Engineer


Founding Engineer & AI Architect @ProPeers | Ex-SDE @CloudConduction | Architecting Agentic AI & LLM Systems | Multi-Model LLM Orchestration | RAG Pipelines & MCP | GenAI & Fine-Tuning | AI Systems & Architecture | LangChain, Vector DBs (ChromaDB) | Vector Search | MERN & Full-Stack Engineering | Scalable Infra (AWS, Azure) | DevOps & AIOps | System Design & DSA | 600K+ Users | Mentoring 40K+ Engineers | LeetCode Knight 👑 | GFG Inst. Rank 1 🥇 | InterviewBit Global 13 🥇 | CodeStudio Specialist 🌞

Full-Stack • System Design • Cloud & DevOps • Microservices • High-Performance APIs • Scalable Infrastructure

I’ve engineered the core of our AI ecosystem: Multi-LLM Orchestration, RoadmapAI, CodeLLM, AskAI, the AI Code Editor, and Global AI Search, designing end-to-end Agentic AI pipelines with RAG-driven personalization, MCP-layered orchestration, and multi-model LLM architectures that deliver real-time learning guidance, deterministic code evaluation, and deeply context-aware programming assistance at scale.
I’ve worked hands-on with leading LLM models and AI platforms including OpenAI, Google AI, Anthropic (Claude), Mistral AI, Meta, Grok, Moonshot, and Databricks, implementing intelligent model routing, fallback strategies, cost-aware inference, and latency-optimized multi-provider execution.
The system leverages Vector Databases (ChromaDB) for semantic context retrieval and long-term memory, enabling high-precision RAG workflows. I’ve implemented token-level streaming responses to deliver real-time AI output, along with response caching, embedding reuse, and prompt-result memoization to significantly reduce latency, repeated inference, and overall token costs.
My work spans LLM System Design, tokenization & reasoning flows, streaming & tool-calling agents, vectorized context pipelines, and high-availability AI microservices, forming the intelligence backbone of the platform.
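
The prompt-result memoization mentioned above can be sketched as a small cache keyed on a normalized prompt hash with a TTL. `PromptCache` and `fake_llm` are illustrative names for this sketch, not the production implementation.

```python
import hashlib
import time

class PromptCache:
    """Prompt-result memoization: identical prompts skip repeat inference.
    A TTL keeps cached completions from going stale."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, str]] = {}

    @staticmethod
    def _key(prompt: str) -> str:
        # Normalize whitespace and case so trivially different prompts share a key.
        norm = " ".join(prompt.lower().split())
        return hashlib.sha256(norm.encode()).hexdigest()

    def get_or_compute(self, prompt: str, infer) -> str:
        k = self._key(prompt)
        hit = self._store.get(k)
        if hit and time.monotonic() - hit[0] < self.ttl:
            return hit[1]                      # cache hit: no inference cost
        result = infer(prompt)                 # the expensive LLM call
        self._store[k] = (time.monotonic(), result)
        return result

calls = 0
def fake_llm(prompt: str) -> str:
    global calls
    calls += 1
    return f"answer:{prompt[:10]}"

cache = PromptCache()
a = cache.get_or_compute("Explain  RAG", fake_llm)
b = cache.get_or_compute("explain rag", fake_llm)  # normalized, so this is a hit
```

The same cache-aside shape applies to embedding reuse: hash the input text, reuse the stored vector on a hit.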

About

I'm a Founding Engineer & AI Architect with 2.5 years of hands-on experience building large-scale AI systems, distributed backend infrastructures, and production-grade full-stack platforms. At ProPeers, I own and engineer the core systems that power 80%+ of total platform traffic including Roadmaps, RoadmapAI, AskAI, CodeLLM, Global AI Search, and the Contextual AI Code Editor.
I design high-scale backend architectures, real-time data pipelines, aggregation engines for 100K+ users, Redis-backed caching layers, search-validation systems, role-based access flows, rate-limiting frameworks, and CI/CD deployment automation that cut release time by 34% and improved reliability across 150+ microservices. Through SSR, dynamic imports and hybrid rendering patterns, I’ve reduced key user journey response times from 1.1s → 200ms, delivering a noticeably smoother product experience.
As an AI Architect, I build Agentic AI pipelines, RAG retrieval systems, MCP protocol layers, and multi-model inference workflows using Azure OpenAI, Azure Databricks, GPT models, and Llama 3.x OSS models. My work spans token optimization, context-window compression, semantic chunking, and adaptive prompt engineering to deliver intelligent experiences at <1s latency under real production traffic.
I’ve engineered RoadmapAI with a self-learning RAG pipeline (text-embedding-ada-002, ChromaDB, semantic filters, vector enrichment), achieving ~99% roadmap accuracy and lifting roadmap ratings from the early 12% baseline. I built CodeLLM, a production AI judge featuring multi-language detection, dual-layer JSON parsing, COMPILATION/RUNTIME/VALIDATION error classification, and deterministic verdict synthesis for educational code evaluation.
I developed AskAI with MCP-layered prompts, resource-type detection (roadmap/article/practice), O1/O3 model routing, token metering, and auto-structured responses, improving resolution speed and engagement. I also built the AI Code Editor with ~40ms inference, inline reasoning, multi-language execution, and deep integration with RoadmapAI and CodeLLM, significantly boosting editor retention.
Beyond AI flows, I’ve implemented token-based tiered access systems (one-time/monthly/yearly) on top of these capabilities, and engineered self-optimizing RAG pipelines and distributed multi-model inference workflows that balance accuracy, cost and latency under real-world traffic.
On the product and platform side, I’ve delivered Individual Roadmap Communities, scalable live-stream pipelines, error-resilient API layers, multi-step onboarding flows, connected roadmap progress engines, and search validation systems ensuring hallucination-free retrieval across Roadmaps and RoadmapAI.
At the infrastructure layer, I’ve reduced downtime by 90% (4 hours → 45 mins/month), stabilized Azure VM workloads, eliminated Bastion and high-cost D8 VM footprints, fixed bandwidth cost spikes, and built high-availability fallback layers with cache-first routing and distributed failover.
Day-to-day, I work across MERN + TypeScript, Node.js microservices, Docker/Kubernetes, Azure Cloud, Databricks, CI/CD automation, Prometheus/Grafana observability, and async caching pipelines powering 100K+ monthly active operations.
Outside core engineering, I’m a Problem-Solving & DSA Enthusiast with 5000+ problems solved, a 1500+ day coding streak, and top 0.1% global rankings across platforms. As a mentor to 40,000+ learners, I help engineers master DSA, System Design, Development, DevOps, and Remote Job Preparation, guiding them from theory to real-world success.
I love building scalable systems, intelligent architectures, and next-generation AI-first engineering experiences that blend reliability, performance, and deep technical innovation.

Experience


ProPeers

Founding Engineer & AI Architect

July 2025 – Present · Delhi, India · Remote

  • Architected the full AI ecosystem powering RoadmapAI, CodeLLM, AskAI, Global AI Search, and the AI Code Editor, building Agentic AI pipelines, RAG systems, MCP server architecture, and LLM orchestration that now drive 80%+ of total platform traffic.
  • Engineered RoadmapAI end-to-end with a self-learning RAG pipeline (text-embedding-ada-002, ChromaDB, semantic filtering, adaptive difficulty) and MCP-layered prompts, achieving sub-second inference and large-scale personalization.
  • Delivered ~99% personalized roadmap accuracy using Agentic flows, structured prompt masks, multi-model routing, and RAG optimization directly improving RoadmapAI user ratings from the early 12% baseline.
  • Built CodeLLM, an AI judge with multi-language detection, dual-layer JSON parsing, context-aware error classification (COMPILATION/RUNTIME/VALIDATION), semantic retrieval and deterministic verdict synthesis.
  • Developed AskAI, an agentic programming assistant using MCP-based prompt pipelines, resource-aware context analysis, dynamic O3Mini/O1 routing, token metering, and automated formatting, boosting engagement 3× and answer resolution speed 2×.
  • Shipped the AI Code Editor with real-time AI review (<40ms), inline reasoning, multi-language execution, and deep RoadmapAI/CodeLLM integration, raising editor retention by 40%.
  • Scaled Roadmap features to 120K+ organic users and improved MAU by 46% through rapid iteration, tight user-feedback loops and stable AI feature launches.
  • Delivered Individual Roadmap Communities enabling peer-matching, shared progress tracking and roadmap-level micro-communities.
  • Optimized CI/CD and deployment systems, cutting deployment time by 34%, automating multi-service rollouts, and enabling safer high-frequency releases.
  • Reduced platform downtime by 90% (4 hrs to 45 mins/month) via infra hardening, progressive fallbacks, cache-first routing, real-time health checks and load-aware autoscaling.
  • Implemented complete analytics & aggregation pipelines for 100K+ users with Redis caching, chunked batch aggregation, API acceleration and advanced rate-limit enforcement.
  • Developed full search-validation engines (Roadmaps + RoadmapAI), ensuring context-safe retrieval, hallucination-resistance and consistent multi-node semantic validation.
  • Performed Azure cost and infrastructure optimization: right-sized VMs, eliminated Bastion and high-cost D8 VM footprints, stabilized Redis/Entra costs, contained Cognitive Services spikes, and resolved large bandwidth egress surges.

SDE - 1

July 2024 – July 2025 · Delhi, India · Remote

  • Built and scaled the flagship "Roadmaps" feature, delivering 100+ curated learning paths across DSA, Development, and System Design used by 100K+ users. Improved personalization and relevance, while reducing API response time from 2.1s to < 300ms, resulting in a 7x faster experience and 40% higher user engagement.
  • Refactored complex APIs to reduce processing time and improved the tab-switching experience for smoother navigation.
  • Developed and integrated the "AskAI + Discussion Forum", an intelligent peer-programming assistant where users can interact with AI to solve DSA/Dev doubts and collaborate with others enabling on-demand doubt resolution and community learning.
  • Engineered a Session Recording Bot using Python, Selenium, and headless Azure VMs with deep-link automation, automating session joining and recording, eliminating 100% of the manual effort, and improving reliability.
  • Optimized 150+ APIs by implementing advanced caching layers, async processing, and API pipelines, reducing backend latency by up to 70% and improving system throughput.
  • Reduced core web vitals TBT, LCP, and FCP from 4.4s to 990ms through advanced frontend optimizations (SSR, dynamic imports, lazy-loading APIs), significantly boosting UX for 15K+ monthly active users.
  • Led the end-to-end performance overhaul of the platform, focusing on smoother tab-switching experiences, minimal downtime, and blazing-fast navigation across the app.
  • Migrated MongoDB from Atlas to self-hosted replica sets, wrote automated backup & recovery scripts, set up VMs, and integrated cron-based backups to Azure Blob, ensuring data durability and cost-efficiency.
  • Set up real-time monitoring and alerting with Prometheus and Grafana, ensuring system health, proactive issue resolution, and enhanced DevOps visibility.
  • Deployed scalable CI/CD pipelines using Azure, GitLab, and Vercel, ensuring zero-downtime deployments and faster iteration cycles across teams.
  • Handled end-to-end production deployment and scaling for a system serving 15K+ users, maintaining high availability, fault tolerance, and robust performance at scale.

Cloud Conduction

Junior Software Engineer

Jan 2024 – June 2024 · USA · Remote

  • Built an AI-powered chat application from the ground up using React and .NET, improving frontend efficiency by 60% and backend performance by 30%, delivering a highly responsive user experience.
  • Integrated and optimized AI model responses, reducing latency from 1.86s to 1.2s (35% faster) through strategic API design, caching, and performance tuning.
  • Designed scalable cloud architecture on Microsoft Azure for AI workloads, improving system throughput by 10% while significantly reducing infrastructure costs via autoscaling and resource optimization.
  • Developed modern, responsive UI components in React that improved user engagement metrics by 25%, including better retention and interaction rates.
  • Implemented secure, scalable API gateways in .NET Core, capable of handling 500+ concurrent requests with 99.9% uptime, supporting production-level reliability.
  • Led the implementation of new features using the MERN stack, cutting down development time by 40%, and accelerating product iteration cycles.
  • Established CI/CD pipelines (Azure DevOps & GitHub Actions), reducing deployment failures by 75% and enabling faster, automated releases.
  • Conducted in-depth code reviews and optimization, reducing technical debt by 30%, standardizing best practices across teams, and improving maintainability.
  • Owned and managed the complete project lifecycle, from initial system design and dev planning to production deployment, server setup, and post-launch support.

INDIVIDUAL CONTRIBUTOR

I've also strengthened the platform's foundation by optimizing 150+ critical APIs for latency, reliability, throughput, and large-scale fault tolerance.
  • Architected an end-to-end RAG-powered AI learning platform serving 100K+ users with sub-second inference latency, leveraging Azure OpenAI embeddings (text-embedding-ada-002), ChromaDB vector indexing, and semantic retrieval with dynamic topic-aware filtering achieving 0.25 similarity-threshold precision
  • Engineered a self-evolving knowledge graph where every AI-generated artifact (roadmaps, articles, practice questions) is automatically embedded, vectorized, and reintegrated into ChromaDB creating a continuously learning retrieval layer that improves semantic accuracy with each user interaction
  • Built an intelligent RAG pipeline with multi-stage context optimization combining semantic vector similarity search, domain-specific keyword enforcement, exclusion-based noise filtering, and quality-threshold gating (0.25 cutoff) to deliver hallucination-resistant contextual augmentation
  • Designed a production-grade MCP-compliant prompt orchestration system with structured message arrays (system/user roles), dynamic context injection based on user proficiency levels (1-5 scale), adaptive difficulty mapping (Beginner/Intermediate/Advanced), and goal-oriented content generation across 3 formats
  • Implemented a real-time intent classification engine with confidence-weighted pattern matching across 4 transformation operators (NEW_SUBROADMAP, ADD_TOPICS, PROJECT_CREATION, REGENERATE_PIPELINE) using 20+ keyword signatures per intent and hierarchical fallback resolution for ambiguous requests
  • Developed a conflict-safe progress-preserving merge algorithm that maintains atomic user state (isDone flags, bookmarks, annotations, code links) during AI-driven content expansions through differential patching, duplicate detection, and rollback-capable database transactions
  • Created a multi-layer security validation framework with lexical abuse detection (violent/illegal/inappropriate patterns), technical relevance scoring across 15+ engineering domains, injection-attack guards, and AI-powered verification with a 0.6 confidence threshold for edge cases
  • Architected a scalable token-governance system with tiered allocation models (8 free tokens + purchased pools), operation-based cost accounting (Creation: 2 tokens, Customization: 4 tokens), atomic transaction handling via MongoDB optimistic locking, and graceful quota degradation
  • Optimized database performance through strategic indexing with compound indices on (userId, sessionId, isDeleted), aggregation pipeline optimization for history queries, session-based data isolation, soft-delete mechanisms, and pagination limiting to 50 records per fetch
  • Implemented a multi-model AI orchestration layer supporting dynamic routing between o3-mini (8K context window) for complex generation and gpt-3.5-turbo (4K context) for standard operations, with consistent MCP interface abstraction and model-specific parameter tuning
  • Built a resilient fallback architecture ensuring 100% availability with RAG-miss graceful degradation, sparse-query fallback prompts, cache-bypass recovery paths, multi-tier error handling, structured security-event logging, and health-check monitoring across all AI subsystems
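
A minimal sketch of the confidence-weighted intent classifier described above. The keyword signatures, weights, and 1.5 threshold here are hypothetical stand-ins; the production system uses 20+ keyword signatures per intent with hierarchical fallback resolution.

```python
# Hypothetical keyword signatures and weights, for illustration only.
SIGNATURES = {
    "NEW_SUBROADMAP": {"subroadmap": 2.0, "branch": 1.0, "specialize": 1.0},
    "ADD_TOPICS": {"add": 1.0, "topic": 2.0, "include": 1.0},
    "PROJECT_CREATION": {"project": 2.0, "build": 1.0, "portfolio": 1.0},
    "REGENERATE_PIPELINE": {"regenerate": 2.0, "redo": 1.0, "refresh": 1.0},
}
FALLBACK = "ADD_TOPICS"   # resolution for ambiguous, low-confidence requests
THRESHOLD = 1.5           # minimum confidence to accept the best match

def classify(request: str) -> tuple[str, float]:
    """Score each intent by summing weights of matched keywords;
    fall back when no intent clears the confidence threshold."""
    words = set(request.lower().split())
    scores = {
        intent: sum(w for kw, w in sig.items() if kw in words)
        for intent, sig in SIGNATURES.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] < THRESHOLD:
        return FALLBACK, scores[best]
    return best, scores[best]
```

For example, `classify("please regenerate my roadmap")` routes to REGENERATE_PIPELINE, while a request matching no signature falls back.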

System Architecture & Details


  • Architected an end-to-end AI-powered code evaluation system replacing traditional compilers with RAG-enhanced logical judgment, leveraging semantic retrieval, model-context engineering, and multi-model orchestration to achieve 99% evaluation accuracy across Python, Java, C++, and JavaScript.
  • Built a multi-stage language detection engine using regex patterns, anti-pattern suppression, syntax heuristics, and confidence-based classification to prevent cross-language submissions and ensure evaluation integrity for every code block.
  • Implemented a production-grade MCP-compliant prompt pipeline generating strictly structured system/user message arrays, including judge instructions, evaluation rules, test-case schemas, complexity requirements, and JSON-first verdict formatting.
  • Designed a dual-layer response parsing system with JSON block extraction, Markdown fallback resolution, regex-based error isolation, and verdict normalization to guarantee consistent outputs even with noisy AI responses.
  • Engineered a multi-model AI orchestration layer dynamically routing requests between o3-mini (accuracy), o1 (reasoning), and gpt-35-turbo (performance) with token-window optimization and context-aware selection.
  • Integrated a RAG pipeline with ChromaDB using text-embedding-ada-002 to retrieve reference solutions, constraints, edge cases, and complexity hints, enabling AI to perform context-enriched evaluation rather than plain code matching.
  • Created a modular progress-tracking engine mapping submissions to TodoItems, Topics, and Subroadmaps, automatically updating isDone status and learning milestones through real-time backend sync and user completion logic.
  • Developed a robust validation and error-classification layer with strict checks for payload integrity, language mismatches, test-case correctness, sanitized code inspection, and COMPILATION_ERROR / RUNTIME_ERROR / VALIDATION_ERROR generation.
  • Implemented a structured verdict generator delivering human-like educational feedback including passed/failed test-case breakdowns, root-cause explanations, error localization, corrected code suggestions, and time/space complexity analysis.
  • Optimized backend infrastructure using MongoDB submission architecture with collections for Submission, TodoItem, Topic, UserTodoItemMapping, ensuring analytics-ready storage, high-throughput writes, and environment-aware routing for dev/prod deployments.
  • Achieved scalable, real-time evaluation flows combining JWT-secured endpoints, load-balanced AI calls, semantic retrieval augmentation, multi-model fail-safes, and a high-availability fallback pipeline for uninterrupted code judging.
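
The dual-layer parsing idea (JSON-first extraction, then a Markdown fallback for noisy model output) can be sketched as follows; the field names and regexes are illustrative, not the production code.

```python
import json
import re

def parse_verdict(raw: str) -> dict:
    """Layer 1: pull a fenced or bare JSON object and parse it.
    Layer 2: regex fallback over noisy Markdown when JSON fails."""
    # Layer 1: fenced ```json block, or the first {...} object in the text.
    fenced = re.search(r"`{3}(?:json)?\s*(\{.*?\})\s*`{3}", raw, re.DOTALL)
    candidate = fenced.group(1) if fenced else None
    if candidate is None:
        brace = re.search(r"\{.*\}", raw, re.DOTALL)
        candidate = brace.group(0) if brace else None
    if candidate:
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            pass  # fall through to the Markdown layer
    # Layer 2: extract a 'verdict: X' style token and normalize it.
    m = re.search(r"verdict\W+(\w+)", raw, re.IGNORECASE)
    return {"verdict": m.group(1).upper() if m else "UNKNOWN"}
```

Normalizing both paths to one dict shape is what makes downstream verdict synthesis deterministic even when the model returns prose instead of JSON.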

System Architecture & Details


  • Architected and developed a production-grade AI programming assistant handling 100+ RPS with 99.9% uptime across learning platform resources.
  • Engineered sophisticated multi-model AI orchestration routing questions between O3Mini, O1, GPT-3.5 Turbo, and Llama 3.3 based on question complexity and resource type.
  • Built comprehensive token management system with dual-token architecture (9 free + purchased), atomic MongoDB operations, and fair usage enforcement preventing system abuse.
  • Implemented MCP (Model Context Protocol) prompt engineering with three specialized generators eliminating RAG infrastructure while maintaining response quality.
  • Designed intelligent model selection algorithm routing Practice Questions to O1, complex DSA to O1, articles to GPT-3.5, and general questions to O3Mini for optimal performance.
  • Developed advanced response processing pipeline with autoWrapCode (10+ language detection), formatAIResponse (markdown fixing), and removeConversationalEndings (AI fluff removal).
  • Created scalable session management with three MongoDB schemas (generic, roadmap-specific, content creation), soft deletion, voting system, and optimized query patterns.
  • Built complete API security layer with JWT authentication, rate limiting, input sanitization, HTTPS enforcement, and comprehensive error handling across 6+ endpoints.
  • Implemented production monitoring system with response time tracking, token usage analytics, structured logging, and health checks for continuous optimization.
  • Achieved 3x user engagement and 2x resolution speed through intelligent model selection, clean response formatting, and context-aware interactions.
  • Engineered no-RAG architecture using sophisticated prompt engineering instead of vector databases, reducing infrastructure costs by 60%.
  • Added content caching optimization with RoadmapAskAIContentCreation schema and duplicate request prevention for article improvements.
  • Implemented question classification system using GPT-3.5 Turbo to categorize questions into 7 types (DSA, System Design, Development, etc.) for better routing.
  • Designed circuit breaker pattern and fallback chains (O1 → O3Mini → GPT-3.5 → Llama) for API failure resilience and graceful degradation.
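
The fallback chain above (O1 → O3Mini → GPT-3.5 → Llama) reduces to an ordered retry loop; `call_model` and `flaky` here are stand-ins for real provider SDK calls, not actual APIs.

```python
class ModelUnavailable(Exception):
    """Raised when a provider call fails or is rate-limited."""

# Ordered fallback chain: try the strongest model first, degrade gracefully.
FALLBACK_CHAIN = ["o1", "o3-mini", "gpt-3.5-turbo", "llama-3.3"]

def ask_with_fallback(prompt: str, call_model) -> tuple[str, str]:
    """Return (model_used, answer); raise only when every model fails."""
    last_err = None
    for model in FALLBACK_CHAIN:
        try:
            return model, call_model(model, prompt)
        except ModelUnavailable as err:
            last_err = err  # a real system would also log and trip a breaker
    raise RuntimeError(f"all models unavailable: {last_err}")

def flaky(model: str, prompt: str) -> str:
    # Simulated outage of the first two providers in the chain.
    if model in {"o1", "o3-mini"}:
        raise ModelUnavailable(model)
    return f"{model} says hi"

used, answer = ask_with_fallback("explain heaps", flaky)
```

A full circuit breaker would additionally skip a model for a cooldown window after repeated failures instead of retrying it on every request.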

System Architecture & Details


  • Architected and deployed a production-grade AI-powered global search engine across ProPeers platform serving Roadmaps, Digital Products, Mentors, Webinars, and Bootcamps from a single query.
  • Engineered dual-model AI orchestration using Kimi-K2.5 for intent extraction and GPT-5.2 for response generation via Microsoft Azure Foundry, enabling sub-second intelligent query understanding.
  • Built a 3-collection ChromaDB vector search system with separate collections for Roadmaps (roadmapsearchdata-v0), Digital Products (dpsearchdata-v0), and Mentors (mentorsearchdata-v0) using Azure OpenAI text-embedding-ada-002.
  • Implemented hybrid search pipeline combining semantic vector search (ChromaDB) for Roadmaps/DPs with tag-overlap scoring for Mentors, achieving 0.62 score threshold for relevance filtering.
  • Designed intent extraction engine using Kimi-K2.5 that parses user queries including Hinglish, short-form, and vague inputs into structured JSON with intent, goal, topic_tags, and audience classification.
  • Built full data ingestion pipeline with MongoDB fetch scripts, Azure OpenAI embedding generation, batch processing (5 docs/batch with rate limiting), and ChromaDB storage across 3 collections covering 100+ roadmaps, 18+ DPs, and 20 top mentors.
  • Implemented deduplication logic for Digital Products and parallel search across all 3 ChromaDB collections using Promise.all, reducing search latency significantly.
  • Integrated live MongoDB fallback layer for Webinars and Bootcamps — fetching isActive records directly from DB in parallel with ChromaDB search, enriching LLM context without RAG overhead.
  • Developed score-based relevance filtering with threshold 0.62 — Cooking/gibberish queries correctly trigger fallback while specific queries like 'Goldman Sachs ECHP' achieve 0.78+ similarity scores.
  • Created 19 smart suggestion chips with intent-optimized queries covering DSA, System Design, Full Stack, AI/ML, DevOps, Cloud, and Career Switch — sorted by user search frequency for maximum discoverability.
  • Built complete Next.js frontend with Tailwind CSS featuring search bar, smart chips, result cards for all 5 content types (roadmaps, DPs, mentors, webinars, bootcamps), AI response display, and fallback handling.
  • Validated system across 6 edge case categories — Easy (React query: 0.69 avg), Medium (Hinglish career switch: 0.74), Hard (out-of-scope cooking: fallback triggered), Gibberish (fallback triggered), Single-word (DSA: 0.74), and Niche (Goldman Sachs: 0.786).
  • Designed 3-step LLM response generation: Kimi extracts structured intent → ChromaDB+MongoDB fetch results → GPT-5.2 generates 60-word motivational summary with structured result presentation.
  • Implemented Microsoft Azure Foundry model integration for both GPT-5.2 (max_completion_tokens) and Kimi-K2.5 (system role normalization, think-tag stripping, 3000 token allocation for reasoning).
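
The score-based relevance filtering with fallback can be sketched as below, using the 0.62 cutoff and the similarity scores quoted above; the document IDs are illustrative.

```python
SCORE_THRESHOLD = 0.62  # relevance cutoff from the validation runs above

def filter_results(scored: list[tuple[str, float]]) -> dict:
    """Keep only hits above the threshold; if nothing clears it
    (gibberish or out-of-scope queries), signal the fallback path
    instead of returning weak matches."""
    hits = [(doc, s) for doc, s in scored if s >= SCORE_THRESHOLD]
    if not hits:
        return {"fallback": True, "results": []}
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return {"fallback": False, "results": hits}

# Niche but specific query ('Goldman Sachs ECHP' style): strong match survives.
strong = filter_results([("goldman-sachs-roadmap", 0.786), ("misc", 0.31)])
# Out-of-scope cooking/gibberish query: every score is weak, fallback triggers.
weak = filter_results([("random-doc", 0.22)])
```

This is why low-similarity queries fall back to a generic response rather than surfacing irrelevant results.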

System Architecture & Details


  • Engineered an AI-integrated code editor using Monaco, seamlessly tied into CodeLLM and AskAI pipelines.
  • Supported live verdicts, multi-language (C++, Java, Python) switching, and dynamic prompts based on user activity.
  • Embedded AI-based feedback inline within the editor via backend event sync and code stream capture.
  • Delivered interactive IDE-like experience with <40ms event lag, boosting engagement and retention by 40%.
  • Tight integration with RoadmapAI and CodeLLM for contextual assistance
  • Real-time code validation and suggestions during typing

System Architecture & Details


  • Refactored and optimized over 150 core APIs (Editor, Roadmap, AskAI, Profile) for high-throughput performance.
  • Reduced average response latency from 2.2s → 300ms through async queues, parallel batches, and Redis caching.
  • Introduced pagination layers, ElasticSearch indexing, and horizontal load balancing to maintain SLA under scale.
  • Achieved 70% backend performance boost and improved Core Web Vitals (TTFB, LCP, FCP) across all pages.
  • Load-tested to 10K RPM with 99.95% uptime sustained and zero cold-starts, using warmed cloud functions.
  • Implemented advanced caching strategies and async processing
  • Enhanced frontend performance through SSR, dynamic imports, and lazy-loading

System Architecture & Details


Problem Solving & DSA

Key Highlights

  • 5000+ Problems Solved Across 10+ Platforms
  • 1500+ Day Unbreakable Coding Streak
  • Knight Badge @LeetCode (Top 5% Worldwide)
  • InterviewBit Global Rank 13 (6⭐ Problem Solving)
  • Institute Rank 1 & Global Rank 98 @GeeksForGeeks
LeetCode

1879+ (Top 5% Worldwide)

1400+ solved

4⭐ Problem Solving

GeeksForGeeks

Institute Rank 1 & Global Rank 98

1300+ solved

6⭐ Problem Solving

InterviewBit

1854+ (Master)

560+ solved

Global Rank 13

CodeStudio

1854+ (Specialist)

2000+ solved

Global Rank 130

6⭐ Problem Solving

HackerRank

6⭐ Problem Solving

300+ solved

Rank 52

HackerEarth

1260+ Top 10%

200+ solved

Rank 101

5⭐ Python/Java

Technical Skills

HOT SKILLS

AI / ML

RAG, LLMs, xAI, MCP Orchestration, LLMOps / AIOps, PyTorch, LangChain, scikit-learn, Token Streaming, LLM Fine-Tuning, Caching & Memoization, Prompt Engineering, Tool-Calling Agents, AI Memory Systems, Inference Optimization, Multi-Model Routing, Low-Latency Pipelines, Retrieval Strategies, Multimodal AI

Frontend Development

Next.js, React, Redux, CSS, HTML, SSR, CSR, React Query, TailwindCSS, Hybrid Rendering, Bootstrap

Backend Development

Golang, Node.js, FastAPI, Express, Django, .NET

Cloud & DevOps

Docker, AWS, Azure, Terraform, CI/CD, Kubernetes, GitHub Actions, GitLab CI/CD, Jenkins, Grafana, Prometheus

Databases

Redis, MongoDB, MySQL, ChromaDB, Firebase, Vector DBs, Cache Design, TTL Caching, Vector Search, Hybrid Search

Programming Languages

Python, TypeScript, JavaScript, SQL, Java, C++, Bash

Tools

Git, Figma, Linux, GitHub, Scrapy, VS Code, PyCharm, Postman, Selenium, IntelliJ IDEA, GitHub Copilot

Education

Sage University Indore

B.Tech in Computer Science

2020 – 2024 · MP, India

CGPA: 8.5/10

Featured Projects

Go (Gin framework), ChromaDB (self-hosted, HNSW cosine index), Azure OpenAI (text-embedding-3-large), Azure OpenAI (GPT-5.2 reasoning model), AWS Bedrock (Claude Sonnet 4.6), Google Gemini 3.1 Pro, MongoDB Atlas (stored_responses AI response cache), Redis Cache (optional, background health-watcher goroutine), Server-Sent Events (SSE), WebSocket (gorilla/websocket), JWT Authentication (golang-jwt/jwt v5), zerolog (structured logging), lumberjack (rotating log files), Air (hot reload / live development), Go goroutines (channel-based worker pools), ChromaDB REST API v2, Azure Virtual Machines, AWS EC2, Reciprocal Rank Fusion (RRF), Maximal Marginal Relevance (MMR), +2
  • Architected a production-grade AI tutoring backend in Go built around a 6-step RAG pipeline: Step 1 (Query Expansion + HyDE) uses 2 GPT-5.2 calls to generate 4 semantic query variants plus a Hypothetical Document Embedding - a short solution paragraph written by the LLM that embeds close to solution chunks in vector space, dramatically improving solution retrieval recall; Step 2 (Multi-Vector Search) embeds all 6 queries using Azure OpenAI text-embedding-3-large and fires up to 12 ChromaDB queries (6 × 2 collections); Step 3 (RRF Merge, k=60) fuses all ranked lists using Reciprocal Rank Fusion, immune to embedding scale differences across query texts; Step 4 (Keyword Boost) adds deterministic signals (word-overlap ratio, exact prefix, per-word title/topic/platform matches) computing a HybridScore = RRFScore + boost; Step 5 (Problem-Solution Pairing) joins problem and solution chunks by problem_id into unified RAGContext structs; and Step 6 (MMR Diversity Filter, lambda=0.7) applies Maximal Marginal Relevance for a 70% relevance / 30% diversity balance, returning exactly topK results.
  • Engineered dual ChromaDB collections on self-hosted Azure and AWS VMs with runtime cloud switching via environment variable: dsa-cluster stores 3,000+ LeetCode problems as two-chunk documents per problem (problem-{id} containing full description, examples, constraints, and hints + solution-{id} containing the editorial) joined at retrieval time by problem_id - delivering both question and answer together to the LLM in a single enriched context; dsa-problem-list stores 3,000+ multi-platform DSA problems across 8 platforms: LeetCode, Codeforces, GeeksForGeeks, HackerRank, CSES, InterviewBit, HackerEarth, and CodeStudio - both collections using HNSW cosine-space indexing for length-normalized similarity across problems of varying description lengths.
  • Implemented platform-aware and difficulty-aware ChromaDB metadata filtering: detectPlatform() resolves 8 platform aliases from query text (lc/leetcode, cf/codeforces, gfg/geeksforgeeks, hr/hackerrank, cses, ib/interviewbit, he/hackerearth, cn/codestudio); detectDifficulty() handles both label-based difficulty (easy/medium/hard for LeetCode and GFG) and 15+ Codeforces numeric rating ranges (800 beginner through 3500+ grandmaster, each with a ±50 window) - producing ChromaDB $eq and $gte/$lte where filters composed with $and for multi-condition queries; and a dual difficulty field design (difficulty_label string + difficulty_rating int) that enables both label and numeric filtering on the same dsa-problem-list collection without schema duplication.
  • Built a multi-LLM orchestration layer integrating three distinct AI providers: GPT-5.2 (Azure OpenAI) as the primary model for query expansion, HyDE generation, and streaming answer delivery - correctly configured as a reasoning model using max_completion_tokens (not max_tokens) with no temperature or top_p parameters; Claude Sonnet 4.6 (AWS Bedrock) as secondary answer generator via Bedrock Runtime invoke endpoint; and Gemini 3.1 Pro (Google AI) as tertiary generator - with all three verified at server startup via TestAllModels() sending a test prompt and logging success/failure per provider before accepting any user traffic.
  • Designed a hybrid scoring system that outperforms pure vector search for exact DSA problem name queries without requiring BM25 or a separate sparse retrieval index: word-overlap ratio boost (0.0–2.0 based on matched significant words / total significant words), exact title prefix match (+1.5), per-word title match (+0.3 each), per-word topic match (+0.1 each), per-word platform match (+0.2 each), solution/problem type intent match (+0.3), explicit boost type match (+0.2), and a -2.0 penalty for empty titles - stacked on top of the RRF base score with all stop words and structural keywords excluded from word-overlap calculation.
  • Implemented a real-time SSE (Server-Sent Events) streaming architecture on Go/Gin delivering incremental GPT-5.2 response chunks with four typed event envelopes: status (pipeline progress), chunk (one LLM token/segment), done (stream complete with metadata), and error (terminal failure) - with streaming-specific HTTP headers (Cache-Control: no-cache, Connection: keep-alive, X-Accel-Buffering: no) injected per-route via Gin middleware and client disconnect detection via context.Canceled propagated through the streaming callback, enabling clean stream termination without goroutine leaks.
  • Built a concurrent data ingestion pipeline using 10 goroutine workers per collection via a channel-based worker pool pattern: each worker consumes jobs from a buffered channel, calls Azure OpenAI text-embedding-3-large to generate float32 embedding vectors, and posts to ChromaDB via REST API v2 upsert - fully idempotent (upsert semantics allow safe re-runs), with per-worker embedding failure isolation (failures increment a shared failCount without halting the run), and a separate two-chunk ingestion path for dsa-cluster that produces both problem and solution chunks from a single JSON record with conditional solution chunk creation when Solution != "".
  • Implemented a structured logging system using zerolog for structured JSON log events and lumberjack for rotating log files: three separate log files (info, warn, error) with date-stamped filenames in logs/, 20MB max per file, 30-day retention for info/warn and 60-day for errors - and named checkpoint logs at every RAG pipeline step (query expansion done, filters detected, multi-vector search done, RRF merge done, problem-solution pairing done, MMR filter done, RAG pipeline done) with structured fields per step enabling full per-query execution reconstruction from logs alone.
  • Deployed a JWT-authenticated WebSocket endpoint (/api/v1/voiceServices/voiceRecord) alongside REST cognitive services, with token validation on the WebSocket upgrade handshake; optional Redis query caching with a background health-watcher goroutine that retries on failure up to 10 times before auto-disabling the cache and stopping cleanly; graceful HTTP server shutdown via OS signal capture (SIGINT/SIGTERM) with a 10-second drain timeout; and multi-cloud ChromaDB connectivity switchable at runtime via VM_SOURCE env var between Azure VM and AWS EC2 hosts without code changes.
  • Delivered two distinct RAG search modes sharing the same pipeline infrastructure: Full RAG (RAGSearch) for DSA tutoring - queries both dsa-cluster and dsa-problem-list collections in parallel with LLM-expanded queries, performs problem-solution pairing for paired context delivery, and streams a complete DSA instructor explanation via GPT-5.2; CP Search (CPSearch) for competitive programming - queries only dsa-problem-list with CP-tuned variant prompts focused on algorithm tags (dp, greedy, graph, segment tree, bitmask), Codeforces rating ranges, and contest problem phrasing, fetching nResults=15 per query for higher recall, and streams a focused competitive programming editorial; plus a Fast Search mode (sub-200ms, zero LLM calls) via SearchDSACluster, SearchDSAProblemList, and HybridSearchAll functions for exact name lookups using a single embedding with inline keyword boost and topK×3 over-fetch + re-rank pattern.
  • Built a MongoDB-backed AI response cache using a stored_responses collection with a compound unique index on (queryNormalized, mode) - where queryNormalized is the lowercase-trimmed query and mode is "leetcode" or "cp". On a cache miss, the full GPT-5.2 streaming response is collected in a strings.Builder across all chunk callbacks, then persisted to MongoDB via an upsert in a non-blocking goroutine (safe for concurrent saves on the same query) along with the complete RAGContext data (problem title, platform, difficulty, topics, link, problem text, solution text, FinalScore). On a cache hit, the stored response is replayed through a streamCached function that breaks the text into 80-rune chunks with an 8ms delay between each - making cached responses feel identical to a live LLM stream rather than an instant payload dump - while a goroutine increments the hitCount and updates lastHitAt without blocking the response path. The done event carries a cached: true/false metadata flag so the frontend can distinguish live vs cached responses.
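As a rough illustration of how the Step 3 RRF merge works, here is a minimal Go sketch. It is not the project's code: rrfMerge and the sample document IDs are hypothetical stand-ins.

```go
package main

import (
	"fmt"
	"sort"
)

// rrfMerge fuses several ranked result lists into one using
// Reciprocal Rank Fusion: score(doc) = sum of 1/(k + rank) over every
// list the doc appears in. Docs ranked high across many lists are
// promoted regardless of each list's raw similarity-score scale.
func rrfMerge(lists [][]string, k float64) []string {
	scores := make(map[string]float64)
	for _, list := range lists {
		for rank, id := range list {
			scores[id] += 1.0 / (k + float64(rank+1))
		}
	}
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool { return scores[ids[i]] > scores[ids[j]] })
	return ids
}

func main() {
	// Ranked lists from two hypothetical query variants.
	a := []string{"two-sum", "3sum", "4sum"}
	b := []string{"two-sum", "container-with-most-water", "3sum"}
	fmt.Println(rrfMerge([][]string{a, b}, 60)[0]) // prints "two-sum"
}
```

Because RRF only looks at ranks, not raw scores, it sidesteps the embedding scale differences across query texts that the pipeline description mentions.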

Key Concepts

  • 6-Step RAG Pipeline: Expand → Filter → Multi-Vector Search → RRF → Hybrid Boost → Pair → MMR
  • HyDE: Hypothetical Document Embedding bridges short queries to the solution-chunk vector space
  • Reciprocal Rank Fusion (RRF, k=60) across 6 query variants × 2 ChromaDB collections
  • Hybrid Scoring: RRFScore + Keyword Boost, without BM25 infrastructure
  • Problem-Solution Pairing by problem_id before MMR; FinalScore = max(problem, solution) HybridScore
  • MMR Diversity Filter (lambda=0.7): 70% relevance, 30% diversity
  • Multi-LLM Orchestration (GPT-5.2 + Claude Sonnet 4.6 + Gemini 3.1 Pro), verified at startup
  • Platform-Aware Metadata Filtering across 8 DSA platforms
  • Dual Difficulty Fields (difficulty_label + difficulty_rating) for label- and rating-based cross-platform filtering
  • SSE Token Streaming with 4 typed event envelopes (status, chunk, done, error)
  • MongoDB Response Cache: compound index on (queryNormalized, mode), hitCount tracking, simulated SSE streaming replay on cache hit
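The MMR diversity filter can be sketched in Go as follows. This is an illustrative toy, not the project's implementation: mmrSelect, the relevance scores, and the similarity matrix are all hypothetical.

```go
package main

import "fmt"

// mmrSelect returns the indices of topK items chosen by Maximal
// Marginal Relevance: at each step it greedily picks the item that
// maximizes lambda*relevance - (1-lambda)*maxSimilarityToSelected.
// lambda = 0.7 reproduces the 70% relevance / 30% diversity balance.
func mmrSelect(rel []float64, sim [][]float64, lambda float64, topK int) []int {
	var selected []int
	remaining := make(map[int]bool)
	for i := range rel {
		remaining[i] = true
	}
	for len(selected) < topK && len(remaining) > 0 {
		best, bestScore := -1, -1e18
		for i := range remaining {
			maxSim := 0.0 // highest similarity to anything already picked
			for _, j := range selected {
				if sim[i][j] > maxSim {
					maxSim = sim[i][j]
				}
			}
			if score := lambda*rel[i] - (1-lambda)*maxSim; score > bestScore {
				best, bestScore = i, score
			}
		}
		selected = append(selected, best)
		delete(remaining, best)
	}
	return selected
}

func main() {
	rel := []float64{0.9, 0.85, 0.6} // relevance to the query
	sim := [][]float64{              // pairwise similarity:
		{1, 0.95, 0.1}, // items 0 and 1 are near-duplicates
		{0.95, 1, 0.1},
		{0.1, 0.1, 1},
	}
	fmt.Println(mmrSelect(rel, sim, 0.7, 2)) // prints "[0 2]": the duplicate is skipped
}
```

Even though item 1 is the second-most relevant, its high similarity to the already-selected item 0 pushes its MMR score below the more distinct item 2.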

Impact

  • Delivered a full production RAG pipeline in Go that fires up to 12 ChromaDB queries per user request (6 expanded query variants × 2 collections) with sub-200ms latency in fast search mode and 2-3 seconds end-to-end in full RAG mode - covering the complete path from query to streamed LLM answer.
  • Indexed 3,000+ LeetCode problems with full solutions in dsa-cluster and 3,000+ multi-platform DSA problems across 8 major competitive programming platforms in dsa-problem-list - ingested via a 10-worker concurrent Go pipeline with idempotent upserts that can safely re-run without duplicating data.
  • Implemented HyDE (Hypothetical Document Embedding) - a technique where the LLM generates a hypothetical solution paragraph that is embedded and used as a query vector, bridging the semantic gap between a short user query and a long solution chunk in embedding space, significantly improving solution retrieval recall over direct query embedding alone.
  • Applied Reciprocal Rank Fusion (RRF, k=60) following Cormack et al. (2009) to merge ranked result lists from multiple query variants - immune to embedding score scale differences across query texts - ensuring that documents consistently ranked high across many query variants are correctly promoted regardless of absolute similarity scores.
  • Built a problem-solution pairing system that joins problem and solution chunks by problem_id before MMR selection, so FinalScore is computed from the combined pair value - the LLM receives both the full problem statement and the solution explanation together in a single RAGContext, delivering richer instructional context than separate chunk retrieval.
  • Designed a dual difficulty field schema (difficulty_label string + difficulty_rating integer) enabling a single ChromaDB collection to serve both label-based platforms (LeetCode easy/medium/hard) and rating-based platforms (Codeforces 800–3500+) with the same $eq and $gte/$lte filter operators - eliminating the need for separate collections or platform-specific query paths.
  • Delivered graceful pipeline degradation: if GPT-5.2 query expansion fails, the pipeline continues with the original query alone; if HyDE generation fails, the pipeline proceeds with the 4 variants; if individual embedding calls fail during ingestion, the worker moves on without halting the run - all degradation paths are logged at warn level with structured fields for diagnosis.
  • Implemented an SSE streaming architecture with client disconnect detection via context.Canceled propagated through the streaming callback - ensuring GPT-5.2 streaming is cleanly terminated when the user closes the browser, preventing wasted token generation and goroutine leaks under concurrent load.
  • Built a MongoDB AI response cache (stored_responses collection) that eliminates redundant RAG pipeline executions and LLM token spend for repeated queries - on a cache hit, the stored response is replayed via a streamCached function that sends 80-rune chunks with 8ms intervals, making cached responses feel identical to a live GPT stream with no perceptible difference in UX; on a cache miss, the full response is collected via strings.Builder and persisted via a non-blocking goroutine upsert so the HTTP handler returns immediately; hitCount and lastHitAt are tracked per cache entry to surface the most frequently requested topics.
  • Received interest from founders and engineers after sharing the RAG architecture - specifically the HyDE + RRF + Keyword Hybrid approach in Go, which is rare outside Python-based RAG frameworks - validating the architectural approach as production-worthy and reusable for other domain-specific AI tutoring systems.
  • Achieved full per-query execution traceability through structured zerolog checkpoint logs at every RAG pipeline step - every user query can be fully reconstructed from the log files alone, including which query variants were generated, which filters were applied, how many results were retrieved per collection, RRF scores, pairing results, and final MMR selection order.
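The cache-hit replay described above, stored text broken into 80-rune chunks with a short delay, can be sketched in a few lines of Go. streamCachedSketch is a hypothetical stand-in; the real implementation writes SSE events rather than invoking a callback.

```go
package main

import (
	"fmt"
	"time"
)

// streamCachedSketch replays a stored answer as fixed-size rune chunks
// with a delay between them, so a cache hit feels like a live token
// stream instead of an instant payload dump. The portfolio describes
// 80-rune chunks at 8ms; emit stands in for the real SSE writer.
func streamCachedSketch(text string, chunkRunes int, delay time.Duration, emit func(string)) {
	runes := []rune(text) // rune-safe: never splits a multi-byte character
	for start := 0; start < len(runes); start += chunkRunes {
		end := start + chunkRunes
		if end > len(runes) {
			end = len(runes)
		}
		emit(string(runes[start:end]))
		time.Sleep(delay)
	}
}

func main() {
	var chunks []string
	streamCachedSketch("Binary search halves the range each step.", 10, time.Millisecond,
		func(c string) { chunks = append(chunks, c) })
	fmt.Println(len(chunks), chunks[0]) // prints "5 Binary sea"
}
```

Chunking over runes rather than bytes matters here: slicing a UTF-8 string at an arbitrary byte offset could split a multi-byte character and emit invalid text mid-stream.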

System Architecture & Details


GitHub (Contributions Overview)


10,000+

Engineers Learning With Me

From beginner to mentor: helping engineers crack interviews, master DSA, and grow in tech.

Data Structures & Algorithms, Full Stack Development, Interview Preparation, Placement Guidance, AI-Driven Career Guidance, Resume Review, Mock Interviews, SDE Preparation, System Design (HLD & LLD), Problem Solving, Career Roadmap Planning

10K Subscribers
67.8K Followers
4.9K Followers
16.8K Followers
1.2K Followers
2.7K Members
770 Members

© 2026 Prince Singh. All rights reserved.

Updated April 2026
6,800+ Visitors
Profile

Prince Singh (verified)

Founding Engineer & AI Architect @ProPeers | Portfolio aka WebResume


2 Years Experience
600K+ Users Impact
AI Products Builder