I build production-grade AI applications, focusing on low-latency voice architecture, automated media pipelines, and full-stack machine learning deployments. I specialize in running high-performance models locally and optimizing them for consumer-grade hardware.
- OmniCast: A low-latency AI voice platform featuring zero-shot voice cloning, Whisper-powered auto-transcription, and avatar lip-sync video generation, optimized to run locally on consumer GPUs (RTX 3070).
- Lumina AI: An end-to-end faceless video generation engine that orchestrates local LLMs, SDXL text-to-image, edge-TTS, and an automated FFmpeg stitching pipeline.
- Mouth Cancer Detection: A full-stack medical imaging classification pipeline combining a Next.js frontend with a Flask AI backend running PyTorch and TensorFlow models simultaneously.
- Agentic RAG Stylist: An AI-powered application that uses a Qdrant vector database, LangChain, and local LLMs to generate contextual recommendations via semantic search and MediaPipe vision tasks.
- Document Intelligence Engines: Multi-stage ingestion, semantic chunking, and RAG extraction pipelines that pull complex data from legal, financial, and construction documents, built on an event-driven architecture with NATS JetStream and Redis.
- AI & Machine Learning: PyTorch, TensorFlow, Hugging Face, LLM Quantization (GGUF), OpenCV, MediaPipe, Qdrant
- Backend & Systems: Python (FastAPI/Flask), Node.js (Express), NATS JetStream, Redis, PostgreSQL, MongoDB
- Frontend & UI: Next.js, React, Tailwind CSS, Zustand, Radix UI
- B.S. in Computer Science (2025)
- CGPA: 3.52
📫 Let's Connect: LinkedIn | khizarali.cs@gmail.com



