Akash Kumar Sahani

Building GenAI & Agentic AI Systems | Backend & Distributed Systems Engineer | Java, Python, Spark, AWS, Kubernetes | DevOps & LLMOps

Gurugram, Haryana, India

@akashsahani

LinkedIn GitHub

Looking for jobs · Open to relocating

About

I build AI platforms and distributed systems that operate at production scale. I'm a Software Engineer currently working at Whilter AI, where I design and operate cloud-native infrastructure, AI platforms, and high-performance backend systems that power real-world products. My focus lies at the intersection of LLMOps, machine learning infrastructure, and distributed backend engineering: building systems that move ML from experimentation into reliable, scalable production environments.

Over the past few years, I’ve engineered 50+ microservices across Java and Python, built event-driven architectures using Kafka, and deployed large-scale systems on AWS and Kubernetes. I work extensively with observability, platform automation, and distributed data systems to ensure systems remain resilient as they scale.

A key part of my work involves AI infrastructure and ML platforms:
• Deploying Large Language Models using Ray + DeepSpeed on Kubernetes
• Building end-to-end ML pipelines with Spark, Feast, and Kubeflow
• Designing feature stores and scalable data pipelines for real-time and batch ML workloads
• Creating LLMOps infrastructure for distributed inference and model deployment

I also work deeply on data platforms and governance systems, designing enterprise-scale data ecosystems using Apache Atlas, Ranger, Hive, Iceberg, and Spark, enabling full data lineage and secure data access across large organizations.

My core technical stack includes:
• Languages: Java, Python
• Backend & APIs: Spring Boot, FastAPI, Flask, gRPC, REST
• Distributed Systems: Kafka, Redis, event-driven microservices
• ML & Data Systems: Apache Spark, Feast, Kubeflow, MLflow, Hadoop, Hive
• Cloud & Infrastructure: AWS, Kubernetes (EKS), Docker, Helm, Jenkins
• Observability: Grafana Stack (Alloy, Mimir, Loki, Tempo)
• Data Governance: Apache Atlas, Ranger

I’m particularly interested in AI infrastructure, distributed ML systems, and large-scale data platforms, the foundational layers that enable intelligent systems to operate reliably at scale. Currently exploring deeper problems around LLMOps, scalable AI systems, and energy-efficient computing.

If you're working on AI infrastructure, ML platforms, or large-scale distributed systems, I'd love to connect.

Experience

Machine Learning Engineer

Kittivaasal Technologies Pvt. Ltd. · Chennai

Feb 2022 – May 2022

- Built a recommendation system that boosted user click-through by 70%, leading to more daily active users.
- Secured User Data: Revamped robust authentication features in Java using OAuth 2.0, ensuring user data protection with industry-standard security protocols.
- Visual Storyteller: Designed approximately 10 creative posters for project & organization, reaching an audience of 1000+.

Software Engineer

Whilter.ai · Gurugram

Feb 2019 – Jun 2026

- Architected and developed an execution framework & feature pipeline leveraging Apache Spark and Kafka to streamline data processing and event-driven workflows, enabling high-performance distributed computing for large datasets.
- Implemented a Python-based trigger service to manage event handling based on Kafka message streams, ensuring timely and accurate workflow execution.
- Established development infrastructure, including CI/CD pipelines using Jenkins, Helm, and AWS EKS, to optimize development workflows, ensure seamless deployments, and enhance overall efficiency.
- Deployed infrastructure for MLHub's feature store (Feast) and machine learning pipeline orchestration (Kubeflow), enhancing the end-to-end ML lifecycle.
- Played a key role in the development of 30+ microservices for the ABDM (Ayushman Bharat Digital Mission) project under NHA.
- Architected 30+ microservices to enhance security measures and improve operational efficiency within the ABDM project.
- Modernised bulk services to address large-scale data processing requirements, achieving a significant performance improvement. By implementing a partitioning and parallel processing strategy, I reduced data processing time from 20 minutes to 1.5 minutes, a 92.5% reduction in processing time.
- Seized opportunities to implement 10+ new technologies such as Apache Kafka, Redis, WebSocket, protobuf, gRPC, WhatsApp Business API, DigiLocker, etc.
- Wrote comprehensive JUnit test cases to ensure the reliability, robustness, and correctness of the streamlined microservices and functionalities, achieving code coverage of 80 to 100 percent.
- Crafted an efficient deployment workflow utilizing Jenkins automation. Achieved swift and error-free deployments within 10 minutes. Championed AWS deployment practices for optimal scalability and robust performance.
- Trained, refined, and deployed machine learning models for similarity matching, building detection, and human face verification under 1
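
The partitioning and parallel processing strategy mentioned in the bulk-services bullet can be illustrated with a minimal Python sketch. It is not the production implementation: the function names, the per-record work in `process_chunk`, and the use of a thread pool are assumptions made only for illustration. The last helper reproduces the arithmetic behind the 20-minute to 1.5-minute figure.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(records, num_partitions):
    """Split records into num_partitions roughly equal, contiguous chunks."""
    size, extra = divmod(len(records), num_partitions)
    chunks, start = [], 0
    for i in range(num_partitions):
        # The first `extra` chunks take one additional record each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(records[start:end])
        start = end
    return chunks


def process_chunk(chunk):
    """Hypothetical per-record work; here it simply doubles each value."""
    return [value * 2 for value in chunk]


def process_in_parallel(records, num_partitions=4):
    """Process each partition on its own worker and keep the input order."""
    chunks = partition(records, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        # Executor.map yields results in the order the chunks were submitted.
        results = list(pool.map(process_chunk, chunks))
    return [item for chunk in results for item in chunk]


def percent_reduction(before, after):
    """Relative reduction in processing time, as a percentage."""
    return (before - after) / before * 100


if __name__ == "__main__":
    print(process_in_parallel(list(range(10)), num_partitions=3))
    print(round(percent_reduction(20, 1.5), 1))
```

For CPU-bound work in Python, swapping `ThreadPoolExecutor` for `ProcessPoolExecutor` would likely be the better choice; threads suit I/O-bound steps such as database or network calls.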

Education

B.Tech / B.E.

Vel Tech Rangarajan Dr.Sagunthala R&D Institute of Science and Technology

Computer Science & Engineering · Jun 2018 – Jun 2022
M.Tech / M.E.

Indian Institute of Technology Patna

Computer Science & Engineering · Jun 2026

Skills

Projects

Ayushman Bharat Digital Mission (ABDM) under National Health Authority (NHA)

Java, Spring Boot, Microservices, Kafka, Redis, Postgres, Docker, Kubernetes, Helm, Jenkins, AWS, ELK

The Ayushman Bharat Digital Mission (ABDM), spearheaded by the National Health Authority (NHA), is revolutionizing India's healthcare landscape. Launched in 2020, ABDM is building a robust digital infrastructure to connect various stakeholders, including patients, healthcare providers, and health institutions.

At the core of ABDM lies the Ayushman Bharat Health Account (ABHA), a unique health identifier for every citizen. ABHA empowers individuals to securely store and share their medical records electronically, streamlining access to healthcare services across the country.

ABDM's reach extends far beyond individual patients. It provides a platform for:
- Healthcare providers: Doctors, nurses, and other healthcare professionals can access patient records with consent, improving diagnosis and treatment planning.
- Health institutions: Hospitals, clinics, and labs can securely exchange medical data, enhancing coordination and reducing redundancy.
- Public health agencies: Real-time data collection and analysis enable efficient disease surveillance and outbreak response.

The 100 Microsites Project is a noteworthy initiative driving ABDM adoption. It focuses on onboarding private healthcare facilities in specific geographical areas through targeted outreach and support. This collaborative approach is proving successful in bridging the gap between public and private healthcare sectors.

ABDM's potential to transform India's healthcare system is immense. By enabling seamless exchange of medical data, it promises to improve healthcare quality, accessibility, and affordability for all citizens.

MLHUB - ML Workflow Automation Platform

Java, Spring Boot, Python, FastAPI, Microservices, Spark, Kafka, Redis, Postgres, Docker, Kubernetes, Helm, Jenkins, AWS, Kubeflow, Feast, Iceberg, Hive, HDFS

- MLHub is a comprehensive platform designed to revolutionize data processing and feature management by enabling efficient ingestion, cleaning, transformation, and utilization of data for advanced machine learning workflows.
- Established a scalable data ingestion framework with multiple configurable pipelines to accommodate diverse data sources and formats, ensuring seamless integration with various systems.
- Designed robust data filtering and cleaning mechanisms to ensure data quality and consistency, laying a strong foundation for reliable machine learning models.
- Developed advanced data processing workflows to transform raw data into actionable insights, enabling real-time and batch processing for various business use cases.
- Created and managed feature stores to centralize feature engineering outputs, ensuring efficient feature reuse and model training processes across projects.
- Implemented data ingestion workflows with multiple configurations, providing flexibility for specific use cases and ensuring scalability for large-scale deployments.
- This platform serves as a critical enabler for organizations looking to unlock the full potential of their data by transforming it into meaningful, actionable insights through advanced machine learning capabilities.

Whilter AI - Platform

Java, Spring Boot, Python, FastAPI, Microservices, Kafka, Redis, Keycloak, LangChain, LangGraph, RAG, Postgres, Docker, Kubernetes, Helm, Jenkins, AWS

- Developed a generative AI platform to deliver hyper-personalized, emotionally resonant videos at scale, dynamically customized based on user data such as location, behavior, language, and interests.
- Automated video generation workflows, reducing campaign turnaround time by 8× and cutting production costs by approximately 50%.
- Scaled backend microservices architecture to support the generation and delivery of millions of unique video experiences across multiple distribution channels.
- Implemented end-to-end observability using the Grafana stack (Alloy, Loki, Tempo, Mimir) to monitor REST APIs, Kafka pipelines, and data stores, achieving 100% traceability, real-time alerting, and reduced debugging time.
- Standardized structured logging, metrics, and distributed tracing to improve performance monitoring and operational visibility across services.
- Contributed to building ISO/IEC 27001:2022-compliant infrastructure, ensuring secure data handling, access control, and platform resilience.

Voice AI - Platform

LiveKit, LiveKit SIP, Java, Spring Boot, Python, FastAPI, Microservices, Kafka, Redis, RAG, Postgres, Docker, Kubernetes, Helm, Jenkins, AWS

1. Designed and deployed a real-time Voice AI platform enabling low-latency, bidirectional voice conversations using WebRTC and SIP.
2. Built and maintained production-grade multi-language CI/CD pipelines for Java, Python, and Node.js microservices, enabling fast, zero-downtime deployments.
3. Architected scalable, fault-tolerant infrastructure on AWS EKS with auto-scaling, isolated node groups, and secure networking.
4. Solved NAT traversal and media routing challenges using STUN, ensuring consistent audio flow across browsers, SIP providers, and cloud environments.
5. Enabled SIP-based telephony integration (e.g., Twilio) with WebRTC clients for enterprise-grade voice workflows.
6. Deployed and operated LiveKit Server, SIP, Agent, and Egress components to support real-time voice calls, agent interactions, and call recording.
7. Ensured secure, scalable, and production-ready voice infrastructure aligned with enterprise requirements.

Elevate AI - Multi-Tenant Marketing Automation Platform

Java, Spring Boot, Python, FastAPI, Microservices, Kafka, Redis, Keycloak, LangChain, LangGraph, RAG, Postgres, Docker, Kubernetes, Helm, Jenkins, AWS

- Built a production-grade multi-tenant SaaS platform for marketing automation using a microservices architecture with Spring Boot, Spring Cloud Gateway, and Next.js.
- Designed end-to-end multi-tenancy with tenant context propagating from JWT claims through API Gateway, BFF, and Core Service down to database-level row isolation, all handled transparently via shared Java library auto-configuration.
- Integrated AI-powered campaign content generation using OpenAI agents (via a FastAPI microservice) with brand research, product research, and creative generation workflows.
- Implemented a full OAuth2 social media integration layer supporting Google, Facebook (Ads + Instagram), and LinkedIn for automated campaign publishing.
- Centralized identity and RBAC using Keycloak (OAuth2/OIDC) with a custom role hierarchy (Super Admin → Admin → Brand Approver → Campaign Manager) enforced at both API and UI levels.
- Built an event-driven workflow for the campaign approval lifecycle using Apache Kafka, decoupling creative generation from approval and publishing stages.
- Designed and implemented a Jenkins-based CI/CD platform with a controller and 2 distributed agents.
- Built a dynamic multi-language pipeline framework supporting full-system builds, language-specific builds (Java / Node.js / Python), and single microservice builds.
- Enabled branch-based deployments, allowing builds and releases from any Git branch.
- Deployed the full stack using Helm, including PostgreSQL, Kafka, Keycloak, and a Python AI agent service, as a Kubernetes-ready deployment.
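
The tenant-context propagation described in the multi-tenancy bullet (JWT claim at the edge, row-level isolation at the data layer) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the platform itself uses a shared Java library, and the claim name `tenant_id`, the function names, and the in-memory row filter are all hypothetical. Signature verification is left out to keep the sketch short and would be mandatory in a real gateway.

```python
import base64
import json
from contextvars import ContextVar

# Holds the tenant for the current request; "tenant_id" is an assumed claim name.
current_tenant = ContextVar("current_tenant")


def tenant_from_jwt(token):
    """Read the tenant claim from a JWT payload.

    Signature verification is deliberately omitted in this sketch; a real
    gateway must verify the token before trusting any claim.
    """
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped padding
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return claims["tenant_id"]


def campaigns_for_current_tenant(rows):
    """Stand-in for row-level isolation: only the caller's tenant rows are visible."""
    tenant = current_tenant.get()
    return [row for row in rows if row["tenant_id"] == tenant]


def handle_request(token, rows):
    """Set the tenant context for one request, then query through it."""
    reset_token = current_tenant.set(tenant_from_jwt(token))
    try:
        return campaigns_for_current_tenant(rows)
    finally:
        # Always clear the context so it cannot leak into the next request.
        current_tenant.reset(reset_token)


def make_unsigned_token(claims):
    """Build an unsigned demo token (header.payload.signature) for the example."""
    def encode(obj):
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()
    return f"{encode({'alg': 'none'})}.{encode(claims)}."


if __name__ == "__main__":
    rows = [
        {"tenant_id": "acme", "campaign": "spring-launch"},
        {"tenant_id": "globex", "campaign": "winter-sale"},
    ]
    print(handle_request(make_unsigned_token({"tenant_id": "acme"}), rows))
```

The point of the context variable is that downstream code never receives the tenant as an explicit argument, which mirrors the "transparent" propagation the bullet describes.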

Courses & certifications

Developer Learning Plan · Amazon Web Services (AWS) · 2022

🗣️ Languages

English · Professional
Hindi · Fluent
Nepali · Native