Sastry Munukutla

Littleton, CO

Summary

Senior Engineer with over 15 years of experience building scalable, observable, and resilient distributed systems across cloud, microservices, and enterprise platforms. Deep expertise in performance engineering, API scalability, system observability, and production reliability. Recently designed and implemented Retrieval-Augmented Generation (RAG) services using Python FastAPI, FAISS vector search, OpenAI embeddings, and Docker, integrated with Java/Spring APIs. Currently applying performance engineering principles to LLMOps, with a focus on prompt regression testing, token monitoring, latency optimization, and AI service observability.

Work History

Built end-to-end RAG pipeline: document chunking → embeddings → FAISS vector store → retrieval → prompt assembly → LLM response.
Implemented embeddings using OpenAI text-embedding-3-small and similarity search using FAISS.
Exposed RAG service as FastAPI REST endpoint consumable by Java/Spring services.
Containerized solution with Docker and designed deployment architecture for Kubernetes environments.
Implemented token usage monitoring, latency tracking, prompt validation, retries, and fallbacks.
Evaluated vector database tradeoffs (FAISS vs managed options like Pinecone/Weaviate).
Built POC for agent-style orchestration (retrieval → reasoning → response).

Performance Test Lead & SRE

SQUARETRADE / ALLSTATE

01.2023 - Current

Designed and executed end-to-end load tests simulating millions of transactions.
Led performance optimization for Java/Spring Boot APIs and event-driven systems.
Built Splunk/Grafana dashboards and alerting for microservices health and latency.
Performed capacity planning, production issue analysis, and scalability validation.
Conducted chaos engineering experiments on Kubernetes using Gremlin.
Applied observability and regression rigor to LLM/RAG services.

Senior Consultant / Test Engineering Lead

DELOITTE

01.2013 - 01.2023

Built performance profiling tools capturing JVM/OS metrics into InfluxDB with Grafana dashboards.
Designed CI/CD pipelines executing performance suites via Jenkins, Docker, and Kubernetes.
Led modernization validation from Mainframe to Java/.NET platforms.
Conducted chaos engineering and resilience validation.
Automated performance regression and reporting frameworks.

Senior Software Engineer

ACCENTURE

01.2010 - 01.2013

Performed performance engineering for Microsoft Xbox, e-commerce, gaming, and insurance platforms.
Analyzed JVM heap and thread dumps and detected memory leaks using Eclipse MAT and JConsole.
Designed workload models, SLAs, and performance strategies for enterprise systems.

Education

Bachelor of Science - Computer Science

JNTU Hyderabad

Hyderabad, India

06.2010

Skills

GenAI / LLM / RAG: RAG Architecture, Embeddings, FAISS, Prompt Engineering, Token Monitoring, Agent Orchestration
Backend & APIs: Python (FastAPI), Java / Spring Boot, REST APIs, Docker, Kubernetes

LLMOps & Observability: Dynatrace, Grafana, Prometheus, Splunk, JMeter, Chaos Engineering

Certification

AWS Solutions Architect Associate
AWS Developer Associate
Azure Fundamentals (AZ-900)
Gremlin Certified Chaos Engineer
Certified Scrum Master

Honors & Awards (SquareTrade Hackathons)

Text to Claim – Dec 2024
Built a Docker-based system with Spring Boot and Python AI services that lets customers file claims via SMS.
The Python AI service used BERT-based models to convert conversational text into structured data and map issues to problem codes.
Automated claim creation and resolution suggestions via text interaction.

SquareTrade GenAI (Ashronis) – Dec 2023
Built a GenAI Slack application using Python, Slack SDK, ChromaDB, and OpenAI.
Generated embeddings from Confluence and JIRA data and stored them in ChromaDB.
Implemented similarity search retrieval and LLM responses using contextual data.

AI Monitoring Assistant – Dec 2025
Developed an agentic AI monitoring assistant that automates API performance monitoring across environments.
Uses a Slack chatbot integrated with Dynatrace APIs to fetch, analyze, and present real-time performance, error, and infrastructure metrics.