Sudhakar Chundu

Distinguished Cloud AI Architect

San Jose, California

AI/ML Infrastructure | GPU/Slurm | Kubernetes | Multi-Cloud

Building AI/ML Platforms at Planetary Scale

I'm a Distinguished Cloud AI Architect and Platform Engineering Leader with 18+ years of experience transforming how enterprises build and operate infrastructure.

My career spans the full evolution of cloud computing, from bare-metal data centers to multi-cloud Kubernetes deployments to GPU-accelerated AI/ML platforms. I've led infrastructure initiatives at Fortune 500 companies, including ExxonMobil, Chevron, BP, Shell, Harvard Pilgrim Health Care, CNA Insurance, PwC, and Verizon.

I currently serve as Distinguished AI Architect at Trackonomy Systems, where I designed and built the company's Slurm-based GPU compute platform from the ground up: 65 GPUs across 8 nodes, delivering 99.97% uptime for enterprise AI inference and training workloads.

I'm passionate about solving complex infrastructure challenges: reducing cloud costs by 73% ($10M to $2.7M annually), achieving enterprise compliance (SOC2, HIPAA, FedRAMP, HITRUST), and building self-service platforms that accelerate developer velocity while maintaining security and governance.

I lead global teams of 13+ engineers across Infrastructure, Security, Networking, and DevOps, with experience managing $15M+ budgets and delivering $8M+ in documented cost savings.

18+ Years of Experience
$8M+ Annual Savings
99.97% GPU Cluster Uptime
13+ Engineers Led

My Approach

Guiding principles that drive how I build and lead infrastructure teams

Cost-Conscious Architecture

Every architectural decision accounts for total cost of ownership (TCO). I've driven 73% cost reductions through right-sizing, spot instances, reserved capacity, and vendor consolidation without sacrificing reliability.

Security as Foundation

Security isn't an afterthought — it's built into every layer. From zero-trust networking to DevSecOps pipelines, I ensure compliance with SOC2, HIPAA, FedRAMP, and HITRUST from day one.

Developer Velocity

Great infrastructure is invisible to developers. I build self-service platforms with golden paths that accelerate delivery — reducing deployment times by 95% while maintaining governance.

Observability First

You can't improve what you can't measure. I implement comprehensive monitoring, tracing, and alerting that enable data-driven decisions and rapid incident response.

Infrastructure as Code

All infrastructure should be version-controlled, reviewed, and reproducible. I've created 50+ Terraform modules and established GitOps workflows that ensure consistency at scale.

Team Empowerment

Technology is only as good as the team operating it. I focus on building high-performing teams through mentorship, clear ownership, and environments where engineers thrive.

Let's Connect

Currently exploring Cloud AI Architect and platform engineering leadership roles. Open to both full-time positions and consulting engagements.