Available for Opportunities

Distinguished Cloud AI Architect & Platform Engineering Leader

Building large-scale GPU compute platforms, AI/ML infrastructure, and distributed systems at planetary scale. Expert in Slurm cluster management, GPU scheduling (NVIDIA A100/V100/DGX), Kubernetes orchestration, and multi-cloud architecture (AWS, Azure, GCP, OCI).

18+
Years Experience
$8M+
Annual Savings
10M+
Users Served
Sudhakar Chundu

San Jose, California

AI/ML Kubernetes GPU/Slurm Multi-Cloud
73%
Cloud Cost Reduction
200+
Microservices Managed
99.97%
GPU Cluster Uptime
50+
Daily Releases

What Colleagues Say

Trusted by leaders across engineering, product, and executive teams at Trackonomy

"I had the opportunity to work closely with Chundu at Trackonomy, and he was someone I consistently trusted on our most critical cloud and AI initiatives. Chundu has a rare ability to combine deep technical expertise with clear, practical decision-making. Beyond the technology, Chundu is a thoughtful leader and collaborator."

Nirali Patidar

Mission-Driven TPM Leader | AI/ML

Former Atlassian, Cisco, Oracle

"Sudhakar always focused on impact. His approach in every project was to ask how it supports revenue and enhances competitive position. He always applied tech to business, knew how to rally teams from both sides, and communicate to see our projects through. The best of the best. Learned a lot from him."

Manal Yaqub

Startups @ Databricks

Client Partner

"Sudhakar made a real difference in how we built and deployed software. He streamlined our development process with automation, improved our CI/CD pipelines, and introduced solid DevSecOps practices. Whenever there was a DevOps challenge, Sudhakar was the go-to person everyone trusted."

Kedar Rajwade

Director of Engineering

Distributed Systems | Agentic AI

"Sudhakar was instrumental in establishing our comprehensive compliance and security posture. His expertise enabled Trackonomy to successfully complete our first SOC2 Type II audit. He also led an initiative that reduced our annual hosting spend by hundreds of thousands of dollars."

Keith Abrams

General Counsel

Trackonomy Systems

"Sudhakar has been instrumental in shaping a secure, scalable foundation for our AI-driven mobile stack. He designed Android CI/CD pipelines with DevSecOps tightly integrated. His work with OAuth and IAM ensured secure, seamless authentication while protecting sensitive data."

Diwakar Reddy

Technical Lead | Mobile Engineering

Industrial IoT | AI and ML

"From day one, Sudhakar impressed me with a rare combination of deep technical knowledge and thoughtful problem-solving. He architected our CI/CD pipelines and cloud infrastructure with precision — reducing deployment time by 95% and dramatically increasing system reliability."

Kasi Viswanath

Senior DevOps Engineer

Trackonomy Systems

"Sudhakar is a certified DevOps expert and an industry veteran. He developed seamless pipelines to enhance developer velocity. His emphasis on automation and constant exploration of new technologies helped make the company cloud agnostic and reduce infrastructure cost."

Abhijeet Purkar

Software and Data | CMU

Gold Medalist OU

"Sudhakar is a highly knowledgeable engineer whom I worked with for multiple 'missions impossible', addressing fires that thanks to his quick thinking and action we were able to solve. He understood end to end the needs for the customers as well as the behavior of the product."

Raymundo Alatorre

Sr. Manager Automation & IIoT

Flex

"Sudhakar is very knowledgeable in cloud infrastructure management as well as FinOps. He was a solid partner in the business, working to ensure reliable/scalable infrastructure, while also understanding and meeting the needs of others in the business."

Troy Ford

CFO

Private Equity, M&A

"I worked with Sudhakar at Trackonomy and appreciated his strong work ethic and dedication to the DevOps initiatives he helped organize. He consistently put in significant effort and took ownership of the areas he was responsible for."

Patty Steiman

Vice President Customer Success

Trackonomy | IoT Expertise

"I've been fortunate to collaborate with Sudhakar across two companies during my career. He demonstrates a strong work ethic and unwavering dedication to the DevOps initiatives he supports. He excels when working on clearly defined tasks, approaching them with determination."

Sravani Bolla

Senior DevOps Engineer

Trackonomy Systems

View all recommendations on LinkedIn

Professional Experience

Building and operating infrastructure at scale for global enterprises

I have 18+ years of hands-on experience building and operating large-scale GPU compute platforms, AI/ML infrastructure, and distributed systems, and I lead global teams of 13+ engineers across Infrastructure, Security, Networking, and DevOps.

Currently Distinguished Cloud AI Architect at Trackonomy, managing $15M+ budgets while delivering $8M+ in documented cost savings. Built the infrastructure team from 0 to 6 engineers, serving the Pharma, Airlines, Government, Manufacturing, Healthcare, and IoT sectors across 8 countries.

Expert in Slurm-based GPU compute platforms (65 GPUs, 99.97% uptime), serverless GPU infrastructure, Databricks/Spark/Kafka real-time pipelines, and comprehensive DevSecOps with SOC2, HIPAA, FedRAMP, and HITRUST compliance.

99.97%
GPU Cluster Uptime
$15M+
Budget Managed

Distinguished Cloud AI Architect / Director of Platform Engineering

Trackonomy Systems

Oct 2023 – Present
  • Designed Slurm-based GPU compute platform (65 GPUs, 8 nodes) with Slinky and NVIDIA BCM; achieved 99.97% uptime serving 12+ enterprise clients for AI inference and training
  • Reduced cloud costs 73% ($10M→$2.7M/year) through GPU utilization optimization, fair-share scheduling, and vendor consolidation; led FinOps practice with showback/chargeback models
  • Built team 0 → 6 engineers | Managed $15M+ budgets | SOC2, HIPAA, FedRAMP, HITRUST compliance | Secured GenAI/LLM platform against prompt injection and data exfiltration

Senior SRE / Cloud Architect — ML Infrastructure

Wipro Technologies (OSDU Data Platform)

Feb 2020 – Oct 2023
  • Architected OSDU R3 data platform processing exabytes of seismic data on GPU-accelerated Kubernetes (EKS/Fargate) with Spark, Kafka, Hadoop/HDFS for ExxonMobil, Chevron, BP, Shell
  • Created 50+ Terraform modules; implemented GitOps (ArgoCD/FluxCD), KEDA autoscaling; reduced deployment time 80%
  • Built observability stack with Prometheus/Grafana/ELK and GPU metrics; implemented HA architecture achieving 55% downtime reduction via capacity planning

Multi-Cloud Architect / Senior Infrastructure Engineer

Tata Consultancy Services

May 2007 – Feb 2020

Progressive 13-year career across Fortune 500 clients in healthcare, government, telecom, and financial services.

Cloud Architect — Harvard Pilgrim Health Care Jun 2018 – Feb 2020
  • Led cloud modernization for HIPAA/HITRUST-regulated AI/ML applications to AWS with GPU-enabled EKS clusters
  • Implemented Jenkins/Ansible CI/CD pipelines | DevSecOps with HashiCorp Vault, SonarQube | Managed $6M budget
Cloud Senior Engineer — CNA Insurance May 2015 – Jun 2018
  • Managed Kubernetes/Helm deployments on AWS; pioneered Docker/Kubernetes adoption (2013-2014)
  • Led cloud migrations to AWS, Azure, OpenStack | CI/CD with Jenkins/Ansible
Solutions Architect — PwC Jan 2011 – Apr 2015
  • Led infrastructure deployments for 20+ facility buildouts including branch offices, call centers, and data centers
  • Integrated Chef/Jenkins deployment pipelines | Migrated VMware VMs to AWS | Managed $15M+ budgets
Middleware Engineer — Verizon, Owens Corning May 2007 – Jan 2011
  • 5 years of deep Linux/Unix administration with WebSphere/WebLogic middleware, including kernel tuning and JVM optimization
  • Managed large-scale production systems on bare-metal serving millions of users | 24x7 L3 operations

Tools & Technologies

Technologies I work with daily to build scalable, reliable cloud infrastructure and AI/ML platforms.

Cloud Platforms
AWS Azure GCP OCI
Container & Orchestration
Docker Kubernetes Helm Istio
Infrastructure as Code
Terraform Ansible CloudFormation Bicep
CI/CD & GitOps
ArgoCD GitHub Actions Jenkins Azure DevOps
Monitoring & Observability
Prometheus Grafana Datadog OpenTelemetry
AI/ML & HPC
Kubeflow NVIDIA GPU Slurm MLflow
Languages & Scripting
Python Go Bash TypeScript
Security & Compliance
Snyk Vault SOC2 FedRAMP

Areas of Expertise

Building and scaling enterprise infrastructure across bare-metal and multi-cloud platforms

Linux & Systems

Deep expertise in Linux systems administration, performance tuning, and kernel debugging at scale.

Ubuntu RHEL Shell Systemd

Containers & Kubernetes

Managing large-scale Kubernetes clusters with advanced orchestration and GitOps workflows.

Kubernetes Helm ArgoCD Istio

Multi-Cloud Architecture

Designing infrastructure across AWS, Azure, GCP, and OCI with focus on cost optimization.

AWS Azure GCP OCI

Observability & SRE

Implementing comprehensive monitoring, alerting, and SLO-driven reliability engineering.

Prometheus Grafana Datadog OpenTelemetry

Infrastructure as Code

Automating infrastructure provisioning with modern IaC tools and robust CI/CD pipelines.

Terraform Ansible GitHub Actions Jenkins

Networking & Load Balancing

Expert in TCP/IP, DNS, service mesh, and multi-region load balancing architectures.

Route53 HAProxy Envoy VPC

Let's Connect

Currently exploring Cloud AI Architect and platform engineering leadership roles. Open to both full-time positions and consulting engagements.