AI/ML & Healthcare

Enterprise AI/ML Platform

Built enterprise AI/ML platforms for healthcare, supply chain, airline, and government clients on 65-GPU multi-tenant infrastructure for LLM/GenAI workloads, delivering $8M+ in cost savings.

Trackonomy
Oct 2023 – Present
San Jose, CA (Remote)

Project Overview

As Distinguished Cloud AI Architect at Trackonomy, I architected and built enterprise-grade AI/ML platforms serving healthcare, supply chain, airline, and government clients. The platform enables organizations to deploy, train, and serve large language models (LLMs) and generative AI workloads at scale with enterprise security and compliance.

I built the infrastructure team from 0 to 6 engineers and managed $6M+ in Azure/AWS budgets. The work delivered a 73% cloud cost reduction and 75% faster deployments while maintaining 99.97% uptime, with zero critical security findings across the SOC2, HIPAA, and FedRAMP compliance frameworks.
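To put the 99.97% uptime figure in concrete terms, the short sketch below converts an uptime percentage into a downtime budget. The function name and the choice of a 365-day year and a 30-day month are illustrative assumptions, not part of the platform's actual SLA definition.

```python
def allowed_downtime_minutes(uptime_pct: float, period_hours: float) -> float:
    """Minutes of downtime permitted over a period at a given uptime percentage."""
    return (100.0 - uptime_pct) / 100.0 * period_hours * 60.0


# Assumed measurement windows: a 365-day year and a 30-day month.
yearly = allowed_downtime_minutes(99.97, 365 * 24)
monthly = allowed_downtime_minutes(99.97, 30 * 24)

print(f"{yearly:.1f} min/year, {monthly:.1f} min/30 days")
# prints "157.7 min/year, 13.0 min/30 days"
```

Under these assumptions, 99.97% uptime leaves roughly two and a half hours of downtime per year.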

Azure Platform Architecture

Trackonomy Azure Platform Architecture

Multi-tenant Azure Platform - AKS, CPU Infrastructure, and Enterprise Security

Key Achievements

  • $8M+ Cost Savings Delivered
  • 73% Cloud Cost Reduction
  • 65 GPU Multi-Tenant Platform
  • 99.97% Platform Uptime
  • 75% Faster Deployments
  • 0→6 Team Built

Industry Verticals

Healthcare, Supply Chain, Airline, Government

Key Responsibilities

  • GPU Infrastructure: Architected 65-GPU multi-tenant platform using Slurm, Kubernetes, and NVIDIA Triton for LLM/GenAI workloads with optimized resource scheduling and cost allocation.
  • ML Pipelines: Built end-to-end ML pipelines with Apache Airflow, Apache Flink, MLflow, and Kubeflow for model training, versioning, and deployment automation.
  • Observability: Implemented comprehensive observability across 50+ microservices using Prometheus, Grafana, Datadog, and custom dashboards for GPU utilization monitoring.
  • Multi-Cloud Strategy: Designed and deployed infrastructure across Azure, AWS, and OCI with Terraform and GitOps for consistent, repeatable provisioning.
  • Application Migration: Led migration of 50+ applications to Kubernetes with zero-downtime deployment strategies and automated rollback capabilities.
  • Security & Compliance: Achieved SOC2, HIPAA, and FedRAMP compliance via Vanta and Snyk integration with zero critical security findings.
  • Cost Optimization: Implemented FinOps practices delivering 73% cloud cost reduction and $8M+ savings through right-sizing, reserved instances, and spot instance strategies.
  • Team Leadership: Built infrastructure team from 0 to 6 engineers, establishing best practices, documentation, and on-call rotations.
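As an illustration of the cost allocation mentioned under GPU Infrastructure, the sketch below splits a shared GPU bill across tenants in proportion to the GPU-hours each consumed. This is one common chargeback approach and is an assumption here; the tenant names, usage records, and dollar amount are hypothetical and do not reflect the platform's actual accounting.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def allocate_gpu_cost(
    usage: Iterable[Tuple[str, int, float]], total_cost: float
) -> Dict[str, float]:
    """Split total_cost across tenants in proportion to GPU-hours used.

    Each usage record is (tenant, gpu_count, hours).
    """
    gpu_hours: Dict[str, float] = defaultdict(float)
    for tenant, gpus, hours in usage:
        gpu_hours[tenant] += gpus * hours

    total = sum(gpu_hours.values())
    if total == 0:
        # No usage recorded: nothing to charge back.
        return {tenant: 0.0 for tenant in gpu_hours}
    return {tenant: total_cost * h / total for tenant, h in gpu_hours.items()}


# Hypothetical usage records and bill, for illustration only.
usage = [
    ("healthcare", 8, 10.0),  # 80 GPU-hours
    ("airline", 4, 5.0),      # 20 GPU-hours
    ("healthcare", 2, 10.0),  # 20 GPU-hours
]
shares = allocate_gpu_cost(usage, 1200.0)
print(shares)
```

In practice the usage records would come from scheduler accounting data rather than a hard-coded list.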

Technology Stack

Azure, AWS, OCI, Kubernetes, Terraform, ArgoCD, Helm, Docker, GitHub Actions, Prometheus, Grafana, Datadog, NVIDIA GPU, Slurm, Python, MLflow, Kubeflow, Snyk, Vault

Energy & Oil/Gas Sector

OSDU Data Platform

Architected and deployed the Open Subsurface Data Universe (OSDU) R3 platform on AWS and Azure for major energy enterprises, enabling petabyte-scale seismic and subsurface data management.

Wipro
Feb 2020 – Jun 2023
Remote (Global Clients)

Project Overview

The Open Subsurface Data Universe (OSDU) is an industry-standard data platform that enables oil and gas companies to manage, integrate, and leverage subsurface and wells data at scale. I led the cloud architecture and infrastructure implementation for OSDU R3 across multiple major energy clients, building GPU-accelerated data ingestion and ML training pipelines for petabyte-scale seismic data processing.

This project involved designing multi-tenant landing zones, implementing DevSecOps practices, and ensuring compliance with enterprise security standards across AWS and Azure environments.

Architecture Overview

OSDU Azure Architecture Diagram

Multi-tenant deployment with AKS, Event Hubs, and Cosmos DB

Key Achievements

  • 5+ Global Energy Enterprises
  • PB-Scale Seismic Data Processing
  • 55% Downtime Reduction (DR)
  • Multi-Cloud AWS & Azure Deployment

Enterprise Clients

ExxonMobil, Chevron, BP, Shell, TotalEnergies

Key Responsibilities

  • Platform Architecture: Designed and implemented OSDU R3 data platform on AWS and Azure, enabling seamless integration of subsurface, wells, and seismic data for exploration and production workflows.
  • GPU Infrastructure: Architected GPU-accelerated EKS/AKS clusters for large-scale seismic data processing and ML model training, optimizing compute costs while meeting performance SLAs.
  • Multi-Tenant Landing Zones: Provisioned secure AWS and Azure Landing Zones for 5+ global energy enterprises with isolated environments, network segmentation, and compliance controls.
  • CI/CD Automation: Built end-to-end deployment pipelines using GitHub Actions, Azure DevOps, Terraform, and Helm for consistent and repeatable infrastructure provisioning.
  • DevSecOps Implementation: Integrated security scanning with Snyk, SonarQube, and JFrog Xray into CI/CD pipelines. Managed secrets using AWS Secrets Manager and Azure Key Vault.
  • Disaster Recovery: Designed and implemented DR strategies reducing potential downtime by 55% with multi-region failover capabilities and automated recovery procedures.
  • Data Pipeline Optimization: Built high-throughput data ingestion pipelines handling petabyte-scale seismic data with Apache Kafka, Azure Event Hubs, and custom ETL workflows.
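One building block of a high-throughput ingestion pipeline like the one described above is splitting very large files into bounded-size pieces, so each piece can be published as a separate message and reassembled by byte offset. The sketch below shows that idea using only the Python standard library. The function name, chunk size, and in-memory sample are hypothetical, and the step that publishes each chunk to Kafka or Event Hubs is deliberately left out.

```python
import io
from typing import BinaryIO, Iterator, Tuple


def iter_chunks(stream: BinaryIO, chunk_size: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (byte_offset, data) pairs of at most chunk_size bytes each."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    offset = 0
    while True:
        data = stream.read(chunk_size)
        if not data:
            break  # end of stream
        yield offset, data
        offset += len(data)


# Small in-memory stand-in for a seismic file, for illustration only.
sample = io.BytesIO(b"x" * 2500)
chunks = list(iter_chunks(sample, 1000))

for offset, data in chunks:
    print(offset, len(data))
```

Carrying the offset with each chunk lets a consumer detect gaps and reassemble the file even if messages arrive out of order.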

Technology Stack

AWS, Azure, Kubernetes (EKS/AKS), Terraform, Helm, GitHub Actions, Azure DevOps, Docker, Python, Snyk, Vault, Prometheus, Grafana, PostgreSQL, Apache Kafka, NVIDIA GPU