available · principal / staff

razzkumar.

Lead Software Engineer building resilient, multi-region Kubernetes platforms and shipping 100+ production apps. I work across DevOps, SRE, AI infrastructure, and full-stack delivery.

Kathmandu, Nepal·10+ yrs·open source
now: tuning Karpenter spot strategy on prod
▸ featured case study
100+

Production apps

deployed & maintained

10+

Years of engineering

shipping at scale

Platform uptime

multi-region SLO

02 — stack

The toolbox. From bare metal to LLMs.

DevOps & Cloud

AWS · Azure · GCP · Terraform · Ansible · Pulumi · CloudFormation

Kubernetes & Containers

Kubernetes · Helm · ArgoCD · Istio · Docker · Kustomize · Crossplane

CI/CD & GitOps

GitHub Actions · CircleCI · TravisCI · Azure DevOps · FluxCD · Jenkins

SRE & Observability

Prometheus · Grafana · Loki · Datadog · OpenTelemetry · PagerDuty · SLO/SLI

AI / ML Infra

LangChain · OpenAI · Vector DBs · GPU Scheduling · RAG · MLflow · Triton

Backend

Node.js · NestJS · Go · Python · GraphQL · gRPC · HL7

Frontend

React · Next.js · TypeScript · D3.js · Tailwind · Redux

Data & Storage

PostgreSQL · Redis · MongoDB · Kafka · RabbitMQ · S3

03 — experience

Ten+ years. Six teams. One throughline.

Lead Software Engineer / DevOps

Jun 2023 — Present

Webpoint Solutions, LLC · Kathmandu, Nepal

  • Architect & maintain a highly available, multi-cluster, multi-region Kubernetes platform serving 3M+ MAU
  • Design geo-distributed traffic routing and failover strategy across regions
  • Lead platform engineering org — observability, reliability, deployment velocity
Kubernetes · Multi-Region · GitHub Actions · Istio · Terraform

Senior Software Engineer / DevOps

Jun 2022 — Jun 2023

Webpoint Solutions, LLC · Kathmandu, Nepal

  • Deployed & maintained 20+ production applications across AWS
  • Built React / Next.js / NestJS apps end-to-end
  • Mentored 10+ interns and junior engineers
AWS · NestJS · Next.js · TypeScript

Software Engineer

Jan 2021 — Jul 2021

Signetic · Kathmandu, Nepal

  • Built a pipeline that submits HL7-formatted vaccination records to Immunization Information Systems (IIS)
  • Converted HL7 streams into human-readable reports for clinical staff
Node.js · HL7 · Healthcare

Software Engineer / DevOps

Jul 2020 — Jul 2021

Leapfrog Technology, Inc. · Kathmandu, Nepal

  • Deployed multiple projects across AWS / Azure
  • Set up CI/CD pipelines with GitHub Actions, CircleCI, TravisCI, Azure DevOps
AWS · Azure · CI/CD

Associate Software Engineer

Jun 2019 — Jul 2020

Leapfrog Technology, Inc. · Kathmandu, Nepal

  • Foundations in full-stack delivery and cloud automation
Full-Stack

Front-end Developer

Dec 2017 — May 2019

Yarsha Studio Pvt. Ltd. · Kathmandu, Nepal

  • Built multiple React.js production sites
  • Engineered a real-time GPS-tracking backend over TCP sockets with D3.js visualization
React.js · D3.js · TCP Sockets

04 — case study

From git push to ten clusters in seconds.

A live walkthrough of the platform I lead — GitOps continuous delivery wired through ArgoCD Image Updater, with HashiCorp Vault, Karpenter autoscaling, and DNS-level failover keeping it reliable.

stage 01 / 08

git push origin main

tail -f /var/log/commit.log
feat(api): add /v2/inference endpoint
author: razzkumar · sha: a91f2c4

▸ fleet · 10 production clusters

us-east-1 · Virginia · v1
us-west-2 · Oregon · v1
eu-west-1 · Ireland · v1
eu-central-1 · Frankfurt · v1
ap-south-1 · Mumbai · v1
ap-southeast · Singapore · v1
ap-northeast · Tokyo · v1
sa-east-1 · São Paulo · v1
af-south-1 · Cape Town · v1
me-south-1 · Bahrain · v1
hashicorp vault

Auto-restart on env change.

Vault rotates → Reloader detects → Rolling update: new pods spin up, old pods drain.

vault-agent · stakater/reloader · rolling-update

kv/prod · DB_PASSWORD · ttl 15m

deploy/api · v1 · 3/3 ready

api-1 · v1 · Running
api-2 · v1 · Running
api-3 · v1 · Running

steady state · watching kv/prod
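
The rotate → detect → roll loop above can be sketched as a fingerprint comparison: hash the secret data, and trigger a rolling restart only when the hash changes. This is an illustrative sketch, not Reloader's actual code; `fingerprint` and `needs_rollout` are hypothetical names and the secret values are placeholders.

```python
import hashlib
import json


def fingerprint(secret_data: dict[str, str]) -> str:
    """Return a stable SHA-256 digest of the secret's key/value pairs."""
    # sort_keys makes the digest independent of key order.
    canonical = json.dumps(secret_data, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def needs_rollout(last_seen: str, secret_data: dict[str, str]) -> bool:
    """True when the secret differs from the last observed fingerprint."""
    return fingerprint(secret_data) != last_seen


if __name__ == "__main__":
    seen = fingerprint({"DB_PASSWORD": "old-placeholder"})
    # A rotated value changes the fingerprint, so a rollout is due.
    print(needs_rollout(seen, {"DB_PASSWORD": "new-placeholder"}))
```

Comparing digests rather than raw values means the watcher never has to keep the previous secret around, only its hash.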
karpenter · ec2

Right-sized nodes. In seconds.

Provision the cheapest, fastest fit per workload.

cluster load 30%

3 active nodes
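
"Cheapest fit per workload" boils down to a filter-then-minimise step. A minimal sketch, assuming a flat catalog of node types; it is not Karpenter's scheduler, and `NodeType`, `cheapest_fit`, and any prices passed in are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NodeType:
    name: str
    vcpu: float
    memory_gib: float
    hourly_price: float  # placeholder figure supplied by the caller


def cheapest_fit(cpu: float, memory_gib: float,
                 catalog: list[NodeType]) -> Optional[NodeType]:
    """Pick the lowest-priced node type that satisfies the pending requests."""
    # Keep only the types large enough for the workload.
    candidates = [n for n in catalog
                  if n.vcpu >= cpu and n.memory_gib >= memory_gib]
    # default=None covers the case where nothing in the catalog fits.
    return min(candidates, key=lambda n: n.hourly_price, default=None)
```

A real provisioner also weighs spot availability, zones, and bin-packing across many pods; the sketch keeps only the price-versus-fit trade-off.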

dns failover · route 53

Geo-aware traffic. Zero downtime.

Health-checked failover to the secondary cluster.

api.razzkumar.com
primary · us-east-1

healthy · 142ms

secondary · eu-west-1

standby · warm
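
Health-checked failover can be sketched as a counter of consecutive failed probes per endpoint. The threshold of 3 is an assumption for the sketch, and `Endpoint` and `route` are hypothetical names, not Route 53 API calls.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    region: str
    failure_threshold: int = 3  # assumed value for this sketch
    consecutive_failures: int = 0

    def record_check(self, ok: bool) -> None:
        """Reset the counter on success, increment it on failure."""
        self.consecutive_failures = 0 if ok else self.consecutive_failures + 1

    @property
    def healthy(self) -> bool:
        return self.consecutive_failures < self.failure_threshold


def route(primary: Endpoint, secondary: Endpoint) -> str:
    """Answer with the primary while it is healthy, otherwise fail over."""
    return primary.region if primary.healthy else secondary.region
```

Requiring several consecutive failures avoids flapping on a single dropped probe, and one successful check is enough to send traffic back to the primary.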

▸ how it works

01

Developer commits

git push triggers GitHub Actions on the application repo.

02

CI builds image

Multi-stage Dockerfile produces an immutable image tagged with the commit SHA.

03

Push to ECR

OIDC-auth login to AWS, image pushed to a region-replicated ECR repo.

04

Image Updater scans

ArgoCD Image Updater watches ECR and detects the new tag matching semver/regex policy.
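
The tag-selection step can be sketched as "filter by pattern, then take the highest version". A minimal sketch, assuming plain MAJOR.MINOR.PATCH tags with an optional `v` prefix; `newest_semver_tag` is a hypothetical helper, not Image Updater's code.

```python
import re
from typing import Optional

# Matches 1.2.3 or v1.2.3; anything else (latest, commit SHAs) is ignored.
SEMVER = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def newest_semver_tag(tags: list[str]) -> Optional[str]:
    """Return the highest tag matching MAJOR.MINOR.PATCH, or None."""
    versioned = []
    for tag in tags:
        match = SEMVER.match(tag)
        if match:
            # Compare numerically so that 1.10.0 sorts above 1.9.0.
            versioned.append((tuple(int(p) for p in match.groups()), tag))
    return max(versioned)[1] if versioned else None
```

The numeric comparison matters: sorting the tags as strings would rank `1.9.0` above `1.10.0`.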

05

GitOps bump

Updater commits a values.yaml change to the gitops repo — single source of truth.
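
The bump itself is a one-line change. A rough sketch, assuming the chart reads the image tag from a `tag:` key in values.yaml (the key name and `bump_tag` are assumptions, not confirmed by the platform's layout):

```python
import re

# First line of the form "<indent>tag: <anything>".
TAG_LINE = re.compile(r"(?m)^([ \t]*tag:[ \t]*).*$")


def bump_tag(values_yaml: str, new_tag: str) -> str:
    """Rewrite the first `tag:` line to point at new_tag, leaving the rest intact."""
    # A function replacement keeps new_tag from being read as a regex template.
    return TAG_LINE.sub(lambda m: f'{m.group(1)}"{new_tag}"', values_yaml, count=1)
```

Editing the text in place keeps comments and key order untouched, so the resulting commit in the gitops repo is a single-line diff.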

06

ArgoCD syncs

App-of-apps detects drift, runs sync waves with PreSync hooks (db migrations, smoke checks).
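
Sync ordering can be sketched as a sort: hook phase first, then wave number, then name. The resource names used with it are hypothetical; in Argo CD the wave comes from the `argocd.argoproj.io/sync-wave` annotation, and the real ordering also takes resource kind into account.

```python
from dataclasses import dataclass

PHASE_ORDER = {"PreSync": 0, "Sync": 1, "PostSync": 2}


@dataclass(frozen=True)
class Resource:
    name: str
    phase: str = "Sync"
    wave: int = 0


def apply_order(resources: list[Resource]) -> list[str]:
    """Return resource names in the order they would be applied."""
    # Lower phase index and lower wave number go first; name breaks ties.
    ordered = sorted(resources,
                     key=lambda r: (PHASE_ORDER[r.phase], r.wave, r.name))
    return [r.name for r in ordered]
```

Putting migrations in PreSync means a failed migration stops the sync before any application manifest is touched.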

07

Fleet rollout

Progressive delivery across 10 clusters — canary → 25% → 100%, with automatic rollback on SLO burn.
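
The canary → 25% → 100% progression can be sketched as a loop over widening batches with an SLO gate after each one. This is a simplified sketch: treating the canary as exactly one cluster and rounding 25% up are assumptions, and `slo_healthy` is a hypothetical callback standing in for the real burn check.

```python
import math
from typing import Callable


def progressive_rollout(clusters: list[str],
                        slo_healthy: Callable[[list[str]], bool],
                        stages: tuple[float, ...] = (0.0, 0.25, 1.0)) -> list[str]:
    """Return the clusters left on the new version (empty after a rollback)."""
    updated: list[str] = []
    for fraction in stages:
        # A fraction of 0.0 stands for the canary: exactly one cluster.
        target = max(1, math.ceil(fraction * len(clusters)))
        updated = clusters[:target]
        if not slo_healthy(updated):
            return []  # roll every updated cluster back
    return updated
```

With ten clusters the batches cover 1, 3, and 10 clusters; a failed gate at any stage returns an empty list, meaning nothing stays on the new version.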

08

Observability loop

Prometheus + Loki feed Grafana SLO dashboards; PagerDuty fires only on error-budget burn.
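
Paging on error-budget burn rather than raw errors can be sketched with a burn-rate calculation. The 14.4 threshold and the two-window check follow a common multi-window pattern and are assumptions here, not this platform's confirmed alert rules.

```python
def burn_rate(errors: int, total: int, slo: float) -> float:
    """How fast the error budget is being spent; 1.0 means exactly on budget."""
    if total == 0:
        return 0.0
    budget = 1.0 - slo  # allowed error ratio, e.g. 0.001 for a 99.9% SLO
    return (errors / total) / budget


def should_page(long_window: float, short_window: float,
                threshold: float = 14.4) -> bool:
    """Page only when both windows burn fast, which filters brief spikes."""
    return long_window >= threshold and short_window >= threshold
```

Requiring the short window to agree with the long one keeps a burst that has already recovered from waking anyone up.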

05 — let’s build

Got something ambitious?

I’m always up for a chat about platforms, reliability, AI infrastructure, and the messy space in between.