You ship a small change to example.com. You type git push. Ninety seconds later it’s serving real traffic — and you didn’t run a single deploy command. This is what a properly wired GitOps pipeline feels like, and the moving parts are simpler than the marketing makes them sound.
Stage 1 of 6: git push origin main. You commit a change to the example.com app repo and push to main. That's the entire human action in the whole flow; everything after this is automatic.
The animation above is the entire flow. Six stages, one human action. If you’ve shipped Docker images to a Kubernetes cluster before, most of this will look familiar — git push triggers CI, CI builds and pushes an image, something pulls that image and rolls the cluster. The interesting bits live in the three sections below: why there are two repos, what sync waves actually do, and what happens when each stage breaks. The flow is the easy part. The discipline around it is the part teams skip and regret.
Why two repos
Look at stages 4 and 5 of the pipeline above. Stage 4 is the ArgoCD Image Updater spotting a new image in ECR. Stage 5 is that same Image Updater committing a single line of YAML to a different repo — your GitOps repo — bumping image.tag from a91f2c4 to whatever the new SHA is. Most of the surprise people have when they first set this up is at stage 5: the Image Updater does not deploy. It writes to git.
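In practice that stage-5 commit is a one-line diff in the GitOps repo. The file path and the new tag below are made up for illustration; the old tag is the one from the flow above:

    --- a/apps/web/values.yaml
    +++ b/apps/web/values.yaml
     image:
    -  tag: a91f2c4
    +  tag: b7e09d3   # hypothetical new SHA from the latest CI build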
That separation is the whole point. The app repo (github.com/example/web) is where humans write product code. CI lives there: it builds, tests, and pushes an image to ECR. Once the push completes, the app repo’s job is done. The cluster has no idea anything happened.
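A minimal sketch of that CI job, assuming GitHub Actions and OIDC auth to ECR; the role secret and region are placeholders, and the test step is elided. The one detail that matters for the rest of the flow is the tag: the short commit SHA, which is what the Image Updater will later spot.

    # .github/workflows/build.yml in github.com/example/web (sketch)
    name: build-and-push
    on:
      push:
        branches: [main]
    permissions:
      id-token: write   # OIDC token for the AWS role assumption below
      contents: read
    jobs:
      build:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: aws-actions/configure-aws-credentials@v4
            with:
              role-to-assume: ${{ secrets.ECR_PUSH_ROLE }}   # placeholder
              aws-region: us-east-1                          # placeholder
          - uses: aws-actions/amazon-ecr-login@v2
            id: ecr
          - run: |
              docker build -t ${{ steps.ecr.outputs.registry }}/web:${GITHUB_SHA::7} .
              docker push ${{ steps.ecr.outputs.registry }}/web:${GITHUB_SHA::7}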
The GitOps repo (github.com/example/gitops) is where the desired state of the cluster lives. Every deployment, every service, every config map, every image tag — it’s all declared as YAML in one place. ArgoCD watches this repo and only this repo. If a line in the GitOps repo says image.tag: a91f2c4, then a91f2c4 is what runs in production, full stop.
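The glue that makes "watches this repo and only this repo" true is an ArgoCD Application resource. A minimal sketch, with the path and namespaces as assumptions:

    # ArgoCD Application (sketch): the only source of cluster state it honors
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: web
      namespace: argocd
    spec:
      project: default
      source:
        repoURL: https://github.com/example/gitops
        path: apps/web          # assumed repo layout
        targetRevision: main
      destination:
        server: https://kubernetes.default.svc
        namespace: web          # assumed target namespace
      syncPolicy:
        automated:
          prune: true     # delete cluster resources removed from git
          selfHeal: true  # revert hand-edits that drift from git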
The reason to keep them separate is that they answer different questions. What does the app do? lives in the app repo, alongside the source. What’s running in production right now? lives in the GitOps repo, as a manifest. When you mash them together — the way teams do when they put their Helm charts inside the app repo and let CI kubectl apply — you lose the answer to the second question. There’s no canonical place to read off cluster state. There’s only “whatever was in HEAD when CI last ran, plus whatever has been hand-edited since.” That’s not GitOps. That’s just kubectl with extra steps.
The Image Updater is the bridge between the two: it’s the one piece of automation allowed to write to the GitOps repo on your behalf, and it commits as argocd-bot so the audit trail is honest. A reviewer can git log the GitOps repo and see every change that ever shipped, who (or what) made it, and when.
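The wiring is a few annotations on that same Application. The annotation keys are the Image Updater's real ones; the registry URL is a placeholder. The argocd-bot identity comes from the updater's own config (git.user and git.email in its ConfigMap), not from these annotations.

    # metadata.annotations on the Application (sketch)
    argocd-image-updater.argoproj.io/image-list: web=123456789012.dkr.ecr.us-east-1.amazonaws.com/web
    argocd-image-updater.argoproj.io/web.update-strategy: newest-build
    argocd-image-updater.argoproj.io/write-back-method: git   # commit, don't deploy
    argocd-image-updater.argoproj.io/git-branch: main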
Sync waves and PreSync hooks
Stage 6 of the pipeline — ArgoCD syncs the cluster to git — is the one that hides the most behind a single label. When ArgoCD detects drift between the GitOps repo and the cluster, it doesn’t just kubectl apply the diff in some arbitrary order. It runs sync waves: integer-numbered phases that resources opt into via an annotation.
A typical web service uses three waves. Wave 0 is for things that have to happen before the new pods exist — the canonical example is a database migration. You declare the migration as a Kubernetes Job and annotate it with argocd.argoproj.io/sync-wave: "0" and argocd.argoproj.io/hook: PreSync. ArgoCD runs that Job to completion before it touches anything in wave 1. Wave 1 is the application itself: the Deployment with the new image tag, plus any Services and ConfigMaps. Wave 2 is for verification — a smoke-test Job, or a synthetic check that hits a known endpoint and exits non-zero if anything’s wrong.
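Here's what those annotations look like in manifest form; a sketch, with images, commands, and the health endpoint as placeholders. The Deployment in wave 1 needs nothing beyond its own sync-wave annotation.

    # wave 0: runs to completion before ArgoCD touches anything in wave 1
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: db-migrate
      annotations:
        argocd.argoproj.io/hook: PreSync
        argocd.argoproj.io/sync-wave: "0"
        argocd.argoproj.io/hook-delete-policy: HookSucceeded  # clean up on success
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: migrate
              image: example/web-migrations:latest   # placeholder
              command: ["./migrate", "up"]           # placeholder
    ---
    # wave 2: smoke test; a non-zero exit fails the Job, and the sync with it
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: smoke-test
      annotations:
        argocd.argoproj.io/hook: PostSync
        argocd.argoproj.io/sync-wave: "2"
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: smoke
              image: curlimages/curl:latest          # any curl image works
              command: ["curl", "-fsS", "http://web.web.svc/healthz"]  # placeholder URL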
The reason this matters: if your migration fails, ArgoCD stops. The new pods never get created. The old pods keep serving traffic on the old schema, and your error budget is intact. The alternative — kubectl apply everything at once and hope — is how you end up with new pods crash-looping against an un-migrated database while the old pods get terminated on rollout. You don’t want to be the one debugging that on a Friday afternoon.
Sync waves are also where ArgoCD distinguishes Synced from Healthy. A resource is Synced the moment ArgoCD applies its manifest. It's Healthy when its readiness probes pass and Kubernetes reports it ready. ArgoCD will not promote to the next wave until the current wave is Healthy, not just Synced. So your wave 1 Deployment has to actually come up before the wave 2 smoke test runs. This is the mechanism that turns “I shipped the change” into “the change is actually working.”
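For a Deployment, Healthy means the rollout completed and every replica passes its readiness probe. A minimal probe, with path and port as placeholders:

    # on the web container in the wave-1 Deployment (sketch)
    readinessProbe:
      httpGet:
        path: /healthz   # placeholder
        port: 8080       # placeholder
      initialDelaySeconds: 3
      periodSeconds: 5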
What breaks, and how it heals
Now the part the home-page case study elides. Things go wrong. Here are the three failure modes you’ll see most often, and what the system does about each.
ArgoCD sync fails partway through. The Deployment manifest applies, but the new pods crash-loop because the image is broken. ArgoCD marks the Application as OutOfSync/Degraded and stops. It does not roll back automatically by default — and that’s intentional. The desired state in git still says “ship a91f2c4,” and ArgoCD’s job is to honor desired state, not invent a different one. Your move is either to revert the GitOps-repo commit (which makes a91f2c4 no longer desired, so ArgoCD rolls forward to the previous tag) or to push a fix to the app repo and let the pipeline run again. The audit trail stays clean: every change to the cluster is a commit, including the rollback.
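The revert is ordinary git, aimed at the GitOps repo; the SHA below is hypothetical:

    # in a clone of github.com/example/gitops
    git revert 4be7a10    # hypothetical: the bot's tag-bump commit
    git push origin main  # ArgoCD sees the new desired state and converges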
Image Updater can’t push to the GitOps repo. Maybe the bot’s GitHub token expired, maybe the GitOps repo’s branch protection blocks unsigned commits, maybe the network had a bad five minutes. Image Updater logs the failure and retries on its next polling interval (default 2 minutes). The new image sits in ECR, tagged but un-shipped. This is exactly the right failure mode: nothing changes in production until the bot can write to git, and the moment it can, the pending tag bump lands and production catches up. Page on prolonged Image Updater errors, not on individual retries.
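When you do need to look, the updater's logs are the first stop; this assumes the stock install in the argocd namespace:

    kubectl -n argocd logs deployment/argocd-image-updater --tail=100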
The new image is broken in a way readiness probes don’t catch. The deploy succeeds — pods are Healthy, sync waves complete, ArgoCD is happy — but real traffic finds a regression: a 500 spike, p99 latency doubles, error budget burns. Progressive delivery is what catches this: the canary cluster (or canary pod set) takes 5% of traffic for a minute, an SLO-burn alert fires, and the rollout halts before the bad image reaches the rest of the fleet. From there it’s the same git-revert dance — undo the tag bump in the GitOps repo, ArgoCD rolls forward to the previous good tag.
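The flow above doesn't name a progressive-delivery tool. With Argo Rollouts, the natural companion to ArgoCD, the 5%-for-a-minute gate sketches out like this; the AnalysisTemplate name is hypothetical:

    # canary strategy (sketch): 5% of traffic, hold one minute, then analyze
    apiVersion: argoproj.io/v1alpha1
    kind: Rollout
    metadata:
      name: web
    spec:
      strategy:
        canary:
          steps:
            - setWeight: 5
            - pause: {duration: 1m}
            - analysis:
                templates:
                  - templateName: slo-burn   # hypothetical AnalysisTemplate
      # selector and pod template elided; they mirror the Deployment's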
The pattern across all three: the cluster always reflects what’s in git. There’s no separate “rollback button” because git itself is the rollback mechanism. Every recovery is a revert. Every fix is a forward push. That’s a property worth keeping.
What to take with you
The flow is not the interesting part. The flow is six boxes and six arrows, and once you’ve wired it, you forget about it — exactly as it should be. The discipline around it is what turns this from a pipeline into a deployment system. Two repos, because the questions they answer are different. Sync waves with PreSync hooks, because order matters and the system should know it. Failure modes that route through git, because the audit trail is the rollback.
If you want the multi-cluster, ten-region, progressive-delivery version of this — Vault rotations, Karpenter autoscaling, DNS-level failover — there’s a live walkthrough on the home page. The shape is the same. The pieces just multiply.