Architect // Kubernetes on Spot

Spot Prices.
On-demand Guarantees.

Architect moves running pods before a spot instance is reclaimed. No dropped connections, no code changes.

up to 90% cheaper than on-demand · ~10s live migration · 0 data loss · 0 dropped connections

● ● ● loopholelabs · valkey on spot · 80×24
$ cat valkey-pod.yaml
metadata:
  annotations:
    architect.loopholelabs.io/managed-containers: '["valkey"]'
    architect.loopholelabs.io/scaledown-durations: '{"valkey":"10s"}'
    architect.loopholelabs.io/network-monitor: '{"valkey":"connections"}'
spec:
  runtimeClassName: runc-architect  # this + the annotations are the integration

$ valkey-cli HSET session:8f3a user ada cart "3 items"
(integer) 2

$ kubectl drain ip-10-0-1-23 --ignore-daemonsets  # simulate spot reclamation
node/ip-10-0-1-23 drained
/~\ migrate valkey -> ip-10-0-2-91 ... resumed in 9s

$ valkey-cli -h valkey HGETALL session:8f3a
1) "user"  2) "ada"
3) "cart"  4) "3 items"  # survived, never left memory
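
The annotation values in the manifest above are JSON documents stored as strings, which is easy to get wrong by hand. Here is a minimal Python sketch that builds the same pod metadata programmatically. The annotation keys, their values, and the `runc-architect` runtime class come from the snippet above; the helper name `architect_pod`, the pod name, and the `valkey/valkey` image are assumptions made for illustration.

```python
import json

PREFIX = "architect.loopholelabs.io"


def compact(value) -> str:
    """Serialize to JSON without extra whitespace, matching the snippet above."""
    return json.dumps(value, separators=(",", ":"))


def architect_pod(container: str, image: str, scaledown: str = "10s") -> dict:
    """Build a Pod manifest dict carrying the three annotations shown above."""
    annotations = {
        f"{PREFIX}/managed-containers": compact([container]),
        f"{PREFIX}/scaledown-durations": compact({container: scaledown}),
        f"{PREFIX}/network-monitor": compact({container: "connections"}),
    }
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": container, "annotations": annotations},
        "spec": {
            "runtimeClassName": "runc-architect",
            "containers": [{"name": container, "image": image}],
        },
    }


# The image name is an assumption; substitute whatever image you deploy.
pod = architect_pod("valkey", "valkey/valkey")
print(json.dumps(pod, indent=2))  # kubectl accepts JSON manifests as well as YAML
```

Generating the annotation strings with a JSON serializer avoids quoting mistakes when more than one container is managed.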

! On-demand means full price.

Check your cloud bill. The biggest on-demand line items are almost always stateful: session stores, message brokers, in-memory databases, AI agent memory.

They run on full-price instances, at roughly 2–3× the cost of the same workload on spot, because the state can't disappear. Lose the instance and the in-memory store goes with it, taking the rest of your system down: reconnect storms, cold caches, a slammed database.

Stability required on-demand or reserved instances. Not anymore.

SPOT INSTANCE RECLAIMED WITHOUT ARCHITECT:

  • in-memory data: LOST
  • open connections: DROPPED
  • queued messages: GONE
  • in-flight transactions: GONE
  • cached + session state: GONE

└─► thundering herd → cold cache → database slammed

Spot savings without the headache

▓ SPOT INSTANCE RECLAIMED WITH ARCHITECT:

  • in-memory data: PRESERVED
  • open connections: MIGRATED
  • queued messages: DELIVERED
  • in-flight transactions: COMMITTED
  • cached + session state: INTACT

└─► checkpoint restored → connections migrated → zero downtime

Architect checkpoints the full running state (memory, open sockets, in-flight work) before the node is reclaimed, then restores it on a fresh spot instance. It resumes exactly where it left off.

Stateful apps that demanded on-demand pricing now run on spot at one-half to one-third of the cost, with no reconnect storms, no cold caches, no code changes.
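
To relate the price figures on this page: a workload that costs 2–3× as much on on-demand as on spot saves roughly 50–67% by moving, and the "up to 90% cheaper" headline corresponds to a 10× price ratio. A small Python check of that arithmetic follows. The helper name `savings_from_ratio` is illustrative, and the ratios are the only inputs taken from the text.

```python
def savings_from_ratio(ratio: float) -> float:
    """Fraction saved by moving to spot when on-demand costs `ratio` times spot."""
    if ratio < 1:
        raise ValueError("ratio must be >= 1 (on-demand at least as expensive as spot)")
    return 1 - 1 / ratio


for ratio in (2, 3, 10):
    print(f"{ratio}x on-demand/spot price ratio -> {savings_from_ratio(ratio):.0%} saved")
```

Actual spot discounts vary by instance type, region, and time, so treat the ratio as something to read off your own bill.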

Stateful on spot. Spot price, on-demand stability.

* Don't take our word for it. Try to break it.

A single Valkey instance on spot. No persistence, no replicas. The worst case. Write a session, a job queue, an AI agent's context, then drain the node: the same eviction a spot interruption triggers. The pod lands elsewhere, and everything's still there.

The data never left memory.
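
The experiment above can be scripted. The sketch below writes the session hash, drains the node, and reads the hash back, using only the `valkey-cli` and `kubectl` invocations shown in the terminal transcript. Several things are assumptions for illustration: the function names, the `valkey` host name (the Service name used in the transcript), and that `valkey-cli` prints one field or value per line when its output is captured rather than shown on a terminal.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], str]


def shell(cmd: Sequence[str]) -> str:
    """Run a command and return its stdout; raises if the command fails."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def survives_drain(node: str, run: Runner = shell, host: str = "valkey") -> bool:
    """Write a session hash, drain `node`, and report whether the hash is intact."""
    key = "session:8f3a"
    fields = {"user": "ada", "cart": "3 items"}
    flat = [item for pair in fields.items() for item in pair]

    run(["valkey-cli", "-h", host, "HSET", key, *flat])
    # Draining evicts the pod the same way a spot interruption would.
    run(["kubectl", "drain", node, "--ignore-daemonsets"])

    # Assumed raw output format: alternating field and value lines.
    lines = run(["valkey-cli", "-h", host, "HGETALL", key]).splitlines()
    return dict(zip(lines[0::2], lines[1::2])) == fields
```

Calling `survives_drain("ip-10-0-1-23")` against a test cluster runs the real commands; passing a custom `run` callable lets you dry-run the sequence without touching a cluster.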

EKS cluster // spot instances

node-1: Valkey, Postgres
node-2: ClickHouse, Valkey
node-3: Postgres, ClickHouse

Stateful pods live-migrate across nodes when spot reclaims one.

+ The same engine scales to zero.

The same checkpoint/restore that moves pods across spot nodes also hibernates idle ones in place. Requests drop to zero, but the pod stays scheduled: registered with the API, attached to its Services, PVCs still mounted.

A hibernated pod wakes in under 50ms, with no image pull, no init, no JVM warmup. The node stops paying for idle work and packs in more, and hibernation runs alongside HPA.

Pay for what runs, not for what sleeps.

Node utilization (chart): 80% /~\ · 22%

Hibernated pods free the node to pack in more work.
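
As a rough illustration of what denser packing is worth, the sketch below takes the 22% and 80% figures from the node-utilization chart and assumes they are the before and after values for the same node. That reading is an assumption, as is the helper name `packing_gain`.

```python
def packing_gain(before: float, after: float) -> tuple[float, float]:
    """Return (work multiplier per node, fraction of nodes no longer needed)."""
    if not 0 < before <= after <= 1:
        raise ValueError("expected 0 < before <= after <= 1")
    return after / before, 1 - before / after


# 0.22 and 0.80 are read off the chart; treating them as before/after is an assumption.
gain, saved = packing_gain(0.22, 0.80)
print(f"{gain:.1f}x work per node, {saved:.1%} fewer nodes for the same work")
```

The second number only holds if the extra workloads actually fit the freed CPU and memory, which depends on pod sizes and scheduling constraints.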

The workloads driving up your reserved and on-demand bill are the stateful ones. Architect runs them on spot with no downtime and no data loss.

Requirements: Kubernetes 1.33+ ¦ EKS with AL2023 or GKE with Ubuntu ¦ 2+ nodes
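
A quick preflight check of the requirements above can be sketched in Python. The function name and its parameters are hypothetical, and the version string is assumed to follow the usual `v1.33.1` style, possibly with a provider suffix.

```python
import re

# Platform and node OS pairs listed in the requirements line.
SUPPORTED = {("eks", "al2023"), ("gke", "ubuntu")}


def meets_requirements(k8s_version: str, platform: str, node_os: str, nodes: int) -> bool:
    """Check Kubernetes 1.33+, a supported platform/OS pair, and 2+ nodes."""
    match = re.match(r"v?(\d+)\.(\d+)", k8s_version)
    if match is None:
        raise ValueError(f"unrecognized Kubernetes version: {k8s_version!r}")
    version_ok = (int(match[1]), int(match[2])) >= (1, 33)
    platform_ok = (platform.lower(), node_os.lower()) in SUPPORTED
    return version_ok and platform_ok and nodes >= 2


print(meets_requirements("v1.33.1", "EKS", "AL2023", nodes=3))
```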