
Picking RTDB for the auction hot path: three reasons and three regrets

Project Sato's hot path needs sub-100ms latency, atomic bid ordering, and listener cost that doesn't blow up under load. RTDB delivers two of three; Firestore delivers none. Here's why I split them, and what the split costs.

Arthur Dutra · 8 min read

The auction engine I've been building has a single hot path that has to be right: a user taps Dar Lance ("place bid"), the value lands at the engine, gets validated, gets ordered against any concurrent bids, and the new state broadcasts to every other connected client, all in under 100ms p99. Above that ceiling, the auction starts feeling laggy at exactly the moment it can't afford to.

I went into the design assuming Firestore would carry the whole thing. By the end of week one I'd switched to a hybrid: Firebase Realtime Database (RTDB) for live state, Firestore for everything that persists, and a Go engine on Cloud Run sitting between them and the clients. This post is the why.

What the hot path looks like

1. User taps "Dar Lance" on the app
2. App calls placeBid(lotId, amount, token)   (Cloud Function)
3. Function validates the auth token, forwards to the Go engine
4. Go engine:
   a. validates the bid (amount > current, within the teto, i.e. the bid ceiling, user qualified)
   b. updates in-memory state under per-lot mutex
   c. writes to RTDB: /auctions/{eventId}/{lotId}      (live state)
   d. enqueues Cloud Task → write into Firestore        (history, async)
   e. returns accepted / rejected
5. RTDB sync: every connected client gets the new state
6. App updates UI from its RTDB listener

End-to-end target: 40–80ms. End-to-end ceiling I treat as a P0 bug: 100ms.

Why Firestore alone doesn't fit

Three reasons, in order of how badly they hurt.

Latency. Firestore transactions land in 100–300ms typical. That isn't Firestore being slow; it's the consistency model. The transaction reads the document, the client computes new state, the write goes back, the server validates the read hasn't changed. Round-trips. Every winning bid in our system would be at the wrong end of that range, and concurrent bids would retry the transaction until one wins. Bids arriving in the same 50ms window would queue serially against a hot document.
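To make the retry behaviour concrete, here is a minimal simulation of optimistic concurrency on one hot document. It is not the Firestore SDK: `doc`, `commit`, `bidWithRetry`, and `runContention` are hypothetical names, and the in-memory version check only stands in for the server-side "has the read changed?" validation described above.

```go
package main

import (
	"fmt"
	"sync"
)

// doc is a hypothetical stand-in for one hot document with a version stamp.
type doc struct {
	mu      sync.Mutex
	version int
	bids    int
}

// read returns a snapshot of the document.
func (d *doc) read() (version, bids int) {
	d.mu.Lock()
	defer d.mu.Unlock()
	return d.version, d.bids
}

// commit applies the write only if nobody else wrote since readVersion.
func (d *doc) commit(readVersion, newBids int) bool {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.version != readVersion {
		return false // the read is stale; the caller must retry
	}
	d.version++
	d.bids = newBids
	return true
}

// bidWithRetry loops read, compute, commit and reports the attempts used.
func bidWithRetry(d *doc) int {
	for attempts := 1; ; attempts++ {
		v, b := d.read()
		if d.commit(v, b+1) {
			return attempts
		}
	}
}

// runContention fires n concurrent bids at one document and returns the
// final bid count plus the total number of attempts across all bidders.
func runContention(n int) (bids, attempts int) {
	d := &doc{}
	var wg sync.WaitGroup
	var mu sync.Mutex
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			a := bidWithRetry(d)
			mu.Lock()
			attempts += a
			mu.Unlock()
		}()
	}
	wg.Wait()
	_, bids = d.read()
	return bids, attempts
}

func main() {
	bids, attempts := runContention(50)
	fmt.Printf("bids=%d attempts=%d\n", bids, attempts)
}
```

Every bid eventually lands, but the attempt count can exceed the bid count, and each extra attempt in the real system is another round-trip. The order in which bids commit is whichever goroutine wins the race, not the order in which they started.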

Listener cost. Firestore charges per document read. A live auction with 1,000 bidders watching a single lot and 30 bids/minute means 30,000 reads/minute per lot, just for the listeners. At a moderate event with 10 simultaneous lots that's 18 million reads/hour. The free tier is 50k reads/day; you blow past that during the first event.
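The arithmetic behind those numbers, as a small sketch. The function names are made up for illustration; the inputs are the figures from the paragraph above.

```go
package main

import "fmt"

// listenerReads returns document reads per minute generated purely by
// listeners: each bid is delivered to each watcher as one billed read.
func listenerReads(watchers, bidsPerMinute, lots int) int {
	return watchers * bidsPerMinute * lots
}

// secondsToExhaust returns how long a daily read allowance lasts at a
// given reads-per-minute rate.
func secondsToExhaust(allowance, readsPerMinute int) int {
	return allowance * 60 / readsPerMinute
}

func main() {
	perLot := listenerReads(1000, 30, 1)      // one lot
	perEvent := listenerReads(1000, 30, 10)   // ten simultaneous lots
	fmt.Println("reads/minute, one lot:", perLot)
	fmt.Println("reads/hour, ten lots:", perEvent*60)
	fmt.Println("seconds until 50k free reads are gone:", secondsToExhaust(50_000, perEvent))
}
```

Under these assumptions the daily free allowance lasts about ten seconds of a ten-lot event.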

Atomic ordering under contention. Firestore transactions are individually atomic, but ordering across concurrent transactions is whatever order the server resolves them. On a hot document the retry loop makes wall-clock order drift from the order users actually tapped. For a money auction, that drift is the kind of bug you don't catch in dev and your lawyer catches at trial.

Why RTDB does

Latency. RTDB sync is 10–50ms typical. The wire format is leaner, the server-side state is simpler (a tree, not a query engine), and the SDK pushes diffs over an open socket rather than re-reading documents. For the same 1,000 listeners, the time-to-screen of a new bid drops from "perceptible delay" to "felt instant."

Cost model. RTDB charges by GB transferred, not by operation. The same 1,000 listeners receiving the same 30 small bid updates are billed for the bytes each connection downloads, not for 30,000 reads; each update is sent as a small diff, once per connection. Bandwidth-bound costs scale with payload size and connection count, both of which I control.
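A rough sketch of what "scales with payload size and connection count" means in numbers. The 200-byte payload is an assumption chosen for illustration, not a measured value, and the estimate ignores protocol and connection overhead.

```go
package main

import "fmt"

// downloadedBytes estimates the payload bytes pushed to listeners:
// every listener receives every update.
func downloadedBytes(listeners, updates, payloadBytes int64) int64 {
	return listeners * updates * payloadBytes
}

func main() {
	const payload = 200 // assumed bytes per update, for illustration only

	perLotMinute := downloadedBytes(1000, 30, payload)
	trimmed := downloadedBytes(1000, 30, payload/2)

	fmt.Println("bytes/minute, one lot:", perLotMinute)
	fmt.Println("bytes/minute with half the payload:", trimmed)
}
```

The bill moves linearly with both levers: halving the payload halves the bytes, and so does halving the number of connected listeners.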

Data shape. The live state of one auction is a single shallow document: current bid, bidder UID, count, status, timer end, top bids. RTDB's tree model is a better fit than Firestore's document/collection model for this specific shape. We don't need composite indexes, range queries, or where clauses on live state. We need atomic writes to a known path and propagation to subscribers:

/auctions/{eventId}/{lotId}/
  currentBid:  45000
  bidderUid:   "abc123"
  bidCount:    23
  status:      "active"
  timerEnd:    1736...
  topBids:     [...]

That's the entire schema for one live lot. Firestore would model it as a document plus a subcollection for top bids; RTDB models it as a node and a subscriber sees the whole subtree on every change.
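On the engine side, that node could map to a single Go struct. This is a sketch under assumptions: the post doesn't show the element shape of topBids or the unit of timerEnd, so `TopBid` and the epoch-timestamp comment are guesses, and `encodeLive` / `decodeLive` are illustrative helpers.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TopBid is a hypothetical element shape; the post elides topBids' contents.
type TopBid struct {
	BidderUID string `json:"bidderUid"`
	Amount    int64  `json:"amount"`
}

// LiveLot mirrors the node at /auctions/{eventId}/{lotId}.
type LiveLot struct {
	CurrentBid int64    `json:"currentBid"`
	BidderUID  string   `json:"bidderUid"`
	BidCount   int      `json:"bidCount"`
	Status     string   `json:"status"`
	TimerEnd   int64    `json:"timerEnd"` // assumed to be an epoch timestamp
	TopBids    []TopBid `json:"topBids"`
}

// encodeLive serializes the live state as it would be written to the node.
func encodeLive(l LiveLot) (string, error) {
	b, err := json.Marshal(l)
	return string(b), err
}

// decodeLive parses a node snapshot back into the struct.
func decodeLive(raw string) (LiveLot, error) {
	var l LiveLot
	err := json.Unmarshal([]byte(raw), &l)
	return l, err
}

func main() {
	out, err := encodeLive(LiveLot{
		CurrentBid: 45000,
		BidderUID:  "abc123",
		BidCount:   23,
		Status:     "active",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```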

...but not for everything

RTDB is the wrong tool for almost everything else in the system. It can't query: no where clauses worth mentioning, no compound indexes, no orderBy on arbitrary fields. The security rules language is older and weaker than Firestore's. There's no aggregation. So historical data (users, completed events, transactions, contracts, audit logs) lives in Firestore where the query model fits.

The hot path writes to both:

RTDB (live state)                Firestore (persistent)
───────────────────              ─────────────────────────
/auctions/{event}/{lot}/         users/
  currentBid                     lots/
  bidderUid                      events/
  bidCount                       bids/             ← audit history
  status                         transactions/
  timerEnd                       contracts/
  topBids                        inspections/

The Go engine writes to RTDB synchronously (the broadcast) and fires a Cloud Task asynchronously that lands the bid in Firestore for the audit log. Clients see the RTDB update within about 50ms; the immutable historical record arrives in Firestore a few hundred ms later. The two writes only need to be eventually consistent with each other, because they describe two different things: what's happening now vs what happened.

The Go engine sitting between them

I never let clients write to RTDB directly. RTDB rules block client writes on the entire /auctions subtree; the only writer is the Go engine running on Cloud Run with min-instances=1. Three reasons:

Validation lives somewhere portable. The bid validation rules (minimum increment, teto, anti-snipe timer extension, qualified-bidder check) are non-trivial business logic. Encoding them in RTDB security rules would be painful (rules language is anemic) and risky (a missed edge case is a money bug). Putting them in a Go service means real types, real tests, real CI.

// Per-lot mutex serializes concurrent bids; atomic ordering is free.
func (e *Engine) PlaceBid(ctx context.Context, in BidIntake) (Result, error) {
    // Hold this lot's lock for the whole validate-apply-broadcast sequence.
    lot := e.lotMu(in.LotID)
    lot.Lock()
    defer lot.Unlock()

    state := e.state[in.LotID]
    if err := validate(state, in); err != nil {
        // A rejected bid is a normal outcome, not a transport error.
        return Result{Status: Rejected, Reason: err.Error()}, nil
    }

    state = state.Apply(in)        // pure function, easy to test
    e.state[in.LotID] = state

    // Synchronous write: this is the broadcast to every listener.
    if err := e.rtdb.Set(ctx, livePath(in), state.Live()); err != nil {
        return Result{}, err       // surfaces back to the client
    }
    e.tasks.Enqueue(ctx, persistBid(in))   // history, async
    return Result{Status: Accepted, NewState: state.Live()}, nil
}
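The validate call in that snippet isn't shown in the post. A hypothetical sketch of what it might check, assuming a fixed minimum increment, a teto that acts as an upper bound on bids (with 0 meaning no cap), and a set of qualified bidder UIDs. All type names, field names, and error messages here are invented for illustration, and the anti-snipe timer extension is left out.

```go
package main

import (
	"errors"
	"fmt"
)

// lotState holds only the fields this sketch needs (hypothetical shape).
type lotState struct {
	CurrentBid   int64
	MinIncrement int64
	Teto         int64 // assumed: upper bound on a bid; 0 means no cap
	Active       bool
	Qualified    map[string]bool
}

// bid is a hypothetical intake payload.
type bid struct {
	UID    string
	Amount int64
}

var (
	errClosed      = errors.New("lot is not active")
	errUnqualified = errors.New("bidder is not qualified")
	errTooLow      = errors.New("bid is below current bid plus minimum increment")
	errAboveTeto   = errors.New("bid is above the teto")
)

// validateBid returns nil when the bid may be applied to the state.
func validateBid(s lotState, b bid) error {
	switch {
	case !s.Active:
		return errClosed
	case !s.Qualified[b.UID]:
		return errUnqualified
	case b.Amount < s.CurrentBid+s.MinIncrement:
		return errTooLow
	case s.Teto > 0 && b.Amount > s.Teto:
		return errAboveTeto
	}
	return nil
}

func main() {
	s := lotState{
		CurrentBid:   45000,
		MinIncrement: 500,
		Teto:         60000,
		Active:       true,
		Qualified:    map[string]bool{"abc123": true},
	}
	fmt.Println(validateBid(s, bid{UID: "abc123", Amount: 45500})) // accepted
	fmt.Println(validateBid(s, bid{UID: "abc123", Amount: 45100})) // too low
}
```

Because the function is pure (state in, error out), each rule can be covered by a table-driven test without touching RTDB or Firestore.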

Atomic ordering becomes trivial. A single Go process owns the auction state. Concurrent bids hit a per-lot mutex, get processed in arrival order, broadcast in the same order. No transaction retries, no "last writer wins" surprises.
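A per-lot mutex needs a small registry that hands out one lock per lot ID. The post doesn't show how lotMu is implemented; the following is one plausible sketch, with `lotLocks`, `forLot`, and `countBids` as hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// lotLocks hands out one mutex per lot ID, creating it on first use.
type lotLocks struct {
	mu    sync.Mutex // guards the map itself
	locks map[string]*sync.Mutex
}

func newLotLocks() *lotLocks {
	return &lotLocks{locks: make(map[string]*sync.Mutex)}
}

// forLot returns the mutex for lotID; the same ID always yields the same mutex.
func (l *lotLocks) forLot(lotID string) *sync.Mutex {
	l.mu.Lock()
	defer l.mu.Unlock()
	m, ok := l.locks[lotID]
	if !ok {
		m = &sync.Mutex{}
		l.locks[lotID] = m
	}
	return m
}

// countBids runs n concurrent "bids" against one lot and returns how many
// were applied. The counter is only touched while the lot's mutex is held.
func countBids(l *lotLocks, lotID string, n int) int {
	count := 0
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			m := l.forLot(lotID)
			m.Lock()
			defer m.Unlock()
			count++ // serialized by the per-lot mutex
		}()
	}
	wg.Wait()
	return count
}

func main() {
	l := newLotLocks()
	fmt.Println(countBids(l, "lot-1", 200))
}
```

Bids on different lots never block each other, because each lot has its own lock. One caveat: "arrival order" here means the order in which goroutines acquire the lock, and Go's sync.Mutex does not promise strict first-come-first-served ordering among waiters.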

No cold start. Cloud Functions for bid intake would fall over on the first bid of a quiet morning; 1–3s cold start is unacceptable when the user just tapped a button. Cloud Run with min-instances=1 keeps one warm instance always live. Costs ~$60/month per region; cheap insurance for the experience.

Three regrets

1. Cognitive load is real. Every developer who joins has to learn which data lives in RTDB, which in Firestore, and why. Onboarding takes longer than a single-database design would. Mitigation: a one-page table in the README, plus aggressive naming. auctionLiveRef always means RTDB, bidHistoryRef always means Firestore. The convention is doing more work than the architecture.

2. Schema migrations across two databases hurt. When the live-state shape evolves (say, adding a pendingExtensionMs field for anti-snipe) I have to migrate the RTDB nodes and coordinate with the engine deploy that starts writing the new field. Forgetting either side leaves the listener reading either stale data or undefined. There's no Prisma-style migration tool that spans both.
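One way to soften the "stale or undefined" failure is to have readers tolerate a missing field during the rollout. A minimal sketch, written in Go for consistency with the engine code (a client listener would apply the same idea in its own language). The pointer field, the `extensionMs` helper, and the fallback value are assumptions, not the post's implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// liveLot carries only the fields relevant to this sketch.
type liveLot struct {
	CurrentBid int64 `json:"currentBid"`
	// Pointer so that "absent" can be told apart from "present and zero".
	PendingExtensionMs *int64 `json:"pendingExtensionMs,omitempty"`
}

// extensionMs returns the field, or def when the node predates the field.
func (l liveLot) extensionMs(def int64) int64 {
	if l.PendingExtensionMs == nil {
		return def
	}
	return *l.PendingExtensionMs
}

// decodeExtension parses a node snapshot and resolves the extension value.
func decodeExtension(raw string, def int64) (int64, error) {
	var l liveLot
	if err := json.Unmarshal([]byte(raw), &l); err != nil {
		return 0, err
	}
	return l.extensionMs(def), nil
}

func main() {
	oldNode := `{"currentBid":45000}`
	newNode := `{"currentBid":45000,"pendingExtensionMs":3000}`

	a, _ := decodeExtension(oldNode, 0)
	b, _ := decodeExtension(newNode, 0)
	fmt.Println(a, b)
}
```

With a tolerant reader in place, the engine deploy and the node migration no longer have to land at exactly the same moment.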

3. Vendor lock-in is concentrated. RTDB has no equivalent outside Firebase. If Google ever changes the pricing model (they have, twice in the last decade), I'm rewriting the live-state layer wholesale. The Go engine and Firestore data are portable; both could move to another cloud in a week. The RTDB portion is genuinely sticky. The mitigation I haven't fully implemented yet: an interface around RTDB writes so the engine could swap in a self-hosted alternative (Redis Pub/Sub, NATS) if that day comes.
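What that interface might look like, as a sketch only. `LiveStore`, `memoryStore`, `publish`, and `demo` are hypothetical names; the in-memory implementation stands in for whichever backend (RTDB today, something self-hosted later) sits behind the seam.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// LiveStore is the hypothetical seam: the engine depends on this interface,
// not on any specific live-state backend.
type LiveStore interface {
	Set(ctx context.Context, path string, value any) error
}

// memoryStore is an in-memory LiveStore, handy for tests.
type memoryStore struct {
	mu   sync.Mutex
	data map[string]any
}

func newMemoryStore() *memoryStore {
	return &memoryStore{data: make(map[string]any)}
}

// Set stores the value at path, honouring context cancellation.
func (m *memoryStore) Set(ctx context.Context, path string, value any) error {
	if err := ctx.Err(); err != nil {
		return err
	}
	m.mu.Lock()
	defer m.mu.Unlock()
	m.data[path] = value
	return nil
}

// get reads a value back; it is not part of the interface.
func (m *memoryStore) get(path string) (any, bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	v, ok := m.data[path]
	return v, ok
}

// publish is what engine code would call; it only sees the interface.
func publish(ctx context.Context, s LiveStore, eventID, lotID string, currentBid int64) error {
	return s.Set(ctx, "/auctions/"+eventID+"/"+lotID+"/currentBid", currentBid)
}

// demo publishes one bid through the interface and reads it back.
func demo() (int64, error) {
	store := newMemoryStore()
	if err := publish(context.Background(), store, "event-1", "lot-1", 45000); err != nil {
		return 0, err
	}
	v, ok := store.get("/auctions/event-1/lot-1/currentBid")
	if !ok {
		return 0, fmt.Errorf("value not stored")
	}
	return v.(int64), nil
}

func main() {
	v, err := demo()
	fmt.Println(v, err)
}
```

The interface covers only the write side. Fan-out to clients is the part RTDB does for free, so a replacement backend would also need its own delivery path to listeners.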

What I'd do again

The split. The split was right. It cost a moderate amount of architectural complexity and one concentrated vendor lock-in to gain an order of magnitude in latency, an order of magnitude in cost-per-listener, and atomic ordering guarantees that I trust. For a system where a wrong bid order is a wrong invoice, those three together are worth a bet on Google's pricing decisions.

If you're building anything where a hot path needs sub-100ms broadcast to many listeners, has order-sensitive writes, and a long historical tail you'll want to query later, the same shape applies. Pick the streaming database for live state. Pick the document database for history. Put a real service between them and the clients. Don't try to make one database do both.