Backend & Platform Engineering

SSE vs WebSocket Under CPU Limits:
What Actually Breaks First in Production

Protocol choices look equivalent in happy-path tests. Under cgroup pressure, event-loop contention, queue growth, and reconnection dynamics separate resilient designs from fragile ones.

Harshit Singhal · March 2026 · 10 min read

01

Executive Framing

Most push benchmarks fail before the system fails: they optimize for throughput charts and miss failure mechanics. In production, the decisive signal is not how fast the service can publish at steady state, but how it degrades when scheduling contention, queue amplification, and reconnect pressure hit at the same time.

These conclusions are drawn from controlled experimental runs under CPU-capped container deployments with consistent workload generation. The focus here is on behavioral patterns under saturation rather than absolute throughput numbers.

Key insight

The most costly mistake is optimizing for maximum throughput while ignoring p95 and p99 delivery behavior during throttling and reconnect storms.

02

Experimental Setup Characteristics

  • Constrained CPU limits: deliberate cgroup throttling to force scheduler contention.
  • Controlled load phases: both steady-state and burst scenarios exercised under identical control settings.
  • Primary observability lens: latency distribution, queue depth evolution, and reconnect behavior under pressure.
  • Explicit objective: observe failure mechanics under saturation, not compare protocol feature sets.
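The phased-load discipline above can be approximated with a small driver. This is an illustrative sketch, not the benchmark harness used for these runs; `run_phase`, the rates, and the durations are all assumptions:

```python
import asyncio
import time

async def run_phase(emit, rate_hz: float, duration_s: float) -> int:
    """Emit events at a fixed target rate for a fixed duration; return count sent."""
    interval = 1.0 / rate_hz
    deadline = time.monotonic() + duration_s
    sent = 0
    while time.monotonic() < deadline:
        emit(f"event-{sent}")
        sent += 1
        await asyncio.sleep(interval)  # Cooperative pacing on the event loop.
    return sent

async def main():
    events = []
    # Identical control settings, two phases: steady state, then a burst.
    steady = await run_phase(events.append, rate_hz=200, duration_s=0.5)
    burst = await run_phase(events.append, rate_hz=2000, duration_s=0.1)
    return steady, burst

steady, burst = asyncio.run(main())
```

Note that the achieved rate falls below the target as sleep granularity and loop contention grow, which is itself one of the saturation signals this setup measures.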

03

Runtime Architecture and Failure Mechanics

In async Python services, push delivery is bounded by event-loop scheduling fairness and per-connection write-backpressure. Under CPU throttling, cooperative tasks lose cadence; heartbeat handlers, flush loops, and reconnect logic contend for loop time and widen tail latency.

Directionally, SSE tended to expose saturation earlier through visible queue pressure and flush jitter, while WebSocket paths were more prone to state-heavy contention once connection management and keepalive work accumulated. The operational result is a different failure shape: SSE usually bends with increasing latency, while WebSocket is more likely to step into abrupt instability if backpressure discipline is weak.

  • Connection lifecycle: setup, steady-state writes, slow-consumer detection, and teardown must remain explicit.
  • Event-loop behavior: small scheduler slips compound into p99 expansion when large connection sets share one loop.
  • Memory model: bounded queues cap resident memory; unbounded buffering converts transient bursts into sustained instability.

These effects are directional and repeatable across stacks, even when absolute numbers differ by runtime or host profile.
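The scheduler-slip effect is directly measurable. A minimal loop-lag probe, with an artificial blocking task standing in for CPU throttling (all names and durations here are illustrative, not the instrumentation used in the runs):

```python
import asyncio
import time

async def loop_lag_probe(interval_s: float = 0.01, samples: int = 20) -> list:
    """Record how late each timer wakeup fires, in milliseconds."""
    lags = []
    for _ in range(samples):
        expected = time.perf_counter() + interval_s
        await asyncio.sleep(interval_s)
        # Any positive difference is scheduling lag added by the loop.
        lags.append(max(0.0, (time.perf_counter() - expected) * 1000))
    return lags

async def main():
    async def hog():
        # 5 ms bursts of blocking work, yielding between them: a crude
        # stand-in for throttle-induced loop contention.
        for _ in range(50):
            t = time.perf_counter()
            while time.perf_counter() - t < 0.005:
                pass
            await asyncio.sleep(0)

    probe = asyncio.create_task(loop_lag_probe())
    await hog()
    return await probe

lags = asyncio.run(main())
worst_lag_ms = max(lags)
```

In a real service the same probe, exported as a metric, gives the earliest warning of p99 expansion: loop lag rises before delivery latency does.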

04

Resource Cost Analysis: CPU, Memory, Connection State

CPU and memory costs are tightly coupled through backpressure pathways. As CPU availability tightens, drain rate falls. If ingress remains unchanged, queue depth amplifies, memory pressure rises, allocator overhead increases, and tail latency degrades in a feedback loop.

  • CPU-bound phase: serialization and network write scheduling dominate.
  • Memory-bound phase: buffered messages and per-connection state dominate.
  • State overhead: protocols with heavier lifecycle bookkeeping amplify both costs at high connection counts.

In containerized environments, this transition can happen quickly because cgroup limits make backlog growth visible as OOM risk instead of a slow degradation.
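The feedback loop reduces to simple arithmetic: if ingress stays fixed while drain capacity falls, backlog grows every tick and memory follows. A toy simulation (rates are illustrative):

```python
def simulate_backlog(ingress_rate: int, drain_rate: int, steps: int) -> list:
    """Queue depth per tick when ingress may exceed drain capacity."""
    depth = 0
    history = []
    for _ in range(steps):
        depth = max(0, depth + ingress_rate - drain_rate)
        history.append(depth)
    return history

# Healthy: drain keeps pace. Throttled: a CPU cap halves the drain rate.
healthy = simulate_backlog(ingress_rate=1000, drain_rate=1000, steps=10)
throttled = simulate_backlog(ingress_rate=1000, drain_rate=500, steps=10)
```

Ten ticks at a 500-event/tick deficit leaves 5,000 buffered messages: under a cgroup memory limit, that linear growth is what surfaces as OOM risk rather than gradual slowdown.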

05

Behavior Under Scale

Scale failures are usually phased. Systems pass through three distinct stages before full collapse.

Phase 1: p95 drift
Stable median latency. Tail begins to expand. System looks healthy in dashboards.

Phase 2: p99 expansion
Reconnect churn begins. Write timeouts appear. Median still acceptable.

Phase 3: Partial collapse
Memory instability, write timeouts, selected pod failures. Health checks may still pass.

A recurring failure narrative was pods reporting acceptable CPU utilization while p99 delivery latency expanded sharply due to event-loop scheduling contention and queue drain lag. Health checks still passed, but user-visible latency was already outside safe bounds.
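Phase classification can be driven from percentile windows alone, which is exactly what CPU-based health checks miss. A sketch with thresholds that are illustrative rather than calibrated:

```python
from statistics import quantiles

def latency_percentiles(samples_ms):
    """p50/p95/p99 from a window of delivery latencies (needs >= 2 samples)."""
    cuts = quantiles(samples_ms, n=100)
    return cuts[49], cuts[94], cuts[98]

def degradation_phase(p50, p95, p99, slo_ms=250.0):
    """Map tail behavior to the three pre-collapse phases described above."""
    if p99 > slo_ms and p95 > slo_ms:
        return "phase-3: partial collapse"
    if p99 > slo_ms:
        return "phase-2: p99 expansion"
    if p95 > 0.5 * slo_ms and p50 < 0.2 * slo_ms:
        return "phase-1: p95 drift"
    return "healthy"

# Stable median, expanding tail: the "healthy dashboard" failure narrative.
samples = [20] * 90 + [180] * 8 + [400] * 2
p50, p95, p99 = latency_percentiles(samples)
phase = degradation_phase(p50, p95, p99)
```

Here the median is 20 ms and CPU-style averages look fine, yet the classifier already reports phase 2, because p99 has crossed the budget while p95 has not.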

Scaling caveat

Autoscaling helps only if the metrics it consumes expose saturation. Scaling on CPU alone is insufficient for push systems; latency-tail and queue-pressure signals must participate, or scaling reacts only after degradation is already user-visible.

06

Operational Risks and When Not to Use This Approach

A benchmarking strategy centered on one-way push behavior is inappropriate when product requirements are truly duplex and low-latency in both directions.

  • Do not generalize one-way benchmark conclusions to interactive bidirectional workloads.
  • Do not run protocol tests without reconnection and slow-consumer scenarios.
  • Do not rely on averages; tail-latency and drop behavior must be primary signals.
  • Do not keep queue depth unbounded to preserve nominal throughput during bursts.

07

Decision Matrix

Production Context | Primary Risk | Recommended Bias | Operational Reason
One-way event fanout under tight memory limits | Buffer growth and OOM | SSE with bounded queues | Simpler lifecycle and easier memory control
Interactive duplex control channel | Round-trip semantics and protocol mismatch | WebSocket | Native bidirectional framing
Bursty traffic with strict p99 SLO | Scheduler jitter and backlog cascades | Protocol + explicit shedding policy | Tail protection matters more than peak throughput
Kubernetes with aggressive CPU limits | Throttle-induced latency amplification | Scale on latency + queue depth | CPU-only signals miss early degradation

08

Monitoring and SLO Implications

Push workloads need saturation-aware observability with explicit tail-latency guardrails.

Latency
  • Delivery p50, p95, p99
  • Segmented by protocol and pod
  • Tracked at client-visible boundary
Backpressure
  • Queue depth per connection
  • Enqueue drops and write timeout rate
  • Slow-consumer counts
Connection Health
  • Active sockets and reconnect rate
  • Disconnect reasons
  • Handshake failure rate
Container Pressure
  • Memory working set and OOM events
  • CPU throttled time
  • Restart loops

SLO policy should encode degradation behavior as first-class control logic: acceptable drop strategy, max reconnect churn, and escalation triggers when p99 rises with queue amplification.
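Encoding that policy as control logic can be as small as a mapping from current signals to an action. A sketch with assumed thresholds (the `SloPolicy` name, fields, and numbers are illustrative):

```python
from dataclasses import dataclass

@dataclass
class SloPolicy:
    """Degradation behavior as explicit, testable control logic."""
    p99_budget_ms: float = 250.0
    max_queue_depth: int = 512
    max_reconnects_per_min: int = 100

    def action(self, p99_ms: float, queue_depth: int, reconnects_per_min: int) -> str:
        if reconnects_per_min > self.max_reconnects_per_min:
            return "escalate"   # Reconnect storm: page and consider shedding.
        if queue_depth >= self.max_queue_depth:
            return "shed"       # Drop oldest events instead of buffering.
        if p99_ms > self.p99_budget_ms:
            return "scale"      # Tail breach with headroom: add capacity.
        return "steady"

policy = SloPolicy()
```

The ordering matters: reconnect churn outranks queue depth, which outranks tail latency, because each earlier condition makes the later remedies ineffective on their own.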

09

Minimal Async Pattern for Bounded Delivery Under CPU Pressure

This pattern favors bounded memory and controlled shedding instead of unbounded buffering.

Python · FastAPI · SSE
import asyncio
import time
from collections import deque
from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse

app = FastAPI()

class DeliveryState:
    def __init__(self, max_queue: int = 512):
        self.q = asyncio.Queue(maxsize=max_queue)
        self.dropped = 0
        self.delivery_ms = deque(maxlen=2048)

state = DeliveryState()

async def publish(event: str) -> None:
    if state.q.full():
        _ = state.q.get_nowait()  # Drop oldest to cap memory.
        state.dropped += 1
    state.q.put_nowait((time.perf_counter(), event))

async def stream(request: Request):
    while True:
        if await request.is_disconnected():
            break
        try:
            # Bounded wait so the disconnect check still runs during quiet periods.
            ts, event = await asyncio.wait_for(state.q.get(), timeout=5.0)
        except asyncio.TimeoutError:
            continue
        state.delivery_ms.append((time.perf_counter() - ts) * 1000)
        yield f"data: {event}\n\n"  # SSE frames terminate with a blank line.

@app.get("/events")
async def events(request: Request):
    return StreamingResponse(
        stream(request),
        media_type="text/event-stream"
    )
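The drop-oldest discipline in publish can be exercised in isolation, without the server: plain asyncio, an illustrative burst of 10 events into a queue capped at 4.

```python
import asyncio

async def demo():
    q: asyncio.Queue = asyncio.Queue(maxsize=4)
    dropped = 0
    for i in range(10):
        if q.full():
            q.get_nowait()   # Evict the oldest event to keep memory bounded.
            dropped += 1
        q.put_nowait(i)
    kept = [q.get_nowait() for _ in range(q.qsize())]
    return dropped, kept

dropped, kept = asyncio.run(demo())
```

The burst sheds the six oldest events and retains only the newest four, which is the intended trade: bounded resident memory and fresh data, at the cost of explicit, countable drops.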

10

Engineering Conclusion

Benchmarking under CPU limits is only useful when it exposes failure mechanics: tail-latency expansion, backpressure collapse, and memory instability under reconnect pressure. Systems that look efficient at median latency can still fail operationally once saturation begins.

Treat protocol choice as part of runtime control strategy. Prefer architectures that make saturation visible, keep buffers bounded, and recover predictably after pressure drops.