Re-exercise a classic Temporal stress test

Hi, I'm Phuong. I'm a funny guy who spends a lot of time around digital devices. If you are reading these lines, I'm probably either playing some video games or learning something from the internet.

Objective

Demonstrate how Tempura solves a typical Temporal scaling bottleneck (namespace overload) by horizontally scaling across multiple physical namespaces/clusters under load, avoiding the need to re-provision a cluster with higher shard counts.

Background

You may already be familiar with Scaling Temporal: The basics, the canonical load-test exercise for Temporal. That article notes that a common bottleneck for Temporal under heavy load is a high number of state transitions and high request latency caused by shard lock contention. The typical resolution is to increase the History Shard count. However, the shard count cannot be changed after a cluster is built. If your namespace is overloaded, you are normally forced to migrate to a new cluster with a larger shard count or accept degraded performance.

In this article, we will re-run that stress test with Tempura in the middle to see how it handles the problem gracefully. By mapping a single logical namespace onto multiple physical namespaces across different clusters, Tempura lets you add capacity on the fly without altering the client configuration or rebuilding your primary cluster.

Test Scenario Setup

Step 1: Initial Setup with a Single Physical Namespace

First, we start the compose stack, which includes:

  • 2 Temporal clusters

  • 1 Redis instance

  • 1 Prometheus and 1 Grafana instance

  • 1 Tempura instance (our routing layer)

docker-compose -f docker-compose.yml up -d 

We configure Tempura to route our target Virtual Namespace exclusively to Cluster A.

# Create the Virtual Namespace
curl -X POST http://localhost:8089/namespace/virtual/create \
  -d '{"name": "my-app-namespace"}'

# Map Cluster A's physical namespace to the Virtual Namespace
curl -X POST http://localhost:8089/namespace/physical/add \
  -d '{
      "virtual_namespace": "my-app-namespace",
      "physical_namespace": "namespace-a",
      "cluster_address": "cluster-a.example.com:7233"
  }'
Virtual namespace 'my-app-namespace' created successfully
Physical namespace 'namespace-a' added to virtual namespace 'my-app-namespace'

Step 2: Apply Load (The Overload Condition)

Using the benchmark-workers image from the Temporal load testing guide, we start applying a heavy load to the Tempura endpoint. Ensure your workers are pointed to Tempura instead of the physical Temporal clusters.

# Start Workers pointing to Tempura
docker run --network=sandbox_temporal-sandbox-network --env TEMPORAL_TASK_QUEUE=benchmark --env TEMPORAL_NAMESPACE=my-app-namespace --env TEMPORAL_GRPC_ENDPOINT=tempura.example.com:8088 ghcr.io/temporalio/benchmark-workers:main


# Start the Runner to generate workflows
# -c is the number of concurrent workflows
docker run --network=sandbox_temporal-sandbox-network --env TEMPORAL_TASK_QUEUE=benchmark --env TEMPORAL_WORKFLOW_TASK_POLLERS=16 --env TEMPORAL_ACTIVITY_TASK_POLLERS=8 --env TEMPORAL_NAMESPACE=my-app-namespace --env TEMPORAL_GRPC_ENDPOINT=tempura.example.com:8088 ghcr.io/temporalio/benchmark-workers:main runner -c 50 -t ExecuteActivity '{ "Count": 10, "Activity": "Echo", "Input": { "Message": "test" } }'

Note: Adjust the runner parallelism to generate enough load to stress Cluster A.

Step 3: Measure & Observe the Bottleneck

Observe the Prometheus metrics for Cluster A:

  • State Transitions: sum(rate(state_transition_count_count{namespace="namespace-a"}[1m]))
  • Shard lock latency (p95):

    histogram_quantile(0.95, sum by (instance, le)(rate(lock_latency_bucket[1m])))
    

All workflows are allocated to Cluster A.

Expected Result: As the load increases, the StartWorkflowExecution latency on Cluster A will spike above our acceptable SLO (e.g. > 150ms). This indicates that namespace-a is overloaded (due to shard lock contention or database limits) and cannot comfortably handle the load.
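
To check the bottleneck from a script rather than by eye, the p95 lock latency query can be sent to Prometheus's HTTP API (`/api/v1/query`) and the result compared against a threshold. The following is a minimal sketch and not part of Tempura: the helper names are hypothetical, and the threshold is assumed to be in the same unit as the metric, so check how your Temporal build reports `lock_latency` before picking a value.

```python
from typing import List, Tuple
from urllib.parse import urlencode

# The p95 shard lock latency expression from the list above.
P95_LOCK_LATENCY = (
    "histogram_quantile(0.95, "
    "sum by (instance, le)(rate(lock_latency_bucket[1m])))"
)


def instant_query_url(base_url: str, expr: str) -> str:
    """Build a Prometheus instant-query URL for a PromQL expression."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': expr})}"


def series_over_threshold(payload: dict, threshold: float) -> List[Tuple[str, float]]:
    """Return (instance, value) pairs whose sample exceeds the threshold.

    `payload` is the decoded JSON body of an instant query. Prometheus
    encodes each sample as [timestamp, "value-as-string"].
    """
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    breaches = []
    for series in payload["data"]["result"]:
        value = float(series["value"][1])
        # NaN (no samples in the window) compares False and is skipped.
        if value > threshold:
            breaches.append((series["metric"].get("instance", ""), value))
    return breaches
```

Fetching `instant_query_url("http://localhost:9090", P95_LOCK_LATENCY)` with any HTTP client and passing the decoded JSON to `series_over_threshold` lists the instances above the threshold. The Prometheus address here is an assumption about the compose stack.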

Step 4: Scale Out with Tempura

Instead of rebuilding Cluster A with a higher shard count (which requires downtime and complex migrations), we seamlessly add Cluster B's physical namespace to the Tempura Virtual Namespace.

# Map Cluster B's physical namespace to the existing Virtual Namespace
curl -X POST http://localhost:8089/namespace/physical/add \
  -d '{
      "virtual_namespace": "my-app-namespace",
      "physical_namespace": "namespace-b",
      "cluster_address": "cluster-b.example.com:7233"
  }'
Physical namespace 'namespace-b' added to virtual namespace 'my-app-namespace'

Step 5: Observe Load Sharing and Recovery

Continue the benchmark-workers load test and monitor both clusters in Grafana. Once the new physical namespace is added, part of the load on Cluster A shifts to Cluster B. Workflows that are already running stay on Cluster A, so the shift comes from newly started workflows.

The state transition count steadily returns to equilibrium.

What Tempura does behind the scenes:

Tempura acts as a Temporal-aware proxy layer that intercepts Temporal requests and routes them dynamically based on a semantic mapping cache. When a request (such as StartWorkflowExecution or a signal) arrives, the proxy decodes the protobuf payload to extract the WorkflowId and VirtualNamespace. It then queries its mapping cache using the WorkflowId to check for an existing assignment. If a mapping exists (a cache hit), the workflow is already running, and the proxy applies "workflow stickiness" by rewriting the request to target that exact physical cluster. On a cache miss (a new workflow), the proxy's resolver uses a load balancer to assign a physical namespace, stores the new WorkflowId -> PhysicalNamespace mapping in the cache, and forwards the rewritten request. This guarantees that all subsequent updates, queries, or signals for that workflow are consistently routed to the correct backend cluster without the client needing to know the physical topology.

TTL (Time-to-Live) handling in Tempura is designed to manage the lifecycle of the semantic mapping cache (Redis) so that it doesn't grow indefinitely, while ensuring that mappings exist for as long as they are needed. Key points:
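
The hit/miss logic described above can be sketched in a few lines. This is an illustration under stated assumptions, not Tempura's actual code: the `StickyRouter` class and its method names are hypothetical, a plain dict stands in for the Redis mapping cache, and round-robin stands in for whatever load-balancing policy the resolver really uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StickyRouter:
    """Hypothetical sketch of sticky routing for one or more virtual namespaces."""

    # virtual namespace -> physical namespaces currently mapped to it
    physical: Dict[str, List[str]] = field(default_factory=dict)
    # WorkflowId -> physical namespace (assumption: a dict replaces Redis)
    mapping: Dict[str, str] = field(default_factory=dict)
    # per-virtual-namespace counter driving round-robin assignment
    counters: Dict[str, int] = field(default_factory=dict)

    def add_physical(self, virtual: str, physical_ns: str) -> None:
        """Attach another physical namespace to a virtual namespace."""
        self.physical.setdefault(virtual, []).append(physical_ns)

    def route(self, virtual: str, workflow_id: str) -> str:
        """Return the physical namespace that should receive this request."""
        # Cache hit: the workflow already has a home, so stay sticky.
        if workflow_id in self.mapping:
            return self.mapping[workflow_id]
        # Cache miss: pick a target (round-robin here) and remember it.
        targets = self.physical[virtual]
        index = self.counters.get(virtual, 0)
        target = targets[index % len(targets)]
        self.counters[virtual] = index + 1
        self.mapping[workflow_id] = target
        return target


router = StickyRouter()
router.add_physical("my-app-namespace", "namespace-a")
before = [router.route("my-app-namespace", f"wf-{i}") for i in range(3)]

# Scale out: new workflows may now land on namespace-b ...
router.add_physical("my-app-namespace", "namespace-b")
after = {router.route("my-app-namespace", f"new-{i}") for i in range(4)}

# ... while workflows started earlier keep their original assignment.
still_sticky = router.route("my-app-namespace", "wf-0")
```

The point of the sketch is the ordering: the mapping lookup always runs before the balancer, which is why adding `namespace-b` changes where new workflows go without moving any running ones.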

  1. Indefinite Storage for Active Workflows: When a workflow is first started and a new mapping is created (WorkflowId -> PhysicalNamespace), it is stored in Redis with no expiration. This ensures the mapping persists for the entire duration of the workflow execution, regardless of whether it takes seconds or months.

  2. Intercepting Completion: Tempura inspects the payloads of RespondWorkflowTaskCompleted requests as workers send them back to the server. It scans the list of commands within the payload looking for COMMAND_TYPE_COMPLETE_WORKFLOW_EXECUTION (which indicates the workflow has successfully finished).

  3. Dynamic TTL Application: When a workflow completion is detected, Tempura calls UpdateTTL(workflowID) in the mapping layer.

  4. Namespace Retention Synchronization: Inside UpdateTTL, Tempura maintains namespace retention information. This layer retrieves the exact retention period configured on the underlying physical Temporal namespace (e.g., 7 days, 30 days).
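
The lifecycle in the four points above can be sketched as follows. This is a simplified model and not Tempura's implementation: `MappingStore` and `on_workflow_task_completed` are hypothetical names, an in-memory dict with an injectable clock replaces Redis, and the TTL is assumed to equal the retention period of the physical namespace (the text above says the retention period is retrieved, which suggests but does not state this).

```python
import time
from typing import Callable, Dict, Iterable, Optional, Tuple

COMPLETE = "COMMAND_TYPE_COMPLETE_WORKFLOW_EXECUTION"


class MappingStore:
    """Hypothetical WorkflowId -> PhysicalNamespace store with optional expiry."""

    def __init__(
        self,
        retention_seconds: Dict[str, float],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._retention = retention_seconds  # physical namespace -> retention
        self._clock = clock
        # workflow_id -> (physical namespace, expiry time or None)
        self._entries: Dict[str, Tuple[str, Optional[float]]] = {}

    def put(self, workflow_id: str, physical_ns: str) -> None:
        # New mappings never expire while the workflow is running.
        self._entries[workflow_id] = (physical_ns, None)

    def update_ttl(self, workflow_id: str) -> None:
        # Assumption: TTL is set to the physical namespace's retention period.
        entry = self._entries.get(workflow_id)
        if entry is None:
            return
        physical_ns, _ = entry
        expires_at = self._clock() + self._retention[physical_ns]
        self._entries[workflow_id] = (physical_ns, expires_at)

    def get(self, workflow_id: str) -> Optional[str]:
        entry = self._entries.get(workflow_id)
        if entry is None:
            return None
        physical_ns, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._entries[workflow_id]  # lazily drop expired mappings
            return None
        return physical_ns


def on_workflow_task_completed(
    store: MappingStore, workflow_id: str, command_types: Iterable[str]
) -> None:
    """Start the TTL countdown once a completion command is seen."""
    if COMPLETE in command_types:
        store.update_ttl(workflow_id)
```

Keeping the mapping until retention expires means that requests for a closed workflow can still be routed for as long as the physical namespace retains its history.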

Expected Result:

  1. The State Transitions metric will show the workload being distributed across both Cluster A and Cluster B.

  2. Cluster B's resources are actively utilized, adding its capacity to the single logical my-app-namespace (roughly doubling it when the two clusters are sized alike).

Conclusion

This scenario exercises Tempura's core value proposition: dynamically scaling Temporal namespaces beyond the limits of a single cluster's architecture (like fixed History Shard counts) without downtime or client-side reconfiguration.