# Re-exercise a classic Temporal stress test

## Objective

Demonstrate how Tempura solves a typical Temporal scaling bottleneck (namespace overload) by horizontally scaling across multiple physical namespaces/clusters under load, avoiding the need to re-provision a cluster with higher shard counts.

## Background

You may already be familiar with [Scaling Temporal: The basics](https://temporal.io/blog/scaling-temporal-the-basics), the canonical load-test exercise for Temporal. The article notes that a common bottleneck for Temporal under heavy load is a high rate of state transitions combined with request latency caused by shard lock contention. The typical resolution is to increase the History Shard count. However, the shard count **cannot be changed after a cluster is built**. If your namespace is overloaded, you are normally forced to migrate to a new cluster with a larger shard count or suffer degraded performance.

In this article, we will re-run that stress test with Tempura in the middle to see how it handles this situation gracefully. By mapping a single logical namespace onto multiple physical namespaces across different clusters, Tempura lets you add capacity on the fly without altering the client configuration or rebuilding your primary cluster.

## Test Scenario Setup

### Step 1: Initial Setup with a Single Physical Namespace

First, we start the compose [stack](https://github.com/phuongdnguyen/tempura/tree/master/sandbox). The stack includes:

*   2 Temporal clusters
    
*   1 Redis instance
    
*   1 Prometheus and 1 Grafana instance
    
*   1 Tempura instance (our routing layer)
    

![](https://cdn.hashnode.com/uploads/covers/629c3e80e9ba973b69192640/6c80ab2b-7baa-4f56-b4b0-3f86c59b453c.png align="center")

```shell
docker-compose -f docker-compose.yml up -d
```

We configure Tempura to route our target Virtual Namespace exclusively to Cluster A.

```shell
# Create the Virtual Namespace
curl -X POST http://localhost:8089/namespace/virtual/create \
  -d '{"name": "my-app-namespace"}'

# Map Cluster A's physical namespace to the Virtual Namespace
curl -X POST http://localhost:8089/namespace/physical/add \
  -d '{
      "virtual_namespace": "my-app-namespace",
      "physical_namespace": "namespace-a",
      "cluster_address": "cluster-a.example.com:7233"
  }'
```

```plaintext
Virtual namespace 'my-app-namespace' created successfully
Physical namespace 'namespace-a' added to virtual namespace 'my-app-namespace'
```

### Step 2: Apply Load (The Overload Condition)

Using the `benchmark-workers` image from the Temporal load testing guide, we start applying a heavy load to the Tempura endpoint. *Ensure your workers are pointed to Tempura instead of the physical Temporal clusters.*

```bash
# Start Workers pointing to Tempura
docker run --network=sandbox_temporal-sandbox-network \
  --env TEMPORAL_TASK_QUEUE=benchmark \
  --env TEMPORAL_NAMESPACE=my-app-namespace \
  --env TEMPORAL_GRPC_ENDPOINT=tempura.example.com:8088 \
  ghcr.io/temporalio/benchmark-workers:main

# Start the Runner to generate workflows
# -c is the number of concurrent workflows
docker run --network=sandbox_temporal-sandbox-network \
  --env TEMPORAL_TASK_QUEUE=benchmark \
  --env TEMPORAL_WORKFLOW_TASK_POLLERS=16 \
  --env TEMPORAL_ACTIVITY_TASK_POLLERS=8 \
  --env TEMPORAL_NAMESPACE=my-app-namespace \
  --env TEMPORAL_GRPC_ENDPOINT=tempura.example.com:8088 \
  ghcr.io/temporalio/benchmark-workers:main \
  runner -c 50 -t ExecuteActivity '{ "Count": 10, "Activity": "Echo", "Input": { "Message": "test" } }'
```

*Note: Adjust the runner parallelism to generate enough load to stress Cluster A.*

### Step 3: Measure & Observe the Bottleneck

Observe the Prometheus metrics for Cluster A:

*   **State Transitions**: `sum(rate(state_transition_count_count{namespace="namespace-a"}[1m]))`
    

![](https://cdn.hashnode.com/uploads/covers/629c3e80e9ba973b69192640/fcd0b9f4-1a1c-4f9f-afdc-b944d169e8ad.png align="center")

*   **Shard lock latency (p95)**:
    
    ```promql
    histogram_quantile(0.95, sum by (instance, le)(rate(lock_latency_bucket[1m])))
    ```
    

![](https://cdn.hashnode.com/uploads/covers/629c3e80e9ba973b69192640/4ab55f59-ae3e-4f8a-8ca4-535931bee048.png align="center")

**All workflows are allocated to Cluster A**

![](https://cdn.hashnode.com/uploads/covers/629c3e80e9ba973b69192640/b491bae7-7168-4c9b-a008-1004f7305856.png align="center")

**Expected Result:** As the load increases, the `StartWorkflowExecution` latency on Cluster A will spike above our acceptable SLO (e.g. > 150ms). This indicates that `namespace-a` is overloaded (due to shard lock contention or database limits) and cannot comfortably handle the load.
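
If you would rather check the threshold from a script than by eye in Grafana, the sketch below shows one possible way to do it with the Prometheus HTTP API. It reuses the shard lock latency query from above; the Prometheus address (`http://localhost:9090`) and all function names are assumptions made for illustration, and the threshold must be expressed in whatever unit your Temporal metrics are recorded in.

```python
import json
import math
import urllib.parse
import urllib.request

# Assumption: Prometheus from the sandbox stack is reachable on this address.
PROMETHEUS_URL = "http://localhost:9090"

# The p95 shard lock latency query used in this step.
P95_LOCK_LATENCY = (
    "histogram_quantile(0.95, "
    "sum by (instance, le)(rate(lock_latency_bucket[1m])))"
)


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?{urllib.parse.urlencode({'query': promql})}"


def fetch(url, timeout=5.0):
    """Run the query and return the decoded JSON response."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def max_sample(response):
    """Return the largest sample of an instant-vector response, or None."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response}")
    values = [float(series["value"][1]) for series in response["data"]["result"]]
    values = [v for v in values if not math.isnan(v)]  # idle series yield NaN
    return max(values) if values else None


def breaches_slo(response, threshold):
    """True when any instance's p95 is above the threshold (same unit as the metric)."""
    worst = max_sample(response)
    return worst is not None and worst > threshold
```

Calling `breaches_slo(fetch(build_query_url(PROMETHEUS_URL, P95_LOCK_LATENCY)), threshold)` in a loop while the runner is active gives a simple pass/fail signal for the overload condition.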

### Step 4: Scale Out with Tempura

Instead of rebuilding Cluster A with a higher shard count (which requires downtime and complex migrations), we seamlessly add **Cluster B**'s physical namespace to the Tempura Virtual Namespace.

```bash
# Map Cluster B's physical namespace to the existing Virtual Namespace
curl -X POST http://localhost:8089/namespace/physical/add \
  -d '{
      "virtual_namespace": "my-app-namespace",
      "physical_namespace": "namespace-b",
      "cluster_address": "cluster-b.example.com:7233"
  }'
```

```plaintext
Physical namespace 'namespace-b' added to virtual namespace 'my-app-namespace'
```

### Step 5: Observe Load Sharing and Recovery

Continue the `benchmark-workers` load test and monitor both clusters in Grafana. Once the new physical namespace is added, new workflows are spread across both clusters, so part of the load that was hitting Cluster A now lands on Cluster B.

![](https://cdn.hashnode.com/uploads/covers/629c3e80e9ba973b69192640/7aeab1d9-864c-49b0-92df-bb6fb454ec90.png align="center")

**State transition count steadily returns to equilibrium**

![](https://cdn.hashnode.com/uploads/covers/629c3e80e9ba973b69192640/b2aeb7ff-1d8e-4436-b161-cf9467539912.png align="center")

**What Tempura does behind the scenes:**

Tempura acts as a Temporal-aware proxy layer that intercepts Temporal requests and routes them dynamically based on a semantic mapping cache. When a request (like `StartWorkflowExecution` or a signal) arrives, the proxy decodes the protobuf payload to extract the `WorkflowId` and `VirtualNamespace`. It then queries its mapping cache using the `WorkflowId` to check for an existing assignment. On a cache hit, the workflow is already running, so the proxy applies "workflow stickiness" by rewriting the request to target that exact physical cluster. On a cache miss (a new workflow), the proxy's resolver uses a load balancer to assign a physical namespace, stores the new `WorkflowId -> PhysicalNamespace` mapping in the cache, and forwards the rewritten request. This guarantees that all subsequent updates, queries, or signals for that workflow are consistently routed to the correct backend cluster without the client needing to know the physical topology.
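
The routing decision described above can be sketched in a few lines of Python. This is an illustration only, not Tempura's actual code: the class and method names are hypothetical, the mapping cache is a plain in-memory dict rather than Redis, and round-robin is assumed as the load-balancing policy since the policy is not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalNamespace:
    name: str
    cluster_address: str


class VirtualNamespaceRouter:
    """Hypothetical sketch of sticky routing for a virtual namespace."""

    def __init__(self):
        self._backends = {}  # virtual namespace -> list of PhysicalNamespace
        self._next = {}      # virtual namespace -> round-robin counter
        self._mapping = {}   # workflow id -> PhysicalNamespace (the mapping cache)

    def add_physical(self, virtual_namespace, backend):
        self._backends.setdefault(virtual_namespace, []).append(backend)

    def route(self, virtual_namespace, workflow_id):
        # Cache hit: the workflow already lives somewhere, so stay sticky.
        cached = self._mapping.get(workflow_id)
        if cached is not None:
            return cached

        # Cache miss: pick a backend (round-robin is an assumption) and remember it.
        backends = self._backends.get(virtual_namespace)
        if not backends:
            raise LookupError(f"no physical namespace for {virtual_namespace!r}")
        index = self._next.get(virtual_namespace, 0)
        chosen = backends[index % len(backends)]
        self._next[virtual_namespace] = index + 1
        self._mapping[workflow_id] = chosen
        return chosen


router = VirtualNamespaceRouter()
ns_a = PhysicalNamespace("namespace-a", "cluster-a.example.com:7233")
ns_b = PhysicalNamespace("namespace-b", "cluster-b.example.com:7233")

router.add_physical("my-app-namespace", ns_a)
first = router.route("my-app-namespace", "wf-1")   # only Cluster A is available

router.add_physical("my-app-namespace", ns_b)      # scale out (Step 4)
sticky = router.route("my-app-namespace", "wf-1")  # existing workflow stays put
fresh = [router.route("my-app-namespace", f"wf-{i}") for i in (2, 3)]
```

In this sketch, `wf-1` keeps resolving to `namespace-a` after the second backend is added, while the newly started workflows are spread across both physical namespaces.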

TTL (Time-to-Live) handling in Tempura is designed to manage the lifecycle of the semantic mapping cache (Redis) so that it doesn't grow indefinitely, while ensuring that mappings exist for as long as they are needed. The key points:

1.  **Indefinite Storage for Active Workflows:** When a workflow is first started and a new mapping is created (`WorkflowId -> PhysicalNamespace`), it is stored in Redis with **no expiration**. This ensures the mapping persists for the entire duration of the workflow execution, regardless of whether it takes seconds or months.
    
2.  **Intercepting Completion:** Tempura inspects the payloads of `RespondWorkflowTaskCompleted` requests as workers send them back to the server. It scans the list of commands within the payload looking for `COMMAND_TYPE_COMPLETE_WORKFLOW_EXECUTION` (which indicates the workflow has successfully finished).
    
3.  **Dynamic TTL Application:** When a workflow completion is detected, Tempura calls `UpdateTTL(workflowID)` in the mapping layer.
    
4.  **Namespace Retention Synchronization:** Inside `UpdateTTL`, the mapping layer consults the namespace retention information that Tempura maintains, retrieving the exact retention period configured on the underlying physical Temporal namespace (e.g., 7 days, 30 days).
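
The four steps above can be sketched as follows. This is a simplified illustration rather than Tempura's implementation: an in-memory store stands in for Redis (where the equivalent operations would be a `SET` without expiry followed later by `EXPIRE`), the class and function names are hypothetical, and it assumes that the TTL applied on completion equals the namespace retention period.

```python
import time

COMPLETE_COMMAND = "COMMAND_TYPE_COMPLETE_WORKFLOW_EXECUTION"


class MappingStore:
    """Hypothetical in-memory stand-in for the Redis mapping cache."""

    def __init__(self, retention_seconds, clock=time.monotonic):
        self._retention = retention_seconds  # physical namespace -> retention in seconds
        self._clock = clock
        self._entries = {}  # workflow id -> (physical namespace, expires_at or None)

    def put(self, workflow_id, physical_namespace):
        # Step 1: active workflows are stored with no expiration.
        self._entries[workflow_id] = (physical_namespace, None)

    def get(self, workflow_id):
        entry = self._entries.get(workflow_id)
        if entry is None:
            return None
        namespace, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._entries[workflow_id]  # expired, behave like a cache miss
            return None
        return namespace

    def update_ttl(self, workflow_id):
        # Steps 3 and 4: apply a TTL derived from the namespace retention period
        # (assumption: the TTL is set equal to the retention period).
        entry = self._entries.get(workflow_id)
        if entry is None:
            return
        namespace, _ = entry
        self._entries[workflow_id] = (namespace, self._clock() + self._retention[namespace])


def on_workflow_task_completed(store, workflow_id, command_types):
    # Step 2: look for the completion command among the worker's commands.
    if COMPLETE_COMMAND in command_types:
        store.update_ttl(workflow_id)
```

Injecting the clock keeps the sketch easy to exercise: with a fake clock you can confirm that a mapping survives indefinitely while the workflow runs and disappears only once the retention period has elapsed after completion.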
    

**Expected Result:**

1.  The **State Transitions** metric will show the workload being distributed across *both* Cluster A and Cluster B.
    
2.  Cluster B's resources are actively utilized, adding a second cluster's worth of capacity to the single logical `my-app-namespace`.
    

## Conclusion

This scenario exercises Tempura's core value proposition: dynamically scaling Temporal namespaces beyond the limits of a single cluster's architecture (like fixed History Shard counts) without downtime or client-side reconfiguration.