Add per-tenant TSDB cardinality status API endpoint #7332

Open
CharlieTLe wants to merge 3 commits into cortexproject:master from CharlieTLe:worktree-sleepy-sniffing-squid

@CharlieTLe
Member

Summary

  • Add /api/v1/status/tsdb endpoint that returns per-tenant TSDB cardinality statistics (series count by metric name, label value counts, memory usage by label, series count by label-value pair, min/max time)
  • Implement the full stack: protobuf definitions, ingester gRPC method, distributor aggregation, HTTP handler, and API route registration
  • Add API documentation for the new endpoint
  • Add integration tests that validate the full end-to-end flow in a Docker-based Cortex cluster

Test plan

  • Unit tests for ingester TSDBStatus gRPC method
  • Unit tests for distributor TSDBStatus aggregation logic
  • Unit tests for HTTP handler with various limit parameters
  • Integration test (TestTSDBStatus) that starts a single-binary Cortex cluster, pushes series with varying cardinality, and validates correct series counts, metric name breakdowns, label stats, and limit truncation

🤖 Generated with Claude Code

@dosubot dosubot bot added the type/observability To help know what is going on inside Cortex label Mar 6, 2026
CharlieTLe and others added 3 commits March 5, 2026 18:29
Expose TSDB head cardinality statistics through a new /api/v1/status/tsdb
endpoint, enabling users to identify which metrics, labels, and label-value
pairs contribute the most series. This follows the existing UserStats
fan-out pattern: ingester calls Head.Stats(), distributor aggregates
across the replication set (dividing replicated counts by RF), and an
HTTP handler serves JSON with an optional ?limit=N parameter.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Charlie Le <charlie_le@apple.com>
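
The aggregation step described in the commit message above (sum across the replication set, divide by RF, apply the limit) could look roughly like the sketch below. It is a minimal illustration, not the PR's code: the `stat` type, the `aggregateSeriesByMetric` function, and the tie-breaking order are all hypothetical names and choices introduced here.

```go
package main

import (
	"fmt"
	"sort"
)

// stat is a hypothetical name/value pair, e.g. a metric name and its series count.
type stat struct {
	Name  string
	Value uint64
}

// aggregateSeriesByMetric is a hypothetical helper. It sums the per-ingester
// counts for each name, divides each total by the replication factor to undo
// the duplication introduced by replicated writes, and returns the top `limit`
// entries sorted by descending value (ties broken by name).
func aggregateSeriesByMetric(responses [][]stat, replicationFactor uint64, limit int) []stat {
	if replicationFactor == 0 {
		replicationFactor = 1 // avoid division by zero; assumption for this sketch
	}

	totals := make(map[string]uint64)
	for _, resp := range responses {
		for _, s := range resp {
			totals[s.Name] += s.Value
		}
	}

	out := make([]stat, 0, len(totals))
	for name, total := range totals {
		out = append(out, stat{Name: name, Value: total / replicationFactor})
	}

	sort.Slice(out, func(i, j int) bool {
		if out[i].Value != out[j].Value {
			return out[i].Value > out[j].Value
		}
		return out[i].Name < out[j].Name
	})

	if limit > 0 && len(out) > limit {
		out = out[:limit]
	}
	return out
}

func main() {
	// Three ingesters, each holding a full replica of the same tenant data.
	responses := [][]stat{
		{{Name: "http_requests_total", Value: 4}, {Name: "up", Value: 2}},
		{{Name: "http_requests_total", Value: 4}, {Name: "up", Value: 2}},
		{{Name: "http_requests_total", Value: 4}, {Name: "up", Value: 2}},
	}
	for _, s := range aggregateSeriesByMetric(responses, 3, 10) {
		fmt.Printf("%s %d\n", s.Name, s.Value)
	}
}
```

Two properties of this approach are worth keeping in mind: integer division truncates, and if each ingester returns only its own top-N entries, the merged totals are an approximation rather than an exact count.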

Document the new TSDB cardinality status endpoint in the HTTP API
reference, including query parameters, example request/response,
and field descriptions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Charlie Le <charlie_le@apple.com>

End-to-end test that starts a single-binary Cortex cluster, pushes
series with varying cardinality, and validates the TSDB status API
returns correct series counts, metric name breakdowns, and limit
truncation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Charlie Le <charlie_le@apple.com>
@CharlieTLe CharlieTLe force-pushed the worktree-sleepy-sniffing-squid branch from 1404aa1 to 2536082 Compare March 6, 2026 02:29
limit := int32(10)
if v := r.FormValue("limit"); v != "" {
	if n, err := strconv.Atoi(v); err == nil && n > 0 {
		limit = int32(n)

Check failure

Code scanning / CodeQL

Incorrect conversion between integer types High

Incorrect conversion of an integer with architecture-dependent bit size from strconv.Atoi to a lower bit size type int32 without an upper bound check.
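
One way to address this finding is to parse with an explicit bit size, so out-of-range input is rejected before the conversion. The sketch below is an assumption about how the fix could look, not the PR's actual change: `parseLimit` and `defaultLimit` are hypothetical names, and falling back to the default on invalid input mirrors the flagged snippet's behavior.

```go
package main

import (
	"fmt"
	"strconv"
)

// defaultLimit mirrors the default of 10 used in the flagged snippet.
const defaultLimit int32 = 10

// parseLimit is a hypothetical helper. strconv.ParseInt with bitSize 32
// returns a range error for values that do not fit in int32, so the
// int32 conversion below can never truncate.
func parseLimit(v string) int32 {
	if v == "" {
		return defaultLimit
	}
	n, err := strconv.ParseInt(v, 10, 32)
	if err != nil || n <= 0 {
		// Non-numeric, out-of-range, zero, or negative: keep the default.
		return defaultLimit
	}
	return int32(n)
}

func main() {
	// 4294967297 is 2^32 + 1: on a 64-bit platform, Atoi followed by
	// int32(n) would silently wrap it to 1.
	for _, v := range []string{"", "25", "-3", "abc", "4294967297"} {
		fmt.Printf("limit=%q -> %d\n", v, parseLimit(v))
	}
}
```

An alternative with the same effect is to keep `strconv.Atoi` and add an explicit `n <= math.MaxInt32` check before converting.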
Contributor

@yeya24 yeya24 left a comment

Thanks for the PR!
I think this is worth a design proposal, as it is not a simple API and we need to think about how to extend it in the long term. I have several questions:

  • Is /api/v1/status/tsdb the right API to support, or should we have a more specific API for cardinality analysis? Although the endpoint path is the same, I see you are using a different response format from the one Prometheus has: https://prometheus.io/docs/prometheus/latest/querying/api/#tsdb-stats.
  • What fields should we support? A field like memoryInBytesByLabelName might make it hard to extend this to long-term storage.
  • Is it right to expose the API via the distributor? In the worst case this API could impact writes, which is what I want to avoid. Exposing it via the Querier seems safer, and we could even use Query Frontend caching in the future.
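
For comparison with the first question, the response shape documented at the Prometheus link above can be modeled as in the sketch below. The JSON field names follow the Prometheus documentation; the Go type and function names are hypothetical and introduced only for illustration, and nothing here describes the format this PR actually returns.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// nameValue is one entry in any of the Prometheus top-N lists.
type nameValue struct {
	Name  string `json:"name"`
	Value uint64 `json:"value"`
}

// headStats models the "headStats" object of the Prometheus response.
type headStats struct {
	NumSeries     uint64 `json:"numSeries"`
	NumLabelPairs uint64 `json:"numLabelPairs"`
	ChunkCount    uint64 `json:"chunkCount"`
	MinTime       int64  `json:"minTime"`
	MaxTime       int64  `json:"maxTime"`
}

// tsdbStatus models the "data" object of the Prometheus response.
type tsdbStatus struct {
	HeadStats                   headStats   `json:"headStats"`
	SeriesCountByMetricName     []nameValue `json:"seriesCountByMetricName"`
	LabelValueCountByLabelName  []nameValue `json:"labelValueCountByLabelName"`
	MemoryInBytesByLabelName    []nameValue `json:"memoryInBytesByLabelName"`
	SeriesCountByLabelValuePair []nameValue `json:"seriesCountByLabelValuePair"`
}

// apiResponse is the standard Prometheus API envelope.
type apiResponse struct {
	Status string     `json:"status"`
	Data   tsdbStatus `json:"data"`
}

// decodeTSDBStatus is a hypothetical helper that parses a response body.
func decodeTSDBStatus(body []byte) (apiResponse, error) {
	var resp apiResponse
	err := json.Unmarshal(body, &resp)
	return resp, err
}

func main() {
	// Made-up sample values, shaped like the documented Prometheus response.
	body := []byte(`{"status":"success","data":{
		"headStats":{"numSeries":508,"numLabelPairs":1234,"chunkCount":937,"minTime":1,"maxTime":2},
		"seriesCountByMetricName":[{"name":"up","value":3}],
		"labelValueCountByLabelName":[{"name":"__name__","value":211}],
		"memoryInBytesByLabelName":[{"name":"__name__","value":8266}],
		"seriesCountByLabelValuePair":[{"name":"job=node","value":479}]}}`)

	resp, err := decodeTSDBStatus(body)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(resp.Status, resp.Data.HeadStats.NumSeries, resp.Data.SeriesCountByMetricName[0].Name)
}
```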


Labels

size/XL type/observability To help know what is going on inside Cortex
