
Architecture

Talos separates API key management into two planes.

Admin plane

The admin plane handles all key management and verification operations: key issuance, rotation, revocation, token derivation, JWKS, and verification (single and batch). It is exposed only to internal services and clients with admin credentials.

Endpoints: /v2alpha1/admin/, including /v2alpha1/admin/apiKeys:verify and /v2alpha1/admin/apiKeys:batchVerify.

For low-latency verification close to clients, deploy the commercial edge proxy as a sidecar. The proxy caches admin verify responses locally, so applications get sub-millisecond cache hits without exposing the admin plane publicly.

Data plane

The data plane handles self-service operations that credential holders perform by proving possession of the credential itself; no admin authentication is required.

Endpoints: POST /v2alpha1/apiKeys:selfRevoke
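The difference between the two planes shows up in who authenticates each request. The sketch below builds one request for each plane. The JSON body field `credential`, the use of a Bearer `Authorization` header on both endpoints, and the helper names are assumptions for illustration; consult the API reference for the actual request shapes.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// verifyRequest is a hypothetical body shape, not the documented schema.
type verifyRequest struct {
	Credential string `json:"credential"`
}

// newVerifyRequest builds an admin-plane call: the caller authenticates with its
// own admin credential and passes the key to check in the body.
func newVerifyRequest(baseURL, adminToken, credential string) (*http.Request, error) {
	body, err := json.Marshal(verifyRequest{Credential: credential})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/v2alpha1/admin/apiKeys:verify", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+adminToken) // admin credential (assumed scheme)
	return req, nil
}

// newSelfRevokeRequest builds a data-plane call: the credential itself is the
// proof of possession, so no admin token is involved.
func newSelfRevokeRequest(baseURL, credential string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, baseURL+"/v2alpha1/apiKeys:selfRevoke", nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+credential) // the key being revoked (assumed scheme)
	return req, nil
}

func main() {
	v, _ := newVerifyRequest("http://localhost:4420", "admin-token", "example-credential")
	r, _ := newSelfRevokeRequest("http://localhost:4420", "example-credential")
	fmt.Println(v.Method, v.URL.Path)
	fmt.Println(r.Method, r.URL.Path)
}
```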

Verification flow

Client --> Verifier --> Cache (hit?) --> Database --> Response
                        |                                ^
                        +-- cache hit -------------------+
  1. Client sends credential to POST /v2alpha1/admin/apiKeys:verify
  2. Talos identifies the credential type (generated, imported, JWT, macaroon)
  3. For generated keys, the UUID is extracted from the token identifier
  4. For imported keys, a tenant-scoped SHA-512/256 hash is computed
  5. Database lookup (or cache hit) returns key metadata
  6. Response includes key status, owner, scopes, and metadata
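Step 4 can be sketched with Go's standard SHA-512/256 implementation (`sha512.New512_256`, a truncated SHA-512 variant with its own initial values). How Talos actually scopes the hash to a tenant is not described here; the input layout below (an 8-byte big-endian length of the tenant ID, the tenant ID, then the raw key) is an assumption, and `tenantScopedHash` is a hypothetical name.

```go
package main

import (
	"crypto/sha512"
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// tenantScopedHash derives a lookup digest for an imported API key using
// SHA-512/256. The input layout is an assumption for this sketch; the point is
// that the tenant ID is part of the hash input.
func tenantScopedHash(tenantID, rawKey string) string {
	h := sha512.New512_256()
	var n [8]byte
	binary.BigEndian.PutUint64(n[:], uint64(len(tenantID)))
	h.Write(n[:]) // length prefix: ("ab", "c") and ("a", "bc") hash differently
	h.Write([]byte(tenantID))
	h.Write([]byte(rawKey))
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	a := tenantScopedHash("tenant-a", "imported-key-123")
	b := tenantScopedHash("tenant-b", "imported-key-123")
	fmt.Println(len(a), a == b) // 64 false
}
```

Because the tenant ID is mixed into the digest, the same imported key registered in two tenants produces two unrelated lookup keys.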

Deployment topologies

Topology       Edition      Description
Single-node    OSS          One process serves both planes
Split planes   Commercial   Admin and data planes as separate deployments
Edge proxy     Commercial   Sidecar proxy at the edge that caches admin verify responses locally

Both planes share the same database. Verification uses caching (memory or Redis) to minimize database load.

Ports

Port   Purpose
4420   HTTP API (default)
4422   Prometheus metrics

Design philosophy

Separation of concerns

The system is divided into distinct layers:

  • Admin plane: Management operations (CRUD for keys, rotation, import, token derivation)
  • Data plane: High-throughput verification operations
  • Persistence layer: Database abstraction with pluggable drivers
  • Cache layer: Performance optimization with multiple backends

This separation allows independent scaling of components, different SLOs for different operations (admin targets <100ms p99, data plane targets <3ms p99), and clear boundaries between responsibilities.

Production-first design

  • Hard isolation between admin and data operations
  • Metrics, traces, and structured logs are emitted by default
  • Graceful degradation when the database or cache backend is unavailable
  • Zero-downtime deployments via rolling updates and stateless verification

Performance characteristics

  • Self-contained tokens (JWT/macaroon) enable stateless verification
  • HMAC-SHA256 keeps the revocation check on the order of microseconds; bcrypt would cap a single core at roughly 10 verifications per second
  • LRU caching for hot paths
  • Minimal allocations in the verification path
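The HMAC-SHA256 approach can be sketched with Go's standard library: recompute the digest of the presented key and compare it with the stored digest using `hmac.Equal`, which runs in constant time. The server-side secret and the function names are assumptions for illustration.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"fmt"
)

// keyDigest computes HMAC-SHA256(serverSecret, apiKey). How the server-side
// secret is provisioned and rotated is outside this sketch.
func keyDigest(serverSecret, apiKey []byte) []byte {
	mac := hmac.New(sha256.New, serverSecret)
	mac.Write(apiKey)
	return mac.Sum(nil)
}

// matches recomputes the digest and compares it in constant time, so the
// comparison does not leak how many leading bytes matched.
func matches(serverSecret, presented, stored []byte) bool {
	return hmac.Equal(keyDigest(serverSecret, presented), stored)
}

func main() {
	secret := []byte("server-side-secret")
	stored := keyDigest(secret, []byte("example-api-key"))
	fmt.Println(matches(secret, []byte("example-api-key"), stored)) // true
	fmt.Println(matches(secret, []byte("wrong-key"), stored))       // false
}
```

bcrypt's deliberate slowness exists to protect low-entropy secrets such as passwords. Long, randomly generated API keys cannot be guessed by brute force, so a single fast keyed hash is sufficient for them.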

System architecture

     Clients (CLI, SDK, HTTP)
                 |
                 v
+----------------------------------+
|    HTTP Server (grpc-gateway)    |
|            Port: 4420            |
+----------------------------------+
                 |
                 v
+----------------------------------+
|            Middleware            |
|    Logging, Metrics, Tracing     |
+----------------------------------+
                 |
         +-------+--------+
         |                |
         v                v
   +-----------+    +-----------+
   |   Admin   |    |   Data    |
   |   Plane   |    |   Plane   |
   |  <100ms   |    | <3ms p99  |
   +-----------+    +-----------+
         |                |
         v                v
+----------------------------------+
|          Service Layer           |
|    Business logic, Validation    |
+----------------------------------+
                 |
         +-------+--------+
         |                |
         v                v
   +-----------+    +-----------+
   | Persist.  |    |   Cache   |
   |  SQLite   |    |  Memory   |
   | PG/MySQL  |    |    LRU    |
   |   CRDB    |    |   Redis   |
   +-----------+    +-----------+

All requests enter through a single HTTP server built on grpc-gateway (port 4420) and pass through middleware for logging, metrics, and tracing before being routed to the appropriate plane.

Component overview

HTTP server

The API layer uses grpc-gateway for HTTP/JSON routing with protobuf-based schemas. It serves both planes through a single port, handles CORS and compression, and exposes OpenAPI documentation.

Service layer

Business logic is split between the admin plane service (key lifecycle, import, token derivation, input validation) and the data plane verifier (token parsing, signature verification, revocation checking, cache management). The verifier is optimized for the hot path with minimal allocations.

Persistence

Database access uses sqlc-generated type-safe queries with pluggable drivers:

  • SQLite -- OSS edition, zero-config, suitable for millions of keys
  • PostgreSQL -- production workloads
  • MySQL -- production workloads
  • CockroachDB -- distributed deployments

Schema changes are managed through versioned migrations using golang-migrate.

Cache

The cache layer reduces database load on the verification path:

  • Memory LRU (OSS) -- local to each instance, configurable size limits
  • Redis (Commercial) -- distributed, supports cluster and sentinel modes
  • Hierarchical L1+L2 (Commercial) -- memory for speed, Redis for shared state
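A hierarchical L1+L2 lookup checks the local tier first, falls back to the shared tier, and backfills the local tier on a shared hit. In the sketch below both tiers are simulated with maps (L1 would be a bounded LRU and L2 Redis in the described setup); `Tier`, `mapTier` and `Hierarchical` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// Tier is the minimal contract each cache level satisfies in this sketch.
type Tier interface {
	Get(key string) (string, bool)
	Set(key, value string)
}

// mapTier is a trivial in-process tier that counts lookups for the demo.
type mapTier struct {
	mu   sync.Mutex
	m    map[string]string
	gets int
}

func newMapTier() *mapTier { return &mapTier{m: make(map[string]string)} }

func (t *mapTier) Get(k string) (string, bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.gets++
	v, ok := t.m[k]
	return v, ok
}

func (t *mapTier) Set(k, v string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.m[k] = v
}

// Hierarchical checks L1, then L2, and backfills L1 on an L2 hit so the next
// lookup on this instance stays local.
type Hierarchical struct{ L1, L2 Tier }

func (h Hierarchical) Get(k string) (string, bool) {
	if v, ok := h.L1.Get(k); ok {
		return v, true
	}
	if v, ok := h.L2.Get(k); ok {
		h.L1.Set(k, v)
		return v, true
	}
	return "", false
}

// Set writes through both tiers so other instances sharing L2 see the value.
func (h Hierarchical) Set(k, v string) {
	h.L2.Set(k, v)
	h.L1.Set(k, v)
}

func main() {
	shared := newMapTier() // stands in for Redis
	a := Hierarchical{L1: newMapTier(), L2: shared}
	b := Hierarchical{L1: newMapTier(), L2: shared}

	a.Set("key-123", "active") // written by instance A
	v, _ := b.Get("key-123")   // instance B: L1 miss, L2 hit, L1 backfilled
	b.Get("key-123")           // instance B: served from L1
	fmt.Println(v, shared.gets) // active 1
}
```

The sketch leaves out invalidation: when a key is revoked, every instance's L1 must be evicted (for example through short TTLs or a broadcast), or L1 can keep serving a stale status.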

Crypto

Talos supports multiple JWT signing algorithms and a separate API key hashing mechanism:

  • JWT signing algorithms
      • Ed25519 (EdDSA) -- default, fastest signing and smallest keys
      • RSA-2048/4096 (RS256) -- legacy compatibility
  • API key hashing
      • HMAC-SHA256 -- used for API key revocation checks (<1ms with constant-time comparison)

The JWT signing algorithm is determined per JWK by its alg field, so one JWKS can contain keys for multiple signing algorithms at the same time.

Observability

Built-in instrumentation across three pillars:

  • Metrics -- Prometheus exposition on port 4422 with request latency histograms and error rate counters
  • Tracing -- OpenTelemetry with W3C Trace Context propagation, configurable sampling, OTLP and Jaeger exporters
  • Logging -- structured JSON logging via slog with correlation IDs and contextual fields

Scalability

Small (<1k RPS)

A single Talos instance handles both planes with SQLite and an in-memory LRU cache. No external dependencies required.

  • OSS edition sufficient
  • 1 CPU, 512MB RAM
  • Cost: $5-10/month

Medium (10-50k RPS)

Separate admin and data plane deployments behind a load balancer. PostgreSQL replaces SQLite for durability. Redis provides shared caching across data plane instances.

  • Commercial edition
  • Auto-scaling for data plane
  • Cost: $100-500/month

Large (200k+ RPS)

A cluster of 10-50+ stateless data plane instances with auto-scaling, backed by a distributed Redis cache and PostgreSQL with read replicas and connection pooling. Supports multi-region deployment.

  • Commercial edition
  • Regional data plane deployment
  • Cost: $1-5k/month
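Instance counts for each tier can be sanity-checked with ceiling division: divide the target RPS by what one instance can serve at a chosen headroom. This document gives no per-instance throughput, so the 10,000 RPS per instance and 70% utilization cap used below are purely hypothetical inputs to be replaced with your own measurements.

```go
package main

import (
	"errors"
	"fmt"
)

// instancesNeeded returns how many data-plane replicas cover targetRPS when
// each replica runs at no more than maxUtilPct percent of its measured capacity.
func instancesNeeded(targetRPS, perInstanceRPS, maxUtilPct int) (int, error) {
	usable := perInstanceRPS * maxUtilPct / 100
	if usable <= 0 {
		return 0, errors.New("per-instance capacity must be positive")
	}
	n := (targetRPS + usable - 1) / usable // integer ceiling division
	if n < 1 {
		n = 1 // always keep at least one replica
	}
	return n, nil
}

func main() {
	// Hypothetical: 200k RPS target, 10k RPS per instance, run at most 70% busy.
	fmt.Println(instancesNeeded(200_000, 10_000, 70)) // 29 <nil>
}
```

Under these assumed numbers the result lands inside the 10-50+ instance range quoted for the large tier; multi-region deployments would repeat the calculation per region.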