★ 6/10 · Infra · 2026-04-28

Kubernetes v1.36: Staleness Mitigation and Observability for Controllers

Summary

Kubernetes v1.36 introduces new features designed to mitigate controller staleness and improve the observability of controller behavior. These updates allow controllers to detect when their local cache is outdated relative to recent API server writes, preventing incorrect reconciliation actions caused by an inconsistent view of the cluster state.

Key Points

  • AtomicFIFO Feature Gate: A new AtomicFIFO feature gate in client-go enables atomic processing of batch operations, ensuring the queue remains consistent even when events arrive out of order during initial list operations.
  • Cache Introspection: The Store interface in client-go now includes the LastStoreSyncResourceVersion() function, allowing clients to determine the latest resource version seen by the controller cache.
  • Controller-Manager Updates: The DaemonSet, StatefulSet, ReplicaSet, and Job controllers in kube-controller-manager now use staleness mitigation by default.
  • Feature Gate Control: Staleness mitigation can be turned off per controller by disabling the matching StaleControllerConsistency<API type> feature gate (e.g., StaleControllerConsistencyDaemonSet).
  • ConsistencyStore Interface: A new ConsistencyStore interface for informer authors provides WroteAt, EnsureReady, and Clear methods to track and verify resource versions.
  • New Observability Metrics: New alpha metrics include stale_sync_skips_total to track skipped reconciliations and store_resource_version to monitor the latest resource version of shared informers.

Technical Details

The staleness mitigation mechanism compares the latest resource version in the controller's cache with the resource version of the last object the controller successfully wrote to the API server. If the cache's resource version is lower than that of the last write, the controller treats the cache as stale and skips that reconciliation rather than act on outdated information. This prevents the "incorrect action" pattern, in which a controller might revert a change because its local cache has not yet seen the update.
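The comparison described above can be sketched in a few lines of Go. This is an illustration only, not the kube-controller-manager implementation: the reconciler type, its fields, and the isStale helper are hypothetical names, the skip counter merely stands in for a metric such as stale_sync_skips_total, and the sketch assumes resource versions can be parsed as unsigned integers.

```go
package main

import (
	"fmt"
	"strconv"
)

// reconciler is a hypothetical controller skeleton, not a client-go type.
type reconciler struct {
	lastWrittenRV  string // resource version returned by the last successful write
	staleSyncSkips int    // stand-in for a stale_sync_skips_total style counter
}

// isStale reports whether the cache is behind the controller's own last write.
// Assumption: resource versions parse as unsigned integers.
func isStale(cacheRV, lastWrittenRV string) (bool, error) {
	if lastWrittenRV == "" {
		return false, nil // nothing written yet, so there is nothing to wait for
	}
	cache, err := strconv.ParseUint(cacheRV, 10, 64)
	if err != nil {
		return false, fmt.Errorf("parse cache resource version %q: %w", cacheRV, err)
	}
	written, err := strconv.ParseUint(lastWrittenRV, 10, 64)
	if err != nil {
		return false, fmt.Errorf("parse written resource version %q: %w", lastWrittenRV, err)
	}
	return cache < written, nil
}

// sync makes one reconciliation attempt and reports whether it actually ran.
func (r *reconciler) sync(cacheRV string) (bool, error) {
	stale, err := isStale(cacheRV, r.lastWrittenRV)
	if err != nil {
		return false, err
	}
	if stale {
		r.staleSyncSkips++ // count the skip instead of acting on old data
		return false, nil
	}
	// Real reconciliation logic would run here.
	return true, nil
}

func main() {
	r := &reconciler{lastWrittenRV: "105"}
	for _, cacheRV := range []string{"100", "105", "110"} {
		ran, err := r.sync(cacheRV)
		fmt.Println("cache:", cacheRV, "ran:", ran, "err:", err)
	}
	fmt.Println("skips:", r.staleSyncSkips)
}
```

In this sketch only the attempt with the cache at version 100 is skipped. Once the cache reaches the written version, reconciliation proceeds as usual.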

For developers implementing custom informers, the ConsistencyStore interface provides the primitives needed for "read-your-own-writes" semantics. The WroteAt method records the resource version of an object after a write, and EnsureReady checks whether the cache has reached that version. The Clear method should be called when an object is deleted so that entries do not accumulate and leak memory in the consistency store. Additionally, client-go now emits store_resource_version metrics with Group, Version, and Resource labels, allowing operators to compare informer state directly against the API server's state.
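To make the read-your-own-writes idea concrete, here is a minimal in-memory sketch. It borrows the method names WroteAt, EnsureReady, and Clear from the interface described above, but the writeTracker type, the parameter lists, and the return types are assumptions made for illustration and may not match the real client-go signatures. It also assumes resource versions parse as unsigned integers.

```go
package main

import (
	"fmt"
	"strconv"
	"sync"
)

// writeTracker is a hypothetical in-memory take on the idea behind
// ConsistencyStore. Method names follow the article; signatures are assumed.
type writeTracker struct {
	mu      sync.Mutex
	written map[string]uint64 // object key -> resource version of our last write
}

func newWriteTracker() *writeTracker {
	return &writeTracker{written: make(map[string]uint64)}
}

// WroteAt records the resource version returned by a successful write.
func (t *writeTracker) WroteAt(key, resourceVersion string) error {
	rv, err := strconv.ParseUint(resourceVersion, 10, 64)
	if err != nil {
		return fmt.Errorf("parse resource version %q: %w", resourceVersion, err)
	}
	t.mu.Lock()
	defer t.mu.Unlock()
	if rv > t.written[key] { // keep only the newest write per object
		t.written[key] = rv
	}
	return nil
}

// EnsureReady reports whether a cache at cacheRV has caught up with the
// last recorded write for key. Untracked keys are always ready.
func (t *writeTracker) EnsureReady(key string, cacheRV uint64) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	rv, ok := t.written[key]
	return !ok || cacheRV >= rv
}

// Clear forgets a key, for example after the object is deleted, so the
// map does not grow without bound.
func (t *writeTracker) Clear(key string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.written, key)
}

func main() {
	tracker := newWriteTracker()
	if err := tracker.WroteAt("default/web", "42"); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("ready at 40:", tracker.EnsureReady("default/web", 40))
	fmt.Println("ready at 42:", tracker.EnsureReady("default/web", 42))
	tracker.Clear("default/web")
	fmt.Println("ready after clear:", tracker.EnsureReady("default/web", 40))
}
```

Keying the map by object and keeping only the highest written version is one possible design. It keeps memory proportional to the number of live objects, which is why forgetting entries on deletion matters.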

Impact / Why It Matters

These improvements reduce how often controllers take incorrect or delayed actions because of cache lag, which improves cluster stability. Developers can use the new client-go primitives to build controllers that detect and wait out cache inconsistency instead of acting on it.

kubernetes infrastructure observability