By Hamza in Distributed Systems — Jul 10, 2020

Kubernetes Readiness Probes in Microservices

How to use Kubernetes liveness and readiness probes in large microservice architectures

A great deal has already been written about readiness and liveness checks, and I don’t intend to cover that ground again. Rather, I want to cover, very specifically, their use in a large microservices architecture.

I had expected this to be an area that is already well covered, given the prevalence of Kubernetes in hosting these types of applications. Trawling the internet, however, didn’t reveal a great deal specific to implementing readiness checks in a microservice world, and recent experiences with some clients have shown there is a degree of misunderstanding about their use and correct implementation.

Much of what has been written already tends to focus on the how of implementing readiness and liveness checks in your services, often in relation to databases, caches or other dependent infrastructure. Little tends to be said, though, about what that implementation should look like when your dependencies take the form of other services. This is the area I’d like to explore a little further here.

Let’s take a fairly trivial example where Service A exposes a public REST API and, in order to fulfil those requests, needs to call Service B. When implementing your readiness check in service A you may be tempted to include a call to check whether service B is available (let’s say you use its readiness endpoint). On the face of it, this seems to make sense. You require service B to be available in order to be able to respond to requests. To accomplish this, you might think you can call one of the health check endpoints on service B.

Let me just state right now that this is a bad idea.

Containers, pods & services

To understand why, let’s take a step back and see how the readiness and liveness checks are used by Kubernetes and the sorts of things that are taken into consideration when you write one. The Kubernetes documentation states:

💡

The kubelet uses liveness probes to know when to restart a container …The kubelet uses readiness probes to know when a container is ready to start accepting traffic

The key word in that quote is container. The checks are intended for use by Kubernetes to check the readiness or liveness of a single container. This may seem obvious when looking at it within this context. Let’s imagine, though, that service B implemented a readiness check via a REST endpoint. All well and good, except that now, when I look at the REST API of service B as a developer looking to use it from service A, I see this nice-looking endpoint called /health/ready. That right there is what provides the temptation to call /health/ready on service B from the readiness check of service A, to determine whether we are actually able to process traffic.

This is where your problems start. The kubelet will be calling /health/ready on a single container and will use the results of that to manage that container (or pod). However, when you call /health/ready chances are you’re not calling it on a pod. What you’re actually calling is the Kubernetes service object (or maybe even the route which then calls the service) which is a load balancer across a number of pods.

Whilst this may seem a trivial distinction, the /health/ready endpoint is now attempting to serve two purposes. Firstly, for Kubernetes to determine the status of the container, and secondly, for users of the service to determine if the service is available. The first of these makes sense, as a pod can determine its own status and respond appropriately. The second does not, as the readiness status of a single pod does not necessarily correspond to the readiness status of the service overall.

💡

A Pod is considered ready when all of its containers are ready. One use of this signal is to control which Pods are used as backends for Services.

Now, you could argue that given this statement from the Kubernetes documentation we can assume that any non-ready pods wouldn’t be able to respond, as they would not have any traffic routed to them. Sounds fine right?

Except that Kubernetes only polls the readiness check every periodSeconds. This means that at any given time when you call the /health/ready endpoint, the answer you get may differ from the one the kubelet received when it last called it. I agree that 99% or so of the time this won’t be the case and the answers will match. Think about when they will differ, though: when your system is under load, in edge cases and in race conditions. Precisely the moments when you want a mechanism to help increase its reliability.

Overloading the readiness endpoint in this manner will result in unpredictable behaviour at precisely the times you need stability. It’s also worth remembering that Kubernetes can restart just a single container, which does not restart the whole pod if you’re running multiple containers in it.

You still need failure logic

Now, you’d be right in saying that if you’re accessing the /health/ready endpoint via the Kubernetes service you are implicitly then verifying the status of the service. If at least one pod is available to have traffic routed to it, you will get a response. Now this is true (with the caveat noted above) but it’s still not a good idea.

Let’s ignore for a moment the overloading of the readiness check mentioned above, and assume you used the /health/ready endpoint of service B to implement the readiness check of service A. Now if service B were to become unavailable, Kubernetes would stop routing traffic to service A as well, and you don’t need to worry about handling failures in responses from service B, right? Wrong.

As mentioned above, the readiness probe will run every periodSeconds. If we were to set this value to, for example, 2 seconds (quite an aggressive setting), this still means that for up to 2 seconds at a time (longer if the probe’s failureThreshold is greater than 1), service B could have gone down while Kubernetes is still routing traffic to your pod in service A and your users are expecting a response.
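To put rough numbers on that window, here is a minimal sketch in Python. It assumes the probe's failureThreshold setting (the number of consecutive failures before the pod is marked unready, which defaults to 3) alongside periodSeconds, and it ignores probe timeouts and the time it takes for the endpoint change to propagate, so treat the result as an approximation. The function name is made up for illustration.

```python
from typing import Tuple


def unready_detection_window(period_seconds: float, failure_threshold: int = 3) -> Tuple[float, float]:
    """Return (best_case, worst_case) seconds between a dependency failing
    and the pod being marked unready, ignoring timeouts and propagation delay."""
    if period_seconds <= 0 or failure_threshold < 1:
        raise ValueError("period_seconds must be > 0 and failure_threshold >= 1")
    # Best case: the dependency fails just before a probe fires.
    best_case = (failure_threshold - 1) * period_seconds
    # Worst case: the dependency fails just after a probe has succeeded.
    worst_case = failure_threshold * period_seconds
    return best_case, worst_case


print(unready_detection_window(2, failure_threshold=1))  # (0, 2)
print(unready_detection_window(2))                       # (4, 6)
```

Even with the aggressive 2 second period, the default threshold leaves several seconds in which requests still reach a pod whose dependency has already gone.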

This means you still need to write code that behaves appropriately when the required dependencies are not available. Whether this be via retry, failover or reduced functionality, your code will still need to handle the unavailability of dependent resources.
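As a rough illustration of what that handling might look like, here is a minimal Python sketch combining a retry with a degraded response. Everything in it is hypothetical: DependencyUnavailable, call_with_retry and handle_request are names invented for this example, and fetch_from_service_b stands in for whatever client you use to call service B.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class DependencyUnavailable(Exception):
    """Raised by a (hypothetical) client when a downstream service cannot be reached."""


def call_with_retry(operation: Callable[[], T], attempts: int = 3, delay_seconds: float = 0.1) -> T:
    """Call operation, retrying with exponential backoff on DependencyUnavailable."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return operation()
        except DependencyUnavailable:
            if attempt == attempts - 1:
                raise  # out of retries: let the caller decide what to do
            time.sleep(delay_seconds * (2 ** attempt))
    raise AssertionError("unreachable")


def handle_request(fetch_from_service_b: Callable[[], str], delay_seconds: float = 0.1) -> dict:
    """Serve a request, falling back to reduced functionality if service B is down."""
    try:
        detail = call_with_retry(fetch_from_service_b, attempts=3, delay_seconds=delay_seconds)
        return {"status": 200, "detail": detail, "degraded": False}
    except DependencyUnavailable as error:
        # Degraded response that names the part of the system that is unavailable.
        return {
            "status": 200,
            "detail": None,
            "degraded": True,
            "message": f"service B unavailable: {error}",
        }
```

Whether a degraded 200 or an explicit error response is the right fallback depends on the endpoint. The point is only that the decision is made per request, in your code, rather than by the readiness probe.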

Readiness failure is a big hammer

Even if you’re willing to accept failed responses during this window, the behaviour once Kubernetes stops routing traffic to your pod is probably still not going to be what you want.

Given service B will no longer be available, it won’t be available to any pod in service A. This will leave zero available pods to route traffic to. Callers to service A will then typically be faced with an HTTP 503 from the ingress or route in front of it (or a refused connection when calling the Kubernetes service directly) for any endpoint they happen to call, even those that didn’t have a dependency on service B.

From a caller’s perspective (including your monitoring system), the service is no longer available at all. If instead you were to continue to pass the readiness probe, the caller could be given degraded functionality, or at worst, a far more useful error message pointing to the exact part of the system that isn’t available. This leads to far faster problem identification and resolution and also a more friendly API for users.

Cascading Failures

How many real systems (especially enterprise ones) in a microservice architecture contain only two microservices? Not many, I imagine (or hope perhaps?). In reality, the number of microservices within the ecosystem will be far greater and the interactions and dependencies between them will also be more complex. You may have a single service that multiple services rely on, or chains of dependencies that are three or four services long.

In situations such as these, writing a readiness check that checks the status of the services you depend on (doing so in a liveness check would be completely wrong, but that’s outside the scope of this article) can lead to a cascading failure of your entire system rather than providing resilience.

Let me provide a simple but rather contrived example. Imagine an ecommerce system where we have three services to handle shipping status, current orders, and order history (I did say it was rather contrived!). A user may interact with any one of the services directly. The order history service would be used by both the shipping service and the current orders service (the latter to place new orders into the history). The shipping service would be used by the current orders service.

In this situation, let’s say you were to write the readiness checks for each service as follows (pseudo-code):

Order History Service

if (databaseIsAlive)
    return HTTP200
else
    return HTTP500

Shipping Service

if (orderHistoryClient.ready() == HTTP200)
    return HTTP200
else
    return HTTP500

Current Order Service

if (orderHistoryClient.ready() == HTTP200 && 
    shippingClient.ready() == HTTP200)
    return HTTP200
else
    return HTTP500

Given the dependencies expressed above, at first glance this might seem reasonable to you. Let’s say now, however, that we lose the database on the order history service. All pods on the order history service would fail the readiness check, leading Kubernetes to stop routing any traffic to the service.

In such a situation, ideally, you’d like your users to still be able to see the orders currently in progress and their shipping status, right? Provide the most functionality you can at any given time.

With the above readiness checks, however, what would actually happen is that both the shipping service and the current orders service would start failing their readiness checks, and Kubernetes would stop routing traffic to both of them as well. Every call to the system would then fail, typically with an HTTP 503 from the ingress or route in front of it. The entire system would become unavailable just because order history cannot be viewed. Not really the intended outcome.

Just to compound matters, the monitoring infrastructure will also show the entire system as being down, and without someone investigating the logs, it won’t be possible to determine where in the system the problem lies. Conversely, if the readiness checks were more isolated, only the order history functionality would be lost and the monitoring system would point to exactly where the problem is.
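The cascade can be reproduced with a small simulation. The Python sketch below is a toy model, not anything Kubernetes runs: the service names and the ready_services function are invented here, and the dependency map simply mirrors the pseudo-code readiness checks above.

```python
from typing import Dict, List, Set

# Which other services each readiness check calls, mirroring the pseudo-code above.
COUPLED_CHECKS: Dict[str, List[str]] = {
    "order-history": [],
    "shipping": ["order-history"],
    "current-orders": ["order-history", "shipping"],
}

# The alternative: each readiness check looks only at its own pod.
ISOLATED_CHECKS: Dict[str, List[str]] = {name: [] for name in COUPLED_CHECKS}


def ready_services(checks: Dict[str, List[str]], locally_healthy: Set[str]) -> Set[str]:
    """Return the services that pass readiness: healthy on their own terms
    and, for coupled checks, with every checked service also ready."""
    ready = set(locally_healthy)
    changed = True
    while changed:  # keep propagating until no more services drop out
        changed = False
        for name in sorted(ready):
            if any(dependency not in ready for dependency in checks[name]):
                ready.discard(name)
                changed = True
    return ready


# The order history database is lost; the other two services are fine locally.
healthy = {"shipping", "current-orders"}
print(sorted(ready_services(COUPLED_CHECKS, healthy)))   # []
print(sorted(ready_services(ISOLATED_CHECKS, healthy)))  # ['current-orders', 'shipping']
```

With the coupled checks a single database outage empties the ready set, while the isolated checks leave two of the three services serving traffic.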

TL;DR

In short, don’t include calls to the readiness endpoint of other services in your own readiness endpoint, as this:

overloads the meaning of the readiness endpoint leading to unexplained behaviour in edge cases and under load
increases the amount and complexity of the code you write as you still need to handle failures in service responses
is a very coarse-grained control
can lead to cascading failures of your entire system.

When writing your readiness endpoints, focus only on what the particular pod they are running in needs to care about. Spend your effort instead on writing great error handling, and assume that every dependency you rely on can fail and will be unavailable at some point. Think about those edge cases and how to behave in those conditions, and you’ll write a much more resilient system.
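For completeness, here is one way a pod-local readiness endpoint might be sketched in Python using only the standard library. The check names in LOCAL_CHECKS are placeholders I have made up, and which checks count as "local" is a judgement call for your own service. The important property is that nothing in it calls another service's health endpoint.

```python
import json
from http.server import BaseHTTPRequestHandler
from typing import Callable, Dict, Tuple


def readiness(local_checks: Dict[str, Callable[[], bool]]) -> Tuple[int, Dict[str, bool]]:
    """Run pod-local checks only; return (HTTP status, per-check results)."""
    results: Dict[str, bool] = {}
    for name, check in local_checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False  # a check that blows up counts as a failure
    status = 200 if all(results.values()) else 503
    return status, results


# Hypothetical local checks; replace the lambdas with real ones for your pod.
LOCAL_CHECKS: Dict[str, Callable[[], bool]] = {
    "config_loaded": lambda: True,
    "worker_threads_started": lambda: True,
}


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/health/ready":
            self.send_error(404)
            return
        status, results = readiness(LOCAL_CHECKS)
        body = json.dumps(results).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it out, pass HealthHandler to http.server.HTTPServer and call serve_forever(). Returning the per-check results in the body also gives whoever is debugging a failed probe a pointer to the exact check that failed.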