Kubernetes has become the de facto standard orchestration platform for automating the deployment, scaling, and operation of application containers across clusters of hosts in a cloud environment. However, most application workloads running on Kubernetes are highly dynamic, and enterprises have limited knowledge of the resources (CPU, memory, etc.) needed to support each of these workloads. Consequently, enterprises often overprovision their resources to ensure application performance and service levels, or sometimes underprovision them, leading to application outages. This makes resource management in Kubernetes a critical and challenging problem.
Federator.ai, the flagship product of ProphetStor, is an AI-enabled solution providing resource and performance optimization for applications running on Kubernetes. In this white paper, we will explain the essential ideas underlying the algorithms used in Federator.ai, based on the theoretical framework of reinforcement learning. This framework allows us to address the challenging problem of resource management for different applications and system environments with the same technical approach, not limited to Kubernetes.
Reinforcement Learning
One of the best-known applications of reinforcement learning in recent years is the design of the computer program AlphaGo (and its subsequent versions) by Google's DeepMind Technologies team, which plays the board game Go and beat the top-ranked human player in the world in 2017. By using a similar reinforcement learning framework, DeepMind's machine learning solution has been applied to Google's data centers and managed to reduce the amount of energy used for cooling by up to 40 percent.
As an academic subject, reinforcement learning has been studied in many disciplines under different names in the past. The key ideas can be traced back to the 1950s, when R.E. Bellman studied the problem of dynamic programming and came up with the Bellman equation along with an iterative algorithm for solving it, thereby providing a solution to the problem.
The objective of reinforcement learning is to design a sequence of actions based on the observations of a system, so that as a result of the actions, the outcomes of the system are optimized (with respect to certain metrics).
The general framework is illustrated in Figure 1 above, and involves 1) an Agent, and 2) an Environment, in which at every time step t:
1) Agent
a) observes the state S_{t}
b) executes an action A_{t}
c) receives a reward R_{t}
2) Environment
a) receives an action A_{t}
b) gives out a reward R_{t+1}
c) results in a state S_{t+1}
For each state S_{t} with action A_{t}, the state S_{t+1} at the next time step can be a single, deterministic value, or S_{t+1} could have multiple possible values, in which case the transition from state S_{t} to S_{t+1} with action A_{t} is a random event with an underlying distribution defining the transition probability.
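This interaction can be sketched as a short loop (a toy illustration only; the environment below is a hypothetical stub with a made-up transition rule, not Federator.ai or Kubernetes):

```python
import random

class Environment:
    """Toy stochastic environment: the next state S_{t+1} is drawn at
    random, with a distribution that depends on (S_t, A_t)."""
    def __init__(self):
        self.state = 0

    def step(self, action):
        # With probability 1/2 the action takes effect; otherwise the
        # state stays put -- a simple random transition.
        self.state = random.choice([self.state + action, self.state])
        reward = -abs(self.state)  # R_{t+1}: penalize drifting from 0
        return self.state, reward

env = Environment()
state, reward = env.state, 0
for t in range(5):
    action = 1 if state < 0 else -1  # a trivial hand-written policy
    state, reward = env.step(action)
```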
The objective is to determine an optimal policy, i.e., a sequence of actions A_{t} that maximizes the cumulative rewards (a weighted sum of R_{t} over all t), called the value function. In practice, it is sufficient to find a good enough policy that closely approximates the maximum of the value function.
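A common concrete choice of weights is powers of a discount factor gamma between 0 and 1, so the cumulative reward is the sum over t of gamma^t * R_{t}. A two-line illustration (the reward numbers are made up):

```python
def discounted_return(rewards, gamma=0.9):
    """Weighted sum of rewards R_t, with weight gamma**t at time step t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

# The same reward contributes less to the value when it arrives later.
early = discounted_return([1.0, 0.0, 0.0])  # weight gamma**0
late = discounted_return([0.0, 0.0, 1.0])   # weight gamma**2
assert early > late
```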
Surprisingly, many seemingly different problems can all be addressed successfully by the above reinforcement learning framework. These include many computer games, control of cooling power in data centers (as in Google's), navigating an autonomous vehicle, and many other robotic applications. Each of these problems can be formulated mathematically as a Markov Decision Process (MDP), and there are well-known algorithms to solve or approximate the solutions of MDPs. In order to apply reinforcement learning:
- First, formulate your problem by defining the possible actions A_{t} and the observable states S_{t}.
- Then, come up with the reward function R_{t}, so that when your designed sequence of actions (policy) maximizes the cumulative rewards, it will lead to the desired outcome.
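Following these two steps, a hypothetical formulation of Kubernetes-style autoscaling might look like this (all names and values here are illustrative assumptions, not Federator.ai's actual formulation):

```python
# Step 1: define the observable states and the possible actions.
STATES = ["low", "normal", "high"]  # discretized CPU utilization (S_t)
ACTIONS = [-1, 0, +1]               # remove a replica / hold / add one (A_t)

# Step 2: define a reward whose maximization yields the desired outcome.
def reward(throughput, replicas):
    """R_t as a performance-cost ratio: throughput per replica used."""
    return throughput / max(replicas, 1)

assert reward(100.0, 4) == 25.0  # higher throughput per replica is better
```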
However, different formulations of the problem and different approximate algorithms will yield solutions with different efficiency (time to reach the solution) and computational complexity (computing resources needed to execute the algorithm), even if they all solve the same problem with the desired outcome. The key challenge is to formulate your problem correctly and select an efficient algorithm so that the solution to your problem reflects the desired outcome and requires the least amount of computing time and resources.
Formulating Resource Management as Reinforcement Learning
Resource management problems in IT systems often represent difficult online decision tasks where appropriate solutions depend on understanding the workload and the system environment. These problems are difficult because the workloads are typically unknown and highly dynamic, and their impact on the system environments and related parameters are often hard to characterize.
Imagine an application running on a Kubernetes cluster with a highly fluctuating workload. The objective is to determine the right amount of resources, such as CPU and memory, as the workload varies over time, while maintaining the desired level of application performance.
Let us now formulate this resource management problem on Kubernetes into a reinforcement learning framework, as shown in Figure 2 below:
The Kubernetes cluster becomes the Environment, and Federator.ai becomes the Agent. At every time step t:
- Federator.ai
  - observes the workloads, latency, CPU and memory utilization, etc. (state S_{t})
  - recommends the amount of resources to allocate (action A_{t})
  - receives a reward R_{t}. For example, R_{t} can be defined as (performance / amount of resources), i.e., a performance-cost ratio, where performance can be the application throughput or (negative) latency.
- Kubernetes cluster
  - executes the resource allocation recommended by Federator.ai (executes the action A_{t})
  - the reward R_{t+1} and the state S_{t+1} (workload and other metrics) are then observed to determine the next action A_{t+1}.
The objective is to design an algorithm (the policy) that specifies the sequence of actions A_{t} such that the performance-cost ratio (the reward) is maximized over the life of the application or over a specified time interval. As discussed earlier, there are many well-studied algorithms for solving this problem, and the challenge is to find an efficient algorithm that determines (or closely approximates) the optimal solution quickly.
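As one deliberately simplified possibility, a tabular Q-learning-style agent could learn, for each observed replica count, which scaling action yields the best performance-cost ratio. This is a sketch under strong assumptions (a stub environment with fixed demand, no discounting), not Federator.ai's actual algorithm:

```python
import random
from collections import defaultdict

ACTIONS = [-1, 0, 1]  # remove a replica / hold / add a replica

def observe_reward(replicas, demand=4):
    """Stub environment: throughput saturates at the demand level, so the
    performance-cost ratio (throughput / replicas) drops when we
    overprovision."""
    return min(replicas, demand) / replicas

q = defaultdict(float)           # Q[(state, action)] value estimates
alpha, epsilon = 0.5, 0.2        # learning rate and exploration rate
replicas = 1
for step in range(2000):
    state = replicas
    if random.random() < epsilon:             # explore
        action = random.choice(ACTIONS)
    else:                                     # exploit current estimates
        action = max(ACTIONS, key=lambda a: q[(state, a)])
    replicas = min(max(replicas + action, 1), 8)
    r = observe_reward(replicas)
    # Bandit-style one-step update (discounting omitted for brevity).
    q[(state, action)] += alpha * (r - q[(state, action)])
```

Because each update moves the estimate toward an observed ratio between 0 and 1, the learned values stay bounded, and the greedy policy gradually favors actions that avoid overprovisioning.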
Kubernetes Applications and Resource Optimization
The above reinforcement learning formulation gives only a high-level view of how Federator.ai addresses the challenging problem of resource management in Kubernetes. Many details of our algorithm design have been omitted from the above discussion.
A key consideration in our algorithm design is the trade-off among a) the application performance, b) the computing resources needed to execute our algorithm, and c) the Kubernetes resources saved by applying our algorithm. This trade-off depends heavily on our choice of the observable states S_{t}, the possible actions A_{t}, and the reward R_{t}. If there are too many observable states (i.e., if the state space is too large), executing our algorithm would require too much CPU or take too long. The choice of the reward function should reflect the system metrics to be optimized. The set of possible actions also determines how long the algorithm takes to settle on a solution, and how close that solution is to the optimal value.
We have successfully applied reinforcement learning, taking the above trade-offs into consideration, to come up with an efficient algorithm that can optimize and autoscale the amount of CPU and memory resources for many applications running on Kubernetes, including Kafka.
Using the Kafka application as an example: given that the producer rate (external traffic into the Kafka system) is unknown and fluctuating, our objective is to minimize the latency, or consumer lag, by allocating the right amount of consumer pod resources. The challenge is that in Kafka there is a delay when the number of consumer pods is increased or decreased: a rebalancing period must pass before the allocated resources take effect.
By appropriately selecting the observable states S_{t}, the allowable set of actions A_{t}, and the reward R_{t}, we have designed an algorithm that effectively autoscales the resources for Kafka and general applications on Kubernetes. For Kafka, our algorithm can reduce the latency by up to 90% compared with the default horizontal pod autoscaler (HPA) in native Kubernetes.
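The rebalancing delay can be made concrete by letting each scaling action take effect only a few time steps after it is issued. The sketch below is purely illustrative (the delay length, per-pod capacity, and queue model are assumptions, not measurements of Kafka):

```python
from collections import deque

REBALANCE_DELAY = 3  # time steps before a scaling action takes effect

def run(producer_rates, actions, capacity_per_pod=10):
    """Track consumer lag when each scaling action A_t is delayed by a
    rebalancing period before it changes the consumer pod count."""
    pods, lag = 1, 0
    pending, lags = deque(), []
    for t, (rate, action) in enumerate(zip(producer_rates, actions)):
        pending.append((t + REBALANCE_DELAY, action))
        # Apply only the scaling actions whose rebalancing has finished.
        while pending and pending[0][0] <= t:
            pods = max(pods + pending.popleft()[1], 1)
        # Lag grows while the producer rate exceeds consumer capacity.
        lag = max(lag + rate - pods * capacity_per_pod, 0)
        lags.append(lag)
    return lags
```

In this model, scaling up at time 0 does nothing until step 3, so lag accumulates in the meantime; this is the kind of delayed effect the choice of states and actions has to account for.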