When an application is small or its infrastructure/cloud footprint is simple, performance issues can often be resolved temporarily by moving the application to more capable hardware. As the complexity and scale of the application grow, however, the problem becomes unmanageable by merely adding hardware capacity. In fact, without proper planning and adequate analysis, adding hardware to resolve performance issues is like throwing oil on a fire and trying to extinguish it: human IT managers may be unable to handle the added complexity, and the situation only gets worse. In many cases, understanding future demands upfront and preparing resources for them will resolve out-of-resource issues and prevent unnecessary system failures that could cause lost revenue.
Jeff Dean, head of Google Brain, has been telling the cloud computing community since 2017 that “Machine Learning for System Management” is the future of computing. Systems of the past are full of heuristic solutions, and “anywhere we use a heuristic in today’s code is a candidate for replacement with a learned heuristic.” These comments resonate well with the design principles of ProphetStor’s Federator.ai, which uses AI technologies to help systems administrators optimize performance and cost at the same time.
Even before that, since 2012, we have been using application awareness to guide system management, and the digital intelligence Federator.ai creates is used for Day 1 deployments and Day 2 operations of modern applications. We have also shown that the Federator.ai software platform is highly effective at customer sites. Its predictive capability brings transparency to planning and issue identification, enhances performance, and greatly reduces costs in human resources and hardware complexity.
The Kubernetes platform provides fertile ground for Federator.ai to show its effectiveness. Multiple layers of operation, including the application, virtualization, containerization, and cloud and/or on-premises infrastructure, are now fully visible, and operational data can be collected by embedded monitoring software or monitoring services. By correlating modules/microservices and understanding their causality, Federator.ai can leverage this data and its AI engine to generate operation plans for even very complicated computing environments.
Federator.ai is ProphetStor’s Artificial Intelligence for IT Operations (AIOps) platform. It helps enterprises optimize cloud resources, maximize application performance, and save significant cost without excessive over-provisioning or under-provisioning of resources, while meeting the service-level requirements of their applications.
The main features include:
- Recommendation: Utilizes resource usage predictions based on workload patterns to recommend just-in-time, fitted pod sizes for performance and cost optimization.
- Automation: Automates Kubernetes pod scaling with intelligence.
- Performance: Improves performance by more than 60% compared to Kubernetes native autoscaling.
- Planning: Provides continuous recommendations for optimal resource planning.
- Cost Analysis: Explores resource costs and recommends fitted cloud configurations in multicloud environments.
Federator.ai consists of multiple components and services. The Federator.ai Operator is created as an Operator for easier configuration, deployment, management, and monitoring of Federator.ai components and services. It is certified as a Level 5 Operator on OperatorHub.io.
“AlamedaService” is a Custom Resource Definition (CRD) defined by the Federator.ai Operator. It lets users specify, store, and retrieve the configuration and settings of Federator.ai components by creating an “AlamedaService” CR.
“AlamedaScaler” is a Custom Resource Definition (CRD) defined by Federator.ai. It lets users specify the applications to be managed by Federator.ai by creating “AlamedaScaler” CRs. The auto-scaling tool and other parameters for the applications can also be defined in the “AlamedaScaler” CR.
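As a rough sketch, an “AlamedaScaler” CR that selects an application by namespace and labels might look like the following. The apiVersion/group, policy value, and field layout are assumptions for illustration and should be checked against the CRD shipped with your Federator.ai release:

```yaml
# Hypothetical AlamedaScaler CR; the apiVersion/group and the "policy"
# field are illustrative and may differ in your Federator.ai release.
apiVersion: autoscaling.containers.ai/v1alpha1
kind: AlamedaScaler
metadata:
  name: my-app-scaler
  namespace: my-app            # namespace of the application to manage
spec:
  policy: stable               # assumed auto-scaling policy setting
  selector:
    matchLabels:
      app: my-app              # must match the labels on the application pods
```

The key point is that `metadata.namespace` and `spec.selector.matchLabels` together determine which pods Federator.ai discovers and manages.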
Federator.ai supports vertical pod autoscaling (resizing resource requests and limits of pods) and horizontal pod autoscaling (scaling the number of pods) based on intelligent workload predictions and recommendations.
Federator.ai supports OpenShift 3.11 and 4.x, and Kubernetes 1.11 or later.
Federator.ai offers a 30-day free trial by default. Customers need to obtain a regular subscription keycode from ProphetStor to continue using Federator.ai after the 30-day trial period.
Frequently Asked Questions
Federator.ai provides an installation script to help customers install Federator.ai by command line:
- Log in to OpenShift as a cluster admin
- Download the installation script from GitHub and run the script
- The script will take a few minutes to complete the installation
- Confirm the Federator.ai pods are running properly with the “oc get pods” command or via the Federator.ai GUI
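The command-line flow above can be sketched as follows; the cluster URL, repository path, and script name are placeholders (the original does not give them), so substitute the values from ProphetStor’s installation instructions:

```shell
# Sketch of the CLI installation flow; all <...> values are placeholders.
oc login https://<cluster-api-url> -u <cluster-admin-user>   # log in as cluster admin

# Download the installation script from the ProphetStor GitHub repository
curl -sLO https://github.com/<prophetstor-repo>/<install-script>.sh
chmod +x <install-script>.sh
./<install-script>.sh                                        # takes a few minutes

# Confirm the Federator.ai pods are running
oc get pods -n <federatorai-namespace>
```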
For OpenShift 4.x, Federator.ai can be installed from OperatorHub.io:
- Log in to the OpenShift 4.x administration console (GUI)
- Go to the main menu, “Home -> Projects”, and create a new project for installing Federator.ai
- Go to the main menu, “Operators -> OperatorHub”, and select the “Federator.ai” Operator
- Click “Install”, then click “Subscribe” when OpenShift prompts “Create Operator Subscription”
- It will take a few minutes to complete the installation
- The Federator.ai Operator will show up on the “Operators -> Installed Operators” page
Federator.ai requires an OpenShift cluster with at least 4 CPU cores and 16 GB of memory.
Federator.ai leverages Prometheus to collect resource metrics. In OpenShift clusters, it works with the default Prometheus deployment without additional configuration. For non-OpenShift clusters, the minimum Prometheus version required by Federator.ai is 6.4.4.
Federator.ai gets workload metrics from Prometheus and stores the processed results in InfluxDB databases. If Federator.ai fails to connect to Prometheus or InfluxDB, its pods will wait for Prometheus and InfluxDB to become available and stay in “Pending” status. In that case, check the health and connectivity of Prometheus and the “alameda-influxdb” pod. Federator.ai pods may also fail to start due to a shortage of CPU or memory resources. You can use “oc describe pod <pod_name> -n <namespace>” to get detailed runtime information about the pod.
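A minimal troubleshooting sequence for the checks above, assuming Federator.ai was installed into a namespace named `federatorai` (adjust the namespace and pod names to your deployment):

```shell
# List all Federator.ai pods; "Pending" often means Prometheus/InfluxDB
# are unreachable or cluster resources are insufficient.
oc get pods -n federatorai

# Inspect the InfluxDB pod and any pending pod for events such as
# FailedScheduling (insufficient CPU/memory) or failed probes.
oc describe pod alameda-influxdb-<pod-id> -n federatorai
oc describe pod <pending-pod-name> -n federatorai
```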
Applications to be managed by Federator.ai are defined in “AlamedaScaler” CRs. Federator.ai discovers and identifies the application pods using the “namespace” and “label” specified in the “AlamedaScaler” CR, i.e., the “metadata.namespace” and “spec.selector.matchLabels” fields. You can use “oc get <deploy|dc|statefulset> -n <namespace> --show-labels” to confirm that the labels on the application pods match the labels defined in the “AlamedaScaler” CRs.
Federator.ai gets CPU and memory metrics from Prometheus. If the Federator.ai GUI displays no CPU or memory metrics, a likely cause is that Federator.ai is unable to connect to Prometheus. In an OpenShift environment, you can check the health status of Prometheus in the “openshift-monitoring” namespace with “oc get pods -n openshift-monitoring”, or in the OpenShift administration console under “Application Console -> openshift-monitoring -> Overview -> prometheus-k8s”.
Federator.ai may take from a few seconds to a few minutes to make predictions, depending on the number of workloads to predict and the computation resources available to it. First, check the License Status on the Federator.ai GUI “Home -> Dashboard” page. If you are running Federator.ai version 4.2.746 or later, you can check the license state under “About” after clicking the User icon. If the License Status is “No Keycode”, “Expired”, or “Invalid”, Federator.ai cannot provide predictions and recommendations without a valid license; contact ProphetStor to get a subscription keycode and apply it to Federator.ai. Second, check the health status of the Federator.ai AI engine pods, “alameda-ai-xxxx”. If they are ready and running normally, the predictions and recommendations should show up in the Federator.ai GUI shortly.
Federator.ai offers a 30-day trial. Once the trial period expires, you will need to contact ProphetStor to get a subscription keycode. ProphetStor Support will provide instructions for applying the subscription keycode to your Federator.ai.
In addition to persistent storage, pods may use local ephemeral storage for transient temporary operations. The life cycle of ephemeral storage is tied to the life cycle of the pod; i.e., once the pod is restarted, the data stored in ephemeral storage is lost. Ephemeral storage may be convenient for setting up a demo environment or a quick trial of Federator.ai. For production environments, using persistent storage for Federator.ai is always recommended.