Eliminate unplanned interruptions, shorten task time, and improve overall efficiency.

Use Case Descriptions

Integration Environment

HPC cluster for data processing with over 250 nodes.

Customer / Partner Types
China National Petroleum Corporation (CNPC). Oil and gas industry, land exploration and surveying data processing.

Company and Solution Background
A Fortune 500 company, one of the largest energy companies in the world, relies heavily on the collection and analysis of geophysical data for oil exploration. Recognizing the importance of natural resources, it focuses its mission on sustainable development and environmental protection while providing quality geophysical products and services. Powering its operations is an HPC data center with over 3000 servers, each connected to direct-attached storage.

Solution Architecture

Requirements and Challenges


Geophysical data is processed in parallel across all selected servers in the data center. Hundreds or thousands of nodes must run uninterrupted, because even a single node failure can force part or all of a job to be reloaded. Hardware failures, especially disk failures, are unavoidable in large-scale, high-density clusters because of the intensive data access during computation. To minimize disruption from hardware failure, the company can rely only on new and abundant hardware to process its jobs. These selection criteria result in more than 30% waste in hardware utilization.
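As a rough illustration of why a single failure matters at this scale: if each node fails during a job with probability p, independently of the others, the chance that at least one of n nodes fails is 1 - (1 - p)^n. The sketch below computes that value. The independence assumption and any per-node probability passed in are illustrative assumptions, not figures from this deployment.

```python
def job_interruption_probability(per_node_failure_prob: float, node_count: int) -> float:
    """Chance that at least one of `node_count` nodes fails during a job.

    Assumes node failures are independent, which is a simplification;
    real clusters can see correlated failures.
    """
    if not 0.0 <= per_node_failure_prob <= 1.0:
        raise ValueError("per_node_failure_prob must be between 0 and 1")
    if node_count < 0:
        raise ValueError("node_count must be non-negative")
    # P(no node fails) = (1 - p)^n, so P(at least one fails) is its complement.
    return 1.0 - (1.0 - per_node_failure_prob) ** node_count
```

Under these assumptions the risk grows quickly with node count, which is why a job spread over hundreds of nodes is far more exposed than the per-node failure rate alone suggests.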

Solution


With Federator.ai® disk failure prediction, the HPC data center reliably selects qualified hardware for every job without delaying service delivery. Integration with task schedulers prevents jobs from being loaded onto at-risk nodes before a task starts, guaranteeing the health of the entire cluster for the life of the job. Data center operators perform hardware maintenance between jobs to prepare servers for upcoming tasks. Federator.ai® also keeps performance metrics for all hosts and disks, which can be used to track unusual performance patterns at any point in time.
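The scheduler integration described above can be pictured as a pre-job filter. The following is a minimal sketch, not the Federator.ai® API: the `NodeHealth` record, the `disk_failure_risk` score, the `risk_threshold` default, and both function names are hypothetical and stand in for whatever prediction output and scheduler hooks a real deployment exposes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NodeHealth:
    hostname: str
    # Hypothetical score in [0, 1]; higher means a disk failure is more likely.
    disk_failure_risk: float


def select_nodes(candidates: List[NodeHealth], nodes_needed: int,
                 risk_threshold: float = 0.2) -> List[str]:
    """Pick the lowest-risk nodes for a job, excluding any at or above the threshold."""
    healthy = [n for n in candidates if n.disk_failure_risk < risk_threshold]
    if len(healthy) < nodes_needed:
        raise RuntimeError(
            f"only {len(healthy)} nodes below risk threshold, {nodes_needed} needed"
        )
    # Prefer the healthiest nodes first.
    healthy.sort(key=lambda n: n.disk_failure_risk)
    return [n.hostname for n in healthy[:nodes_needed]]


def nodes_for_maintenance(candidates: List[NodeHealth],
                          risk_threshold: float = 0.2) -> List[str]:
    """List nodes to service between jobs (those at or above the threshold)."""
    return [n.hostname for n in candidates if n.disk_failure_risk >= risk_threshold]
```

In this sketch the same threshold drives both decisions: nodes below it are eligible for the next job, and nodes at or above it are queued for maintenance between jobs, so aged but healthy hardware stays in service.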

Solution Benefits

Shorten data processing time by more than 30% by eliminating task reloading

Save money by reducing redundancy, freeing standby nodes for active production tasks

Leverage aged hardware with accurate disk failure predictions. No more swapping out aged but healthy hardware.

Simplify hardware management and maintenance by transforming unexpected failures into planned events.
