WARNING: 1.1 API DEPRECATED
RackHD serves as an abstraction layer between other management and orchestration (M&O) layers and the underlying physical hardware. Developers can use the RackHD API to create a user interface that serves as a single point of access for managing hardware services, regardless of the specific hardware in place.
RackHD has the ability to discover the existing hardware resources, catalog each component, and retrieve detailed telemetry information from each resource. The retrieved information can then be used to perform low-level hardware management tasks, such as BIOS configuration, OS installation, and firmware management.
User interfaces at the higher M&O layers can request hardware services from RackHD, which handles the details of connecting to and managing the hardware devices.
The RackHD API allows you to automate a wide range of management tasks, including:
Feature | Description |
---|---|
Discovery and Cataloging | Discovers the compute, network, and storage resources and catalogs their attributes and capabilities. |
Telemetry and Genealogy | Telemetry data includes genealogical details, such as hardware, revisions, serial numbers, and date of manufacture. |
Device Management | Powers devices on and off. Manages the firmware, power, OS installation, and base configuration of the resources. |
Configuration | Configures the hardware per application requirements. This can range from the BIOS configuration on compute devices to the port configurations in a network switch. |
Provisioning | Provisions a node to support the intended application workflow, for example, by laying down ESXi from an image repository. Reprovisions a node to support a different workload, for example, by changing the ESXi platform to bare-metal CentOS. |
Firmware Management | Manages all infrastructure firmware versioning. |
Logging | Log information can be retrieved for particular elements or collated into a single timeline for multiple elements within the management neighborhood. |
Environmental Monitoring | Aggregates environmental data from hardware resources. The data to monitor is configurable and can include power information, component status, fan performance, and other information provided by the resource. |
Fault Detection | Monitors compute and storage devices for both hard and soft faults. Performs suitable responses based on pre-defined policies. |
Analytics Data | Data generated by environmental and fault monitoring can be provided to analytic tools for analysis, particularly around predictive failure. |
The primary goals of RackHD are to provide REST APIs and live data feeds that enable automated solutions for managing hardware resources. The technology and architecture are built to provide a platform-agnostic solution.
The combination of these services is intended to provide a REST API based service to:
The original motive centered on maximizing the automation of firmware and BIOS updates in the data center, thereby reducing the extensive manual processes that are still required for these operations.
Existing open source solutions do an admirable job of inventory and bare OS
provisioning, but the ability to upgrade firmware is beyond the technology
stacks currently available (e.g., xCat, Cobbler, Razor, or Hanlon).
By adding an event-based workflow engine that works in conjunction with classical PXE
booting, RackHD makes it possible to architect different deployment configurations,
as described in How It Works and Deployment Environment.
RackHD extends automation beyond simple PXE booting. It can perform highly customizable tasks on machines, as is illustrated by the following sequence:
In effect, RackHD combines open source tools with a declarative, event-based workflow engine. It is similar to Razor and Hanlon in that it sets up and boots a microkernel that can perform predefined tasks. However, it extends this model by adding a remote agent that communicates with the workflow engine to dynamically determine the tasks to perform on the target machine, such as zeroing out disks, interrogating the PCI bus, or resetting the IPMI settings through the host's internal KCS channel.
Along with this agent-to-workflow integration, RackHD optimizes the path for interrogating and gathering data. It leverages existing Linux tools and parses outputs that are sent back and stored as free-form JSON data structures.
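The idea of turning tool output into free-form JSON can be illustrated with a minimal sketch. It is not RackHD's actual parser: the function name, the sample text, and the `source`/`data` field names are all assumptions made up for this illustration, and real tool output is usually messier than simple `Key: Value` lines.

```python
import json


def parse_key_value_output(text):
    """Turn 'Key: Value' lines into a dict, skipping lines without a colon."""
    catalog = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        # Split on the first colon only, so values may contain colons.
        key, _, value = line.partition(":")
        key = key.strip()
        if key:
            catalog[key] = value.strip()
    return catalog


# Hypothetical tool output, invented for illustration only.
SAMPLE_OUTPUT = """\
Manufacturer: ExampleCorp
Product Name: Example Server 1000
Serial Number: ABC123
"""

# Hypothetical catalog record: a source label plus the free-form parsed data.
catalog_entry = {
    "source": "example-tool",
    "data": parse_key_value_output(SAMPLE_OUTPUT),
}
print(json.dumps(catalog_entry, indent=2))
```

Because the parsed result is stored as a plain dictionary, no schema has to be agreed on in advance; whatever keys the tool reports end up in the stored structure.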
The workflow engine was extended to support polling via out-of-band interfaces in order to capture sensor information and other data that can be retrieved using IPMI. In RackHD these become pollers that periodically capture telemetry data from the hardware interfaces.
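A poller in this sense is essentially a periodic read of a hardware interface. The following sketch shows only that general pattern; it does not use IPMI or any RackHD code. `run_poller` and `fake_fan_sensor` are hypothetical names, and the sensor reading is invented.

```python
import time


def run_poller(read_sample, interval_seconds, iterations, sleep=time.sleep):
    """Call read_sample() `iterations` times, pausing between calls."""
    samples = []
    for i in range(iterations):
        samples.append({"timestamp": time.time(), "reading": read_sample()})
        # No need to wait after the final sample.
        if i < iterations - 1:
            sleep(interval_seconds)
    return samples


def fake_fan_sensor():
    """Hypothetical stand-in for an out-of-band (for example, IPMI) sensor read."""
    return {"sensor": "fan1", "rpm": 4200}


# A short interval keeps the demonstration fast; a real poller would
# typically wait seconds or minutes between reads.
samples = run_poller(fake_fan_sensor, interval_seconds=0.01, iterations=3)
print(len(samples), "samples collected")
```

Passing `sleep` as a parameter is a design choice that keeps the loop easy to test without actually waiting.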
RackHD is focused on being the lowest level of automation: it interrogates hardware in a platform-agnostic way and provisions machines with operating systems. The API can be used to pass in data through variables in the workflow configuration, so you can parameterize workflows. Since workflows also have access to all of the SKU information and other catalogs, they can be authored to react to that information.
The real power of RackHD, therefore, is that you can develop your own workflows and use the REST API to pass in dynamic configuration details. This allows you to execute a specific sequence of arbitrary tasks that satisfy your requirements.
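As a rough sketch of what passing dynamic configuration through a REST call might look like, the example below builds (but does not send) an HTTP request. The base URL, the `/nodes/<id>/workflows` path, the workflow name, and the option keys are all assumptions chosen for illustration; consult the API reference for your RackHD version for the actual routes and payload schema.

```python
import json
import urllib.request


def build_workflow_request(base_url, node_id, workflow_name, options):
    """Build a POST request that asks for a workflow to run against a node.

    The URL layout and body shape are assumptions for this sketch.
    """
    body = json.dumps({"name": workflow_name, "options": options}).encode("utf-8")
    url = f"{base_url.rstrip('/')}/nodes/{node_id}/workflows"
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_workflow_request(
    "http://rackhd.example.com:8080/api/2.0",  # hypothetical server address
    "example-node-id",                          # hypothetical node identifier
    "Graph.Example.Install",                    # hypothetical workflow name
    {"defaults": {"hostname": "node01"}},       # hypothetical workflow options
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(request)
```

The point of the sketch is that the workflow definition stays the same while the `options` object changes per request, which is what makes a workflow parameterizable.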
When creating your initial workflows, we recommend studying the existing workflows in our code repository to see how different actions can be performed.
RackHD is a comparatively passive system. Workflows do not contain the complex logic for functionality that is implemented in the layers above hardware management and orchestration. For example, workflows do not provide scheduling functionality or choose which machines to allocate to particular services.
We document and expose the events around the workflow engine so that they can be utilized, extended, and incorporated into an infrastructure management system, but we did not take RackHD itself directly into the infrastructure layer.
Comparison to other open source technologies:
Cobbler comparison
Razor/Hanlon comparison
xCat comparison