Database Object Fields and Attributes

Various ClusterWare database objects (e.g., nodes, boot configurations, image configurations, administrators, attributes) each carry detailed descriptors called fields. Each field is a name-value pair relevant to its database object type. Fields are predefined by ClusterWare, and the cluster administrator uses the update action to change a field value.

For instance, each compute node object has a mac field containing the node's MAC address, a name field containing the node's alphanumeric name, and a power_uri field whose value denotes how to communicate with that node via IPMI. The command scyld-nodectl -i n0 ls -l displays all the defined fields' name-value pairs for node n0.
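
For example, assuming the update action accepts field name-value pairs on the command line, correcting a node's recorded MAC address might look like this (the address shown is illustrative):

scyld-nodectl -i n0 update mac=00:25:90:aa:bb:cc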

Compute node and attribute group object types have a special field called attributes, which is a collection of one or more attribute name-value pairs. Attribute names that begin with an underscore ("_") are called reserved attributes or system attributes. The cluster administrator uses the set action to change an attribute value. See the following section Reserved Attributes for details.

Additional attributes can be added by a cluster administrator as desired, each with a custom name and value defined by the administrator. Any script on a compute node can access the local file /etc/clusterware/attributes.ini and find that node's attributes. On the node there are helper functions in /opt/scyld/clusterware-node/functions.sh for reading attributes, specifically the function attribute_value.
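
For example, assuming the set action accepts name-value pairs on the command line, a custom attribute (the name rack is illustrative) can be assigned from the head node:

scyld-nodectl -i n0 set rack=R4

A minimal sketch of a node-local script that reads the attribute back, assuming the attribute_value helper prints the named attribute's value to stdout:

#!/bin/bash
# Load the ClusterWare node helper functions
. /opt/scyld/clusterware-node/functions.sh
# Look up the custom attribute's value (the name "rack" is illustrative)
rack=$(attribute_value rack)
echo "This node is installed in rack: ${rack}"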

Reserved Attributes

Within the ClusterWare attribute system, administrators are encouraged to store whatever information they find useful for labeling and customizing nodes. For ease of use, attribute names should be valid JavaScript variable names, i.e., they may begin with any uppercase or lowercase letter, followed by letters, digits, or underscores. Names that start with an underscore are used by ClusterWare and may be set by administrators to affect the behavior of the system. These will be referred to as system attributes throughout this discussion.

Attributes are stored internally as a JavaScript dictionary mapping strings to strings, otherwise known as name-value pairs. Administrator-defined attribute values should be strings and relatively small in size. The ClusterWare backend database enforces some document size constraints, and collections of node attributes should be no more than tens to hundreds of kilobytes in size. Individual attributes can be any length as long as the overall attribute group or node object size does not exceed these limits. Generally, if a cluster configuration is approaching these sizes, a cluster administrator should consider moving data from the database into shared storage locations referenced by database entries.

Attributes can be applied directly to nodes, but may also be collected into groups, and then these groups applied to sets of nodes. Attributes passed to nodes through groups are treated no differently than those applied directly to a node. Attribute groups help cluster administrators create more scalable and manageable configurations. See Node Attributes for more details.

The remainder of this section lists the system attributes, describing the use and allowed values of each.

_boot_config

Default: none

Values: boot configuration identifier

Depends: none

The _boot_config attribute defines which boot configuration a given node should use. For a detailed discussion of boot configurations and other database objects, please see Node Images and Boot Configurations.

A boot configuration identifier may be a (possibly truncated) UID or a boot configuration name.
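
For example, to point a node at a boot configuration by name (the name DefaultBoot is illustrative):

scyld-nodectl -i n0 set _boot_config=DefaultBoot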

_boot_rw_layer

Default: overlayfs

Values: overlayfs, rwtab

Depends: _boot_style == roram or iscsi

Use _boot_rw_layer to control the type of overlay used to provide read/write access to an otherwise read-only root file system image. The overlayfs option provides a writable overlay across the entire file system, while the rwtab approach only allows write access to the locations defined in /etc/rwtab or /etc/rwtab.d in the node image.

Note that prior to kernel version 4.9, overlayfs does not support SELinux extended attributes and so cannot be used for compute nodes with SELinux in enforcing mode. The rwtab option does work with SELinux, but two additional changes need to be made when enabling rwtab. First, the cluster administrator must modify the /etc/sysconfig/readonly-root file in the node image to ensure READONLY is set to "yes":

READONLY=yes

Second, the kernel cmdline in the appropriate boot configuration must include "ro":

cmdline: enforcing=1 ro
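
For reference, entries under /etc/rwtab.d follow the standard readonly-root format. A hypothetical /etc/rwtab.d/myapp granting write access to an application's state directory and config file might contain:

dirs /var/lib/myapp
files /etc/myapp.conf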

_boot_style

Default: rwram

Values: rwram, roram, iscsi, disked, next, sanboot, live

Depends: none

Root file system images can be supplied to nodes through a variety of mechanisms, and this can be controlled on a per-node basis through the _boot_style attribute. In both the rwram and roram modes, the node will download the entire image into RAM and either unpack it into a tmpfs RAM file system (rwram) or apply a writable overlay (roram). These boot styles have the advantage of post-boot independence from the head node, meaning that the loss of a head node will not directly impact booted compute nodes.

The iscsi option uses less RAM as the boot image is not downloaded into node RAM, but depends on the head node even after the node is fully booted. Due to this dependence a head node crash may cause attached compute nodes to hang and lose work. This approach requires a writable overlay, as the images may be shared between multiple nodes.

With the disked option, the node boots with images read from local storage. See Appendix: Booting From Local Storage Cache for details.

Use the next option to exit the boot loader and allow the BIOS to try the next device in the BIOS boot order. Since this process depends on support in the BIOS, it may not work on every server model.

The sanboot option causes the booting node to boot using the iPXE sanboot command and defaults to booting the first hard disk. Please see the _ipxe_sanboot attribute for more details.

The live option only works for ISO-based configurations, e.g., those used for kickstart. For supported ISOs (e.g., RHEL-based) the node boots into the live installer, and the administrator needs to interact with it via the (likely graphical) system console.
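
For example, using the set action described earlier, a node can be switched to the roram style:

scyld-nodectl -i n0 set _boot_style=roram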

_boot_tmpfs_size

Default: half of RAM

Values: 1g, 2g, etc.

Depends: _boot_style == rwram or _boot_rw_layer == overlayfs

During the node boot process, a tmpfs is used to provide a writable area for diskless compute nodes. For the rwram boot style this attribute controls the size of the root file system where the image is unpacked. When booting with overlayfs on a roram or iscsi style, this attribute controls the size of the writable overlay.
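
For example, to cap the writable area at 4 GB and, after the node next boots, confirm the resulting root file system size (the size is illustrative; df is the standard tool):

scyld-nodectl -i n0 set _boot_tmpfs_size=4g
scyld-nodectl -i n0 exec df -h /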

_coreos_ignition_url

Default: none

Values: The URL of a RHCOS *.ign ignition file.

Depends: none

Both _coreos_ignition_url and _coreos_install_dev are attributes that must be set to fill in variables in the associated boot config's cmdline. See Using RHCOS.

_coreos_install_dev

Default: none

Values: The device on the target node into which the image is installed.

Depends: none

Both _coreos_ignition_url and _coreos_install_dev are attributes that must be set to fill in variables in the associated boot config's cmdline. See Using RHCOS.
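
For example, both attributes might be set together before booting the node, assuming the set action accepts multiple name-value pairs (the URL and device are illustrative):

scyld-nodectl -i n0 set _coreos_ignition_url=http://10.54.0.1/worker.ign _coreos_install_dev=/dev/sda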

_disk_cache

Default: none

Values: local partition

Depends: none

The _disk_cache attribute identifies a persistent location where the node can store downloaded images. This location should be a local partition with sufficient size to hold a handful of compressed images. If the specified location exists, a compute node will keep a copy of the downloaded image, and during later boots will compare the checksum of that file with the expected checksum provided by the head node to avoid unnecessary downloads.

If the named partition does not exist, then an error will be logged, and the node will download the image to RAM and still boot. If the partition exists but cannot be mounted, then it will be reformatted. A LUKS encryption key can be provided by appending a colon and the key to the partition name.

If a cache is present but no _disk_root is provided and a roram compatible image is downloaded, then the node will boot directly from the cached image with a writable overlay.

Important

Any data in the partition specified as a _disk_cache may be destroyed at boot time!

Similar to /etc/fstab, partitions can be identified by device path, UUID, PARTLABEL, or PARTUUID.
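
For example, a cache partition can be identified by PARTLABEL, first unencrypted and then with a LUKS key appended after a colon (the label and key are illustrative):

scyld-nodectl -i n0 set _disk_cache=PARTLABEL=cwcache
scyld-nodectl -i n0 set _disk_cache=PARTLABEL=cwcache:MySecretKey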

_disk_root

Default: none

Values: local partition

Depends: ignored unless _boot_style == disked

During node booting, the root image will be unpacked into the partition named in the _disk_root attribute. This process will delete the contents of the named partition before unpacking the disk image. If the named partition does not exist, then an error will be logged, although the node will still boot using the image unpacked into RAM instead of on disk.

Append a comma and the word "encrypt" to encrypt the partition with a random key every boot to fulfill "encryption at rest" requirements. Encryption is performed using standard LUKS tools with 1MB of data from /dev/urandom stored in a key file used as the passphrase. This key file is only briefly stored in RAM and deleted shortly before an Ext4 file system is created on the newly encrypted partition. Note that the cryptsetup tool must be installed in the image that is used to create the boot configuration.

Important

All data in the partition specified as a _disk_root will be destroyed at boot time!

Similar to /etc/fstab, partitions can be identified by device path, UUID, PARTLABEL, or PARTUUID.
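
For example, to unpack the image into a local partition with per-boot encryption enabled (the device name is illustrative):

scyld-nodectl -i n0 set _boot_style=disked _disk_root=/dev/nvme0n1p2,encrypt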

_disk_wipe

Default: none

Values: comma-separated list of local partitions

Depends: none

The listed partitions will be reformatted with an Ext4 file system at every boot. Similar to _disk_root, append :encrypt to the partition to enable "encryption at rest". The encryption key can also be specified by including it after the "encrypt" keyword as in :encrypt=Penguin.

_gateways

Default: The default gateway for the node's interfaces

Values: <ifname>=IPaddress

Depends: None

Override the interface ifname's current gateway value with an alternative IP address. For example, _gateways=enp1s0f0=10.20.30.40,enp1s0f1=10.20.40.40.

_health

Default: none

Values: node health status

Depends: none

Cluster administrators commonly write node health checks and can use the _health attribute to relay the results back to the head nodes. Whenever the health check starts running on the node, it should execute set-node-attribs _health=checking. At completion of the health check the results should be stored in the _health attribute using the same mechanism. The special value ok will be interpreted as success, while any other value is recorded as failure. Cluster administrators can check the current status using scyld-nodectl:

scyld-nodectl status --health [--refresh]

Instead of reporting the node's up / down / booting status, this command reports whether the health checks have returned ok / bad / checking. Adding -l will show the failure reason for 'bad' nodes.
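
A minimal health-check sketch to run on the node, assuming set-node-attribs is available on the node's PATH; the mountpoint test stands in for whatever site-specific checks are appropriate:

#!/bin/bash
# Tell the head node that checks are in progress
set-node-attribs _health=checking
# Site-specific check: verify the scratch file system is mounted
if mountpoint -q /scratch; then
    set-node-attribs _health=ok                # interpreted as success
else
    set-node-attribs _health=scratch-missing   # any other value is a failure
fi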

_hostname

Default: none

Values: Hostname or fully-qualified domain name

Depends: none

Booting compute nodes will assign the value of _hostname as their hostname using the hostnamectl command. If the attribute value is a simple name (without periods), then the cluster domain will be appended to construct a FQDN. Changes to this attribute take effect at the next reboot.

_hosts

Default: blank

Values: download

Depends: none

During the compute node boot process, a list of known hosts is downloaded from the head node and is appended to the compute node's /etc/hosts. By default this will only append a list of head nodes to ensure that each compute node can resolve all head nodes without DNS. If the _hosts attribute is set to 'download', then all compute node names and IP addresses will be appended to /etc/hosts.

_ips

Default: none

Values: comma-separated IP assignments

Depends: none

Compute nodes commonly define additional high-speed network interfaces beyond the PXE boot network. These interfaces are typically defined by ifcfg-XXX files located in /etc/sysconfig/network-scripts and differ between nodes only in the assigned IP address. Use the _ips attribute to specify which IP address should be assigned to an individual node on one or more interfaces. For example, a value of _ips=eno0=10.10.23.12,ib0=192.168.24.12 would cause the prenet/write_ifcfg.sh startup script to replace any IPADDR= line in /etc/sysconfig/network-scripts/ifcfg-ib0 with IPADDR=192.168.24.12 and would similarly modify the adjacent ifcfg-eno0 file, replacing any IP assignment in that file with IPADDR=10.10.23.12.
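
For illustration, after that substitution the node's ifcfg-ib0 might read as follows, where every line except IPADDR is illustrative boilerplate carried in the image:

DEVICE=ib0
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.24.12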

_ipxe_sanboot

Default: none

Values: local disk or partition

Depends: _boot_style == sanboot

Use this attribute to cause a node to boot using the iPXE sanboot command. This is most commonly used to boot a locally installed disk, although administrators are cautioned to be extremely careful with stateful compute nodes as they will retain modifications from previous boots, leading to an unexpectedly heterogeneous cluster.

Nodes with this attribute set will not download an image from the head node and will instead boot based on the URL or other iPXE sanboot arguments provided. Please see the iPXE documentation for the details of what iPXE provides: http://ipxe.org/cmd/sanboot

In addition to the arguments and URLs supported by iPXE, ClusterWare also accepts a shorter URL for booting local disks of the form local://0xHH where 'HH' is a hexadecimal value specifying a local hard disk. The first disk is identified as 0x80, the second is 0x81, and so on. The provided hexadecimal value is then used in a sanboot --no-describe --drive 0xHH call.
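
For example, to boot a node from its first local disk, equivalent to sanboot --no-describe --drive 0x80 (assuming the set action accepts multiple name-value pairs):

scyld-nodectl -i n0 set _boot_style=sanboot _ipxe_sanboot=local://0x80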

_macs

Default: The default MAC address for each of the node's interfaces

Values: <ifname>=<MACaddress>

Depends: None

Override the interface ifname's current MAC address with an alternative value. For example, _macs=bond0=aa:bb:cc:dd:ee:ff. Generally only used for bonded interfaces. Ignored for the booting interface bootnet.

_no_boot

Default: false

Values: boolean equivalents (0 / 1, true / false, t / f, yes / no, y / n)

Depends: none

The _no_boot attribute controls whether information about a node is provided to the DHCP server. Any node with _no_boot set to true will not receive DHCP offers from any ClusterWare head node. This allows an administrator to temporarily remove a node from the cluster.

_preferred_head

Default: none

Values: head node UID

Depends: none

In a multihead configuration any head node can provide boot files to any compute node in the system. In most cases this is a desirable feature because the failure of any given head node will not cause any specific set of compute nodes to fail to boot. In some cases the cluster administrator may want to specify a preference of which head node should handle a given compute node. By setting a compute node's _preferred_head attribute to a specific head node's UID, all head nodes will know to point that node toward the preferred head node. This is implemented during the boot process when the iPXE script is generated and passed to the compute node. This means that any head node can still supply DHCP, the iPXE binaries, and the iPXE boot script, but the subsequent kernel, initramfs, and root file system files will be provided by the preferred head node, and thereafter the node's boot status information will be sent to that _preferred_head.

_remote_pass

Default: none

Values: node account password for _remote_user attribute

Depends: none

This attribute supports an alternative to the customary ClusterWare ssh-key functionality. It is useful for executing scyld-nodectl exec against non-ClusterWare compute nodes that do not have clusterware-node installed but do accept user/password authentication.

To use it, install the sshpass RPM on the head node, then set the _remote_pass attribute to the password of the account named by the _remote_user attribute (default root). Subsequent executions of scyld-nodectl exec to nodes that are set up with this attribute will employ this user/password pair to authenticate access to those target nodes.

Note

Use of sshpass is discouraged and is not a best practice. A clear text password is a significant security risk.

_remote_user

Default: root

Values: node account name

Depends: none

The _remote_user attribute controls which account is used on the compute node when executing the scyld-nodectl reboot/shutdown commands. Please ensure the specified account can execute sudo shutdown without a password, or soft power control will not work. Similarly, the scyld-nodectl exec and scyld-nodectl ssh commands use the specified remote user account, and the boot-time script that downloads head node keys stores those keys in the _remote_user's authorized_keys file.
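
For example, a hypothetical sudoers fragment in the node image (e.g., a file such as /etc/sudoers.d/clusterware; the account name nodeadmin is illustrative) that permits passwordless shutdown:

nodeadmin ALL=(root) NOPASSWD: /usr/sbin/shutdown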

_status_hardware_secs

Default: 300

Values: seconds between checking for status hardware changes

Depends: none

A node sends its hardware state (viewed with scyld-nodectl list --long and list --long-long) as a component of its larger basic status information. See _status_secs below. This hardware component is typically only sent once at boot time. However, the node reevaluates its hardware state every _status_hardware_secs seconds, and in the rare event that something has changed since it last communicated its hardware state to its parent head node, the node includes the updated hardware information in its next periodic basic status message.

Changes to this value are communicated to an up node without needing to reboot the node.

_status_packages_hash_secs

Default: 120

Values: seconds between checking for installed packages changes

Depends: none

A node sends its packages_hash state (viewed with scyld-nodectl status --long-long) as a component of its larger basic status information. See _status_secs below. The packages_hash component is a hash of the sorted list of versioned package names installed on the node, distilled into a single numeric value. Since this computation is relatively expensive, and since changes to the node's installed packages are relatively rare, the node recalculates this hash value every _status_packages_hash_secs seconds, typically less frequently than the _status_secs interval.

Changes to this value are communicated to an up node without needing to reboot the node.

_status_secs

Default: 10

Values: seconds between status updates

Depends: none

Booted compute nodes periodically send basic status information to their parent head node. This value controls how often those messages are sent. Although the messages are relatively small, clusters with more compute nodes per head node will want to set this to a longer period to reduce the aggregate load on the head node.
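
For example, to lengthen a node's reporting interval to one minute:

scyld-nodectl -i n0 set _status_secs=60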

Changes to this value are communicated to an up node without needing to reboot the node.