metal-stack v0.8.0 🤩

· 5 min read
Gerrit Schwerthelm
Maintainer @ x-cellent technologies GmbH

metal-stack v0.8.0 has been released and, as always, there are many new things to show. 🤩

Check out the direct link to the release here.

Gardener 1.19 Compatibility

It has been a long time since we were last able to update our Gardener dependencies, but metal-stack finally supports running with Gardener 1.19, including the API Server SNI feature!

Audit Logging

A lot of effort was put into our solution for audit logging (kudos to @mreiger 🚀). Even though we are still trying to find more general solutions for audit logging together with Gardener, we can offer the following functionality at the moment, all of it "toggle-able" through the control plane feature gates of our shoot spec:

  • Deployment of an audit-forwarder, streaming the API server's audit events directly into the shoot cluster
  • Configuration for sending logs directly to Splunk (this needs to be enabled through the controller configuration and can also be reconfigured by a user through a secret in the shoot cluster's kube-system namespace)

Cloud Storage

In addition to the logging features, we are also moving forward with our duros-controller, which is an operator providing LightOS NVMe/TCP storage by Lightbits Labs to our shoot clusters. The operator is now capable of automatic API token renewal and managed resource health checking, and it now defines a proper cleanup and deletion flow.

We will present this storage solution in one of the next Gardener Community Meetings, so please check the #gardener channel on the Kubernetes Slack workspace and save the date. 🤓

Big shout-out to our Israeli friends at Lightbits Labs for helping us bring this storage solution to production grade! ❤️

Improvements on Filesystem Layouts

In the last minor release we introduced FSLs (filesystem layouts), which allow users to deploy customizable disk layouts for machine provisioning. In this release, software RAIDs (managed through mdadm) can also be defined in an FSL. This is what an FSL definition may look like:

id: raid
description: "raid layout example"
constraints:
  sizes:
    - c1-xlarge-x86
  images:
    debian: "*"
disks:
  - device: "/dev/sda"
    wipeonreinstall: true
    partitions:
      - number: 1
        label: "root"
        size: 0
        gpttype: "8300"
  - device: "/dev/sdb"
    wipeonreinstall: true
    partitions:
      - number: 1
        label: "root"
        size: 0
        gpttype: "8300"
raid:
  - arrayname: "/dev/md0"
    level: 1
    devices:
      - "/dev/sda1"
      - "/dev/sdb1"
    createoptions: ["--metadata", "1.0"]
filesystems:
  - path: "/"
    device: "/dev/md0"
    format: "ext4"
    label: "root"
  - path: "/tmp"
    device: "tmpfs"
    format: "tmpfs"
    mountoptions:
      [
        "defaults",
        "noatime",
        "nosuid",
        "nodev",
        "noexec",
        "mode=1777",
        "size=512M",
      ]

The example illustrates how to put a system's root filesystem on the RAID device /dev/md0, a software RAID 1 built from one partition on each of the two disks /dev/sda and /dev/sdb.
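To make the raid section more tangible, here is a minimal Python sketch of how one such entry could be translated into an mdadm invocation. It is an illustration only: the function name mdadm_create_args is hypothetical, and metal-stack's actual provisioning code may assemble the array differently. The flags --create, --level, --raid-devices and --metadata are standard mdadm options.

```python
from typing import Dict, List


def mdadm_create_args(raid: Dict) -> List[str]:
    """Build an mdadm argument list for one entry of an FSL `raid` section.

    Hypothetical helper for illustration, not metal-stack's implementation.
    """
    devices = raid["devices"]
    return [
        "mdadm",
        "--create", raid["arrayname"],
        "--level", str(raid["level"]),
        "--raid-devices", str(len(devices)),
        # createoptions are passed through unchanged
        *raid.get("createoptions", []),
        *devices,
    ]


# The raid entry from the layout above, as a Python dict
raid_entry = {
    "arrayname": "/dev/md0",
    "level": 1,
    "devices": ["/dev/sda1", "/dev/sdb1"],
    "createoptions": ["--metadata", "1.0"],
}

print(" ".join(mdadm_create_args(raid_entry)))
# prints: mdadm --create /dev/md0 --level 1 --raid-devices 2 --metadata 1.0 /dev/sda1 /dev/sdb1
```

Returning a list rather than a single string keeps the arguments safe to hand to subprocess.run without shell quoting.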

Along with the new RAID capabilities, it is now also possible to extend a logical volume to the remaining free space of its volume group. This works the same way as for disk partitions: set the size value to 0. Here is an example:

id: ci-runner
description: ci runner layout
constraints:
  images: {}
  sizes:
    - c1-xlarge-x86
disks:
  - device: /dev/nvme0n1
    partitions: []
    wipeonreinstall: true
  - device: /dev/nvme1n1
    partitions: []
    wipeonreinstall: true
logicalvolumes:
  - lvmtype: striped
    name: root
    size: 0
    volumegroup: vgroot
volumegroups:
  - devices:
      - /dev/nvme0n1
      - /dev/nvme1n1
    name: vgroot
    tags: []
filesystems:
  - createoptions: []
    device: /dev/vgroot/root
    format: ext4
    label: root
    mountoptions: []
    path: /
  - createoptions: []
    device: tmpfs
    format: tmpfs
    label: ""
    mountoptions:
      [
        "defaults",
        "noatime",
        "nosuid",
        "nodev",
        "noexec",
        "mode=1777",
        "size=512M",
      ]
    path: /tmp
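The size: 0 rule can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not metal-stack's implementation: the function resolve_sizes is hypothetical, sizes are plain integers in an arbitrary unit, and the restriction to at most one size-0 volume per volume group is an assumption made for the sketch.

```python
from typing import Dict


def resolve_sizes(capacity: int, requested: Dict[str, int]) -> Dict[str, int]:
    """Resolve requested volume sizes; 0 means "take the remaining free space".

    Hypothetical helper: `capacity` is the size of the volume group,
    `requested` maps volume names to their requested sizes.
    """
    greedy = [name for name, size in requested.items() if size == 0]
    if len(greedy) > 1:
        # Assumption for this sketch: only one volume can claim the rest.
        raise ValueError("at most one volume may use size 0")

    fixed_total = sum(requested.values())  # size-0 entries contribute nothing
    if fixed_total > capacity:
        raise ValueError("fixed sizes exceed the volume group capacity")

    remaining = capacity - fixed_total
    return {
        name: size if size > 0 else remaining
        for name, size in requested.items()
    }


print(resolve_sizes(1000, {"root": 0}))
# prints: {'root': 1000}
print(resolve_sizes(1000, {"var": 200, "root": 0}))
# prints: {'var': 200, 'root': 800}
```

In the ci-runner layout above, root is the only logical volume in vgroot, so it would receive the whole volume group.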

Updates on metalctl

The metalctl machine ipmi command is now able to show the current power state of a machine, making life easier for metal-stack operators who manage the machine fleet. By default, the power status is picked up and reported by the bmc-catcher at a five-minute update interval.

In addition to that, the machine issues command output was completely refactored. It's more operator-friendly now, allowing issue filtering and showing the lock description (which is the usual way for operators to move a machine out of the available machine pool):

$ m machine issues --omit bmc-no-distinct-ipbmc-without-mac
ID                                    POWER  LOCK  LOCK REASON                        STATUS  LAST EVENT      WHEN     ISSUES
263d5a00-f10b-11e9-8000-3cecef408994  ●      🔒    interfaces: Error set link eth...  💀      PXE Booting     24d 13h  - the machine is not sending events anymore (liveliness-dead)
                                                                                                                       - machine has an incomplete lifecycle (↻) (incomplete-cycles)
942a5c00-a77f-11e9-8000-ac1f6bd38c5a  ●      🔒    CATERR CPU FEHLER                  💀      Planned Reboot  92d 22h  - the machine is not sending events anymore (liveliness-dead)
00000000-0000-0000-0000-002590b8f968  ●                                                       Phoned Home     3s       - machine phones home but not allocated (failed-machine-reclaim)
2ca51200-bdfa-11e9-8000-3cecef23002c  ●                                                       Phoned Home     47s      - machine phones home but not allocated (failed-machine-reclaim)

All issues listed by metalctl machine issues are now documented in our docs. This should make the reported issues easier to understand and offers ways to resolve them adequately.
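The idea behind issue filtering can be sketched as follows. This is a hypothetical Python model, not metalctl's code: the function filter_issues and the dictionary layout are assumptions, and only the issue identifiers are taken from the output above.

```python
from typing import Dict, List, Set


def filter_issues(machines: Dict[str, List[str]], omit: Set[str]) -> Dict[str, List[str]]:
    """Drop omitted issue types; machines without remaining issues disappear.

    Hypothetical helper: `machines` maps a machine ID to its issue identifiers.
    """
    result = {}
    for machine_id, issues in machines.items():
        kept = [issue for issue in issues if issue not in omit]
        if kept:
            result[machine_id] = kept
    return result


# Issue identifiers as shown in the command output above
machines = {
    "263d5a00-f10b-11e9-8000-3cecef408994": ["liveliness-dead", "incomplete-cycles"],
    "942a5c00-a77f-11e9-8000-ac1f6bd38c5a": ["liveliness-dead"],
    "00000000-0000-0000-0000-002590b8f968": ["failed-machine-reclaim"],
}

for machine_id, issues in filter_issues(machines, omit={"liveliness-dead"}).items():
    print(machine_id, issues)
# prints:
# 263d5a00-f10b-11e9-8000-3cecef408994 ['incomplete-cycles']
# 00000000-0000-0000-0000-002590b8f968 ['failed-machine-reclaim']
```

Omitting an issue type this way hides noise an operator already knows about, so the remaining rows only show machines that still need attention.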

Also, there is a new command, machine power cycle, which enables users to hard reset a machine through the API, using our go-hal abstraction layer.

More Information

This is only a small extract of what went into our v0.8.0 release.

Please check out the release notes for a full overview of every change that went into this release.

As always, feel free to visit our Slack channel and ask if there are any questions. 😄