metal-stack v0.14.0 πŸ¦”

Gerrit Schwerthelm Tue Jul 04 2023

In metal-stack v0.14.0 we have added support for SONiC, the operating system for the switches in a metal-stack managed data center.

Check out the direct link to the release here.

Introducing SONiC

Until today we have been driving the switch infrastructure of metal-stack with an operating system called Cumulus Linux. And we were very satisfied with it. However, as Cumulus Networks was acquired by Nvidia, Cumulus Linux is being discontinued for Broadcom-based switches. Hence, starting from Cumulus Linux 5, only support for Spectrum-based switches will continue, which unfortunately requires us to migrate away from the OS.

As open source is in the nature of metal-stack, the decision for the replacement fell on SONiC. SONiC was originally created by Microsoft, open sourced in 2016, and is today governed by the Linux Foundation. The OS is based on Debian and supports a wide array of devices and platforms, such that we can open up for a wider compatibility with metal-stack.

Similar to Cumulus Linux, SONiC also utilizes FRR as a routing daemon, such that it was possible to re-use our existing configurations (as applied by metal-core, a small metal-stack component running on the leaf switches).

The entire task was described in enhancement proposal MEP-10.

A big shoutout to everyone who contributed to this huge shift in technology! Key players were @mwindower, @GrigoriyMikhalkin, @robertvolkmann,@majst01, @mwennrich and @mreiger. You rock! πŸ™‚

Operations

For deploying SONiC-based switches, the metal-roles repository now contains a sonic role. As the role comes with a lot of configuration options, it is suitable for deploying leafs, spines and exit switches. Components like the metal-core will automatically be configured for SONiC when getting deployed as the Ansible playbooks will automatically detect the underlying switch OS.

After the deployment, in metalctl you will be able to see the switch OS now (with a 🐒 indicating Cumulus Linux and a πŸ¦” indicating SONiC):

❯ m switch ls
ID                     PARTITION    RACK                OS   STATUS
fel-wps101-leaf01      fel-wps101   fel-wps101-rack01   🐒    ●
fel-wps101-leaf02      fel-wps101   fel-wps101-rack01   🐒    ●
n2-tm1601-r01leaf01    n2-tm1601    n2-tm16-rack01      πŸ¦”    ●
n2-tm1601-r01leaf02    n2-tm1601    n2-tm16-rack01      πŸ¦”    ●
❯ m switch ls -o wide
ID                     PARTITION    RACK                OS                                                  IP              MODE          LAST SYNC   SYNC DURATION   LAST SYNC ERROR
fel-wps101-leaf01      fel-wps101   fel-wps101-rack01   Cumulus/3.7.15                                      10.5.253.130    operational   6s          1.149s          12d 13h ago
fel-wps101-leaf02      fel-wps101   fel-wps101-rack01   Cumulus/3.7.15                                      10.5.253.134    operational   2s          1.053s          25d 17h ago
n2-tm1601-r01leaf01    n2-tm1601    n2-tm16-rack01      SONiC/Edgecore-SONiC_20230505_014148_ec202111_386   10.11.253.130   operational   1s          316ms           10m 31s ago
n2-tm1601-r01leaf02    n2-tm1601    n2-tm16-rack01      SONiC/Edgecore-SONiC_20230505_014148_ec202111_386   10.11.253.134   operational   6s          387ms           30m 6s ago
...

Summary of Further Additions

In this release we also announce:

  • The support of Debian 12 as an OS image for Kubernetes worker nodes. Checkout metal-images for further release information.
  • A row of metalctl improvements were introduced:
    • Searching the audit traces with audit ls.
    • Creation, update and deletion through file with -f.
    • Choices between bulk print and individual print during bulk operations like apply -f including interactive security prompts.
  • Support for Gardener v1.53, including SSH key rotation on workers and the firewall.
  • Performance improvements for the metal-ccm, drastically decreasing traffic on the metal-api

More Information

This is only a small extract of what went into our v0.14.0 release.

Please check out the release notes to find a full overview over every change that went part of this release.

As always, feel free to visit our Slack channel and ask if there are any questions. πŸ˜„