Monitoring the metal-stack
Overview
Logging
Logs are collected by Promtail and pushed to a Loki instance running in the control plane. Loki is deployed in monolithic mode with the 'filesystem' storage type. You can find all logging-related configuration parameters for the control plane in the control plane's logging role.
In the partitions, Promtail is deployed inside a systemd-managed Docker container. Configuration parameters can be found in the partition's promtail role. The hosts that Promtail collects from can be configured via the prometheus_promtail_targets variable.
Monitoring
For monitoring, we deploy the kube-prometheus-stack and a Thanos instance in the control plane. Metrics for the control plane are supplied by:
- metal-metrics-exporter
- rethinkdb-exporter
- event-exporter
- gardener-metrics-exporter
To query and visualize logs, metrics and alerts, we deploy several Grafana dashboards to the control plane:
- grafana-dashboard-alertmanager
- grafana-dashboard-machine-capacity
- grafana-dashboard-metal-api
- grafana-dashboard-rethinkdb
- grafana-dashboard-sonic-exporter
and also some Gardener-related dashboards:
- grafana-dashboard-gardener-overview
- grafana-dashboard-shoot-cluster
- grafana-dashboard-shoot-customizations
- grafana-dashboard-shoot-details
- grafana-dashboard-shoot-states
The following ServiceMonitors are also deployed:
- gardener-metrics-exporter
- ipam-db
- masterdata-api
- masterdata-db
- metal-api
- metal-db
- rethinkdb-exporter
- metal-metrics-exporter
All monitoring related configuration parameters for the control plane can be found in the control plane's monitoring role.
Partition metrics are supplied by the following exporters and scraped by Prometheus:
- node-exporter
- blackbox-exporter
- ipmi-exporter
- sonic-exporter
- metal-core
- frr-exporter
The target hosts for these exporters can be defined by the following variables:
- prometheus_node_exporter_targets
- prometheus_blackbox_exporter_targets
- prometheus_frr_exporter_targets
- prometheus_sonic_exporter_targets
- prometheus_metal_core_targets
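As a rough illustration, the target variables might be set in the partition's Ansible variables along these lines. The value format (a plain list of host names) is an assumption made for this sketch, and the host names are placeholders; check the partition's prometheus role for the format it actually expects.

```yaml
# Hypothetical sketch: the list-of-hosts format is an assumption,
# and all host names below are placeholders.
prometheus_node_exporter_targets:
  - mgmt01.example.com
  - mgmt02.example.com

prometheus_sonic_exporter_targets:
  - leaf01.example.com
  - leaf02.example.com
```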
Alerting
In addition to Grafana, alerts can optionally be sent to a Slack channel. For this to work, at least a valid monitoring_slack_api_url and a monitoring_slack_notification_channel must be specified. For further configuration parameters, refer to the monitoring role. Alerting rules are defined in the rules directory of the partition's prometheus role.
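A minimal sketch of the two required Slack settings as Ansible variables. The variable names come from the text above; the URL and channel are placeholders, and the sketch assumes the URL is a Slack incoming-webhook URL.

```yaml
# Placeholder values: replace with your own webhook URL and channel.
# Assumption: the API URL is a Slack incoming-webhook URL.
monitoring_slack_api_url: "https://hooks.slack.com/services/REPLACE/WITH/YOURS"
monitoring_slack_notification_channel: "#metal-stack-alerts"
```

Because a webhook URL allows anyone holding it to post to the channel, it is usually kept out of plain-text version control.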