[Sloth](https://github.com/slok/sloth) is an easy and simple [[Prometheus]] [[Service level objective]] generator. Sloth generates understandable, uniform and reliable Prometheus SLOs for any kind of service, from a simple SLO spec that results in multiple metrics and [multi window multi burn](https://landing.google.com/sre/workbook/chapters/alerting-on-slos/#6-multiwindow-multi-burn-rate-alerts) alerts (see [[Error budget]] [[Burn rate|burn rates]]).

- A single (uniform) way of creating SLOs across all different services and teams.
- Automatic Grafana dashboard to see the state of all your SLOs.

[20 lines of Sloth YAML](https://sloth.dev/introduction/) give us about 200 lines of [[Prometheus]] config. Sloth standardizes the SLO implementation across companies and teams by providing a single way of doing SLOs.

The vanilla Prometheus integration works with raw Prometheus rules: Sloth generates the recording and alerting rules in standard Prometheus YAML format.

Nice mermaid from https://sloth.dev/architecture/:

```mermaid
stateDiagram-v2
    direction LR
    input: 1 Sloth spec
    sloth: Sloth
    output: N prometheus rules
    input --> sloth
    sloth --> output
    state sloth {
        direction LR
        load: Load SLO Spec
        metadata: Gen Metadata rules
        slis: Gen SLI rules
        alerts: Gen Alert rules
        save: Out format
        load --> metadata
        load --> slis
        load --> alerts
        metadata --> save
        slis --> save
        alerts --> save
    }
```

## Alerts

Sloth SLO alerts use the multiwindow-multiburn method. It generates two types of alerts:

- Critical/page: Pay attention right now.
- Warning/ticket: Take into account, but not urgent.

using four different alerts across different time window periods:

- Critical/page in a short time window: very high rate of burning budget.
- Critical/page in a medium time window: high rate of burning budget.
- Warning/ticket in a medium time window: constant normal rate of burning budget.
- Warning/ticket in a long time window: constant low rate of burning budget.
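For reference, the spec that drives all of the above is tiny. A sketch of a `prometheus/v1` Sloth spec, roughly following the getting-started example on sloth.dev (the service name, metric names and label values are illustrative):

```yaml
version: "prometheus/v1"
service: "myservice"
labels:
  owner: "myteam"
slos:
  # Allow failing (5xx and 429) 1 request in every 1000 (99.9%).
  - name: "requests-availability"
    objective: 99.9
    description: "Common SLO based on availability for HTTP request responses."
    sli:
      events:
        # {{.window}} is templated by Sloth for each time window it generates.
        error_query: sum(rate(http_request_duration_seconds_count{job="myservice",code=~"(5..|429)"}[{{.window}}]))
        total_query: sum(rate(http_request_duration_seconds_count{job="myservice"}[{{.window}}]))
    alerting:
      name: MyServiceHighErrorRate
      labels:
        category: "availability"
      annotations:
        summary: "High error rate on 'myservice' requests responses"
      page_alert:
        labels:
          severity: pageteam      # critical/page alerts get these labels
      ticket_alert:
        labels:
          severity: slack         # warning/ticket alerts get these labels
```

The `page_alert` / `ticket_alert` labels are what Alertmanager can later route on to decide between paging and ticketing.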
Sloth doesn't trigger/generate the alerts itself; Prometheus does, using the Sloth-generated alerting rules. Normally something connected to Prometheus (like Alertmanager) will turn these alerts into notifications (e.g. Slack, PagerDuty…).

## Unifi

The metrics are extracted using [unifi-poller](https://github.com/unifi-poller/unifi-poller), which gets the information from a Ubiquiti WiFi installation.

https://sloth.dev/examples/default/home-wifi/
https://community.ui.com/questions/satisfaction-percentage-in-client-properties-overview/8c940637-63d0-41de-a67b-8166cdd0ed32
https://www.eduitguy.com/2020/04/16/unifi-poller-amazing/
https://nerdygeek.uk/2020/06/18/unifi-poller-an-easy-step-by-step-guide/

## Tibber

https://github.com/turbosnute/tibberinfo-influxdb

## SLO based alerting

From https://sloth.dev/faq/

With SLO based alerting you get better alerting than with a regular alerting setup, because it:

- Alerts on symptoms (SLIs), not causes.
- Triggers at different levels (warning/ticket and critical/page).
- Takes both time and quantity into account, i.e. the speed of errors and the number of errors over a specific time window.

The result:

- Alerts trigger at the right speed (important == fast, not so important == slow).
- Reduced alert fatigue.
- Fewer false positives and negatives.
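To make the "time and quantity" point concrete: for a 99.9% objective the error budget is 0.1%, and a page alert fires only when both windows of a pair burn budget faster than a threshold factor (the SRE workbook uses 14.4x for the short pair and 6x for the medium pair). A sketch of what a Sloth-generated page alert can look like; the `slo:sli_error:ratio_rate*` names follow Sloth's recording-rule convention, but treat the exact expression as illustrative:

```yaml
groups:
  - name: sloth-slo-alerts-myservice
    rules:
      - alert: MyServiceHighErrorRate
        expr: |
          (
            # Short pair: 5m and 1h both burning at > 14.4x the budget rate.
            (slo:sli_error:ratio_rate5m{sloth_service="myservice"} > (14.4 * 0.001))
            and
            (slo:sli_error:ratio_rate1h{sloth_service="myservice"} > (14.4 * 0.001))
          )
          or
          (
            # Medium pair: 30m and 6h both burning at > 6x the budget rate.
            (slo:sli_error:ratio_rate30m{sloth_service="myservice"} > (6 * 0.001))
            and
            (slo:sli_error:ratio_rate6h{sloth_service="myservice"} > (6 * 0.001))
          )
        labels:
          severity: pageteam   # comes from page_alert.labels in the spec
```

Requiring both windows of a pair keeps the alert fast (short window reacts quickly) while the longer window suppresses brief spikes, which is where the reduction in false positives comes from.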