SRE is what happens when you ask a software engineer to design an operations team. It is an implementation of the [[DevOps]] [[Paradigm|paradigm]].[^1] The term Site Reliability Engineering was coined by [[Benjamin Treynor]].

Before SRE, it was hard to find voices in the operations landscape. The only book I could find on the subject was [Effective Monitoring and Alerting](https://learning.oreilly.com/library/view/effective-monitoring-and/9781449333515/), published in 2012.

In 2016, [Google published their seminal work on SRE for free online](https://sre.google/sre-book/table-of-contents/). It made the rounds on Hacker News and Reddit. It was a revelation. The book, [Site Reliability Engineering](https://learning.oreilly.com/library/view/site-reliability-engineering/9781491929117/), laid the groundwork and finally gave us some philosophical underpinnings for the challenges of keeping systems running.

> Hope is not a strategy

Google on SRE:

> SRE is what you get when you treat operations as if it’s a software problem. Our mission is to protect, provide for, and progress the software and systems behind all of Google’s public services — Google Search, Ads, Gmail, Android, YouTube, and App Engine, to name just a few — with an ever-watchful eye on their availability, latency, performance, and capacity.

Later on I found [Practical Monitoring](https://learning.oreilly.com/library/view/practical-monitoring/9781491957349/), which was published in 2017. I attended the [[Monitorama]] conference in Amsterdam in 2018.

Things have certainly matured. These days there is a wealth of knowledge available.

## Links

- [Interview with a Site Reliability Engineer](https://www.youtube.com/watch?v=eMjbp-srV7g) (HashiCorp employee).
- [SRE at Google](https://sre.google/)

## Books

See [[Going on a Safari#Operations and monitoring]].

[[Seeking SRE (book)]] is very similar to [[Tribe of Hackers Blue Team (book)]]. It's a book full of interviews.

## Conferences

- [SREcon](https://www.usenix.org/srecon)
    - [[SREcon]]
- [Monitorama](https://monitorama.com)
    - [[Monitorama]]
- [SLOconf](https://www.sloconf.com/)
    - [[SLOconf]]
    - [SLOconf YouTube playlist](https://www.youtube.com/playlist?list=PLLNq9CBV7AFwyRzICyCRKdcsAPAlG5bPu)

[^1]: [LISA18 - SRE (and DevOps) at a Startup - YouTube](https://youtu.be/QejOVMgBBfI?t=247).