I'd previously posted a review of the O’Reilly book Site Reliability Engineering, edited by Betsy Beyer and Chris Jones. I consider it an important book for any developer or admin to read, as it covers Google's approach to deploying and scaling its systems. The good news is that Google has released the book online at Site Reliability Engineering.
The book is structured as a series of essays, so the best approach is to pick the chapters that look most useful to you and read those. In my case the most relevant chapters were –
The ultimate aim of a Site Reliability Engineer is for the site to be self-managing, with the SRE writing the systems that make this possible.
The most important points I learnt were –
From my point of view, the most important things to add to my applications were health checks and monitoring. This was easy, since most modern frameworks, such as Spring Boot and Dropwizard, already include health check APIs.
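To illustrate the idea, here is a minimal, framework-free Java sketch of the health-check pattern those frameworks provide: each dependency gets its own check, a registry runs them all, and the application counts as healthy only if every check passes. The `HealthCheck` interface, its `Result` class and `HealthRegistry` are hypothetical names chosen for this example. They loosely mirror the general shape of Dropwizard's health checks but are not its actual API, and the plain-text report format is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical interface: each check reports whether one dependency is usable.
interface HealthCheck {
    Result check() throws Exception;

    // Immutable outcome of a single check.
    final class Result {
        private final boolean healthy;
        private final String message;

        private Result(boolean healthy, String message) {
            this.healthy = healthy;
            this.message = message;
        }

        static Result healthy() { return new Result(true, "OK"); }

        static Result unhealthy(String message) { return new Result(false, message); }

        boolean isHealthy() { return healthy; }

        String getMessage() { return message; }
    }
}

// Hypothetical registry: runs every check and treats a thrown exception as unhealthy.
class HealthRegistry {
    // LinkedHashMap keeps checks in registration order for a stable report.
    private final Map<String, HealthCheck> checks = new LinkedHashMap<>();

    void register(String name, HealthCheck check) {
        checks.put(name, check);
    }

    Map<String, HealthCheck.Result> runAll() {
        Map<String, HealthCheck.Result> results = new LinkedHashMap<>();
        for (Map.Entry<String, HealthCheck> entry : checks.entrySet()) {
            HealthCheck.Result result;
            try {
                result = entry.getValue().check();
            } catch (Exception e) {
                // A check that throws is reported as a failure instead of crashing the caller.
                result = HealthCheck.Result.unhealthy(
                        "threw " + e.getClass().getSimpleName() + ": " + e.getMessage());
            }
            results.put(entry.getKey(), result);
        }
        return results;
    }

    // Overall status is healthy only if every registered check passes.
    boolean isHealthy() {
        return runAll().values().stream().allMatch(HealthCheck.Result::isHealthy);
    }

    // Plain-text report a load balancer or monitoring system could poll (format is an assumption).
    String report() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, HealthCheck.Result> e : runAll().entrySet()) {
            HealthCheck.Result r = e.getValue();
            sb.append(e.getKey())
              .append(": ")
              .append(r.isHealthy() ? "HEALTHY" : "UNHEALTHY (" + r.getMessage() + ")")
              .append('\n');
        }
        return sb.toString();
    }
}
```

In a real application you would register one check per external dependency (database, cache, message broker) and expose the result over HTTP so that monitoring can alert on it. Spring Boot (via Actuator) and Dropwizard already ship that wiring, which is why adding health checks to them takes so little effort.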