The SRE Doctrine

Have you read the Site Reliability Engineering book? Read it first, then read this post.

One disclaimer: I'm writing this doctrine from the perspective of working in a large enterprise that is going through its own SaaS transformation. The kind of work I do drives the perspective and the conclusions. I won't claim this applies to all SREs, but I hope it's useful to some.

Stakeholders

For each SRE, DevOps, DevSecOps, or DevXOps engineer (SRE, from here on) in an enterprise, there are four primary stakeholders:

  • Outside customers or end users of the service
  • Inside customers building the service: developers, engineers, testers, evangelists, etc. (developers, from here on)
  • Service owners (business, product, project)
  • SRE team

The SRE role exists to satisfy the requirements of these stakeholders.

End Users

End users demand the following from the service's SREs:

  • Availability
  • Privacy & Security
  • Stability

Availability

End users expect the service to be always available and usable.

SREs must make the service available at all times.
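"Available at all times" is easier to act on once it is stated as a number. Here is a minimal sketch, assuming availability is measured as the fraction of successfully served requests and assuming a hypothetical 99.9% target. The function names and figures are illustrative and do not come from any particular tool.

```python
def availability(successful: int, total: int) -> float:
    """Fraction of requests served successfully; 1.0 when there was no traffic."""
    if total == 0:
        return 1.0
    return successful / total


def allowed_downtime_minutes(target: float, period_days: int = 30) -> float:
    """Minutes of full outage that a given target tolerates over the period."""
    return (1.0 - target) * period_days * 24 * 60


# Made-up request counts for illustration only.
measured = availability(successful=999_500, total=1_000_000)
print(f"measured availability: {measured:.4%}")
print(f"downtime allowed by a 99.9% target: {allowed_downtime_minutes(0.999):.1f} min / 30 days")
```

Stating the expectation this way gives SREs and end users a shared figure to compare the service against.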

Privacy & Security

End users expect the service to be secure against all threats.

End users expect the service to respect and uphold their privacy.

SREs must make the service secure and ensure privacy.

Stability

End users expect the service to be stable at all times. This means the service is free of errors while performing its responsibilities and does not change so often that end users have to re-learn how to use it constantly.

SREs must keep the service stable.

Developers

Developers demand the following from the service's SREs:

  • Agility
  • Velocity

Agility

Developers expect the platform (provided by SREs) to give them agility to:

  • Add new capabilities
  • Modify existing capabilities
  • Run experiments
  • Fix instability

SREs must provide developers agility.

Velocity

Velocity is defined as the speed with which developers can build on the platform with agility.

Developers expect the platform (provided by SREs) not to restrict their velocity with arbitrary or artificial barriers.

Developers expect the platform (provided by SREs) to assist them with increasing their velocity.

SREs must provide developers velocity.

Owners

Owners demand the following from the service's SREs:

  • Cost
  • Compliance
  • Risk
  • Governance
  • Accountability

Cost

Owners expect the service to cost as little as possible to operate at scale.

SREs must reduce the cost to operate the service across all levels of scale.

Compliance

Owners expect the service to operate in compliance with various standards and regulations, such as (but not limited to) PCI, SOC 2, FedRAMP, and the Federal Aviation Regulations.

SREs must operate the service while remaining compliant with all relevant standards identified by Owners.

Risk

Owners expect the service to operate with minimal risk from factors such as (but not limited to) security breaches, data exfiltration, and non-compliance.

SREs must ensure they perform proper risk assessment of the entire service and develop policies and plans accordingly.

Governance

Owners expect the service to be governed properly with the right policies and procedures.

SREs must develop policies and procedures together with other stakeholders to meet the Owners' expectations.

Accountability

Owners expect each stakeholder to have appropriate accountability.

SREs must develop the right framework to assess whether all stakeholders meet the accountability standards set by the Owners.

SRE

SREs demand the following from themselves:

  • Innovation
  • Execution
  • Measurement

Innovation

SREs must constantly look for new ideas to:

  • Embrace risk
  • Reduce toil
  • Simplify
  • Avoid failures

These ideas can come from:

  • Daily work and toil
  • Resume Driven Development
  • Following examples of others

Execution

SREs must execute with discipline to deliver all aforementioned demands promptly and with integrity. Delays will cascade and must be avoided at all times.

Measurement

SREs must measure everything.

SREs must track these measurements over time.

SREs must use these measurements as feedback for all aforementioned demands.
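One way to picture measurements tracked over time and used as feedback is a trailing average compared against a target. This is a rough sketch under assumptions: the daily availability figures are made up, and the helper names, the three-day window, and the 99.9% target are all hypothetical.

```python
from statistics import mean


def rolling_mean(values, window):
    """Mean of each trailing window; windows are shorter at the start of the series."""
    return [mean(values[max(0, i - window + 1): i + 1]) for i in range(len(values))]


def breaches(values, target, window=3):
    """Indexes at which the trailing mean falls below the target."""
    return [i for i, m in enumerate(rolling_mean(values, window)) if m < target]


# Hypothetical daily availability measurements, oldest first.
daily_availability = [0.9995, 0.9993, 0.9990, 0.9970, 0.9992]
print(breaches(daily_availability, target=0.999))  # prints [3, 4]
```

The bad day at index 3 keeps the trailing mean below target on the following day as well. That is the kind of signal that can feed back into the other demands, for example by pausing risky changes until the trend recovers.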

Final Word

This doctrine is subject to change over time as my own ideas and understanding evolve. With that said, it is a good place to start. These broad strokes can be used to guide policy, strategy, and tactics.