A Generic Model to Define and Implement a Practice

Capability Model

Assume that you have taken charge of a new division to build a practice like SRE. Your first instinct might be to ask Google. But it is 2023, so you put the question to ChatGPT, and it guides you with the following steps:


You find this response generic, so you try asking a better question.


Goodness! It is really good. The challenge is that every point is broad. I tried different capabilities like cloud, DevOps, and agile, and the steps were more or less the same. I thought I could help generalize this better. So this is the blog, with a picture (which can speak 1000 words) that ChatGPT can’t do 😊. I will explain the model with an example of implementing an SRE practice.

Starting with foundational principles

Wikipedia defines principles as “a proposition or value that is a guide for behaviour or evaluation. In law, it is a rule that has to be or usually is to be followed.” This is so important that it defines our every action. For example, SRE can have the following principle: “We approach reliability engineering as an extension of DevOps principles in the delivery of business value by optimizing the value stream for speed and stability. We apply the CALMS model.”

SRE Principles


Defining the capabilities of SRE Practice

ChatGPT can be a good start in identifying the capabilities. This is an iterative process in which experts whiteboard the capabilities at a high level. As the capabilities evolve, you can cluster them into capability domains.

SRE Capability Cluster
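To make the clustering step concrete, here is a minimal sketch of how whiteboarded capabilities could be grouped into capability domains. The capability and domain names are hypothetical placeholders for illustration, not a canonical SRE list; your own whiteboard sessions would supply the real ones.

```python
from collections import defaultdict

# Hypothetical whiteboard output: (capability, domain) pairs.
# These names are illustrative assumptions, not a canonical SRE list.
whiteboard = [
    ("Monitoring", "Observability"),
    ("Alerting", "Observability"),
    ("Incident response", "Incident management"),
    ("Postmortems", "Incident management"),
    ("Capacity planning", "Resilience"),
]


def cluster_capabilities(pairs):
    """Group capability names under their capability domain, skipping duplicates."""
    domains = defaultdict(list)
    for capability, domain in pairs:
        if capability not in domains[domain]:
            domains[domain].append(capability)
    return dict(domains)


clusters = cluster_capabilities(whiteboard)
for domain, capabilities in clusters.items():
    print(f"{domain}: {', '.join(capabilities)}")
```

Keeping the clusters as plain data makes it easy to revisit them after each iteration with the experts.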

Alert: Tools can bias you in defining the capabilities. These capabilities should be independent of the tools. Later, these capabilities will help you identify the right tool and avoid getting stuck with a tool that doesn’t support a specific capability.
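One way to use tool-independent capabilities when evaluating tools is a simple coverage check. The sketch below assumes you have already listed the required capabilities and what each candidate tool supports; the tool names (`tool_a`, `tool_b`) and capability names are hypothetical.

```python
# Capabilities the practice needs, defined without reference to any tool.
# Hypothetical names for illustration only.
required_capabilities = {"Monitoring", "Alerting", "Incident response", "Postmortems"}

# Hypothetical candidate tools and the capabilities each one supports.
candidate_tools = {
    "tool_a": {"Monitoring", "Alerting"},
    "tool_b": {"Monitoring", "Alerting", "Incident response"},
}


def coverage_gaps(required, tools):
    """Return, for each tool, the sorted list of required capabilities it lacks."""
    return {name: sorted(required - supported) for name, supported in tools.items()}


tool_gaps = coverage_gaps(required_capabilities, candidate_tools)
for name, missing in tool_gaps.items():
    print(f"{name} is missing: {missing}")
```

Because the required set is defined first, a gap shows up as a property of the tool rather than quietly reshaping the capability list.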

Mapping Competency and Roles with the capability

Competencies are the skills required to achieve a capability. LinkedIn job descriptions can be a good start. You may need to go deeper based on the domain you are working with. For instance, the competencies required for a Network SRE will differ from those for a Database SRE. Technology drives certain competencies. Building a broader scope will allow us to cross-skill resources. For example, Everything as Code can be achieved using IaC technologies like Terraform or Ansible scripts.

Alert: Most organizations struggle because the skills of a resource do not match the required capability. A good assessment model for capability and competency can drive adoption.
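An assessment model can start very simply: compare the competency levels a role requires with the levels a person has been assessed at. The sketch below is one possible shape, not a standard model. The proficiency scale, the competency names, and the levels for the Database SRE role are all assumptions made for illustration.

```python
# Hypothetical proficiency scale: 0 = none, 1 = basic, 2 = working, 3 = expert.
# Role requirements below are illustrative assumptions, not a reference profile.
role_requirements = {
    "Database SRE": {
        "Infrastructure as Code": 2,
        "Database operations": 3,
        "Incident response": 2,
    },
}


def competency_gaps(required, assessed):
    """Map each competency that falls short to the size of the shortfall.

    Competencies missing from the assessment are treated as level 0.
    """
    return {
        skill: level - assessed.get(skill, 0)
        for skill, level in required.items()
        if assessed.get(skill, 0) < level
    }


# Hypothetical assessment result for one engineer.
engineer = {"Infrastructure as Code": 2, "Database operations": 1}
skill_gaps = competency_gaps(role_requirements["Database SRE"], engineer)
print(skill_gaps)
```

The resulting gaps can feed a training plan or highlight where cross-skilling would help most.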

Capability Patterns, best patterns and reference guides

Defining patterns is a critical step, because patterns guide how each capability is practiced. Many organizations publish reference libraries for teams to apply as they mature. To make adoption faster, these artefacts are supported with tool-specific snippets, examples, and templates.

Incident Response Patterns

Source: A Framework for Incident Response, Assessment, and Learning - IT Revolution


Service Best Practices

Postmortem Culture

Postmortem Template

Response PagerDuty