Yesterday’s email about incident management led me to another interesting, related podcast. “Reducing On-Call Engineer Burnout with a Volunteer Management Infrastructure” from TopEndDevs.com is a conversation with Brian Scanlon from Intercom, and it goes deep into how their on-call system works.
Intercom prefers to have a single engineer on call 24/7 for a week at a time. They can do this because 1) the role is very well paid to compensate for the inconvenience, 2) they have very consistent reporting and recovery infrastructure across their whole stack, and 3) it is done on a volunteer basis.
Brian also explains that a couple of underlying design decisions made this approach possible. Using a consistent architecture and deliberately limiting their technology stack gives their support staff a clearer idea of what to expect, even if they are not experts in the whole stack.
If you want to read more about Intercom’s support approach, Brian has written about it on his blog.
What I love about this approach is that it’s completely customer-centric, but not at the cost of the relationship with employees. The support process is productised so that the people supporting the platform know what to do, can commit to doing it, and are rewarded handsomely for it.
So why not productise your on-call service before you even have a service? When you’re designing your product, you can also design how it should be supported and architect your service accordingly. Make your promise to the customer early.