Resiliency in Distributed Systems - GopherConSG 2018

Published on: Saturday, 5 May 2018

Speaker: Rajeev N Bharshetty

Running distributed systems with high uptime is hard. In a complex distributed environment with many moving parts, faults are inevitable. Systems need to be designed from the start to be resilient against the faults common in live production systems at scale, such as sudden surges in traffic, bad or failed dependencies, network outages and hosts going down. To safeguard against these failures and the business loss they can cause, we discuss some of the basic patterns to follow when designing resilient distributed systems at scale, such as Circuit Breakers, Bulkheads, Fallbacks, Redundancies, Metrics and Monitoring. This talk is for everyone who is interested in building highly reliable distributed systems in Go and who hates answering pagers at 3 am.

Produced by Engineers.SG