CHAOS ENGINEERING

with

SERVICE MESH

Questions to the audience

Who knows what a service mesh is?
Who knows what SLI, SLO, and SLA mean?
Who knows what Chaos Engineering is?
Who has already done Chaos Engineering?

Outline

Kubernetes networking model
Service mesh: architecture and features
Demo of Istio
Chaos Engineering: concepts & origin
Demo of fault-injection
Q&A

Kubernetes networking model

1. all containers → all other containers without NAT

2. all nodes → all containers
all nodes ← all containers
without NAT

3. the IP that a container sees itself as
is the SAME
IP that others see it as

VIDEO: Kubernetes Deconstructed

What is a service mesh

What problems does it solve

Communication between services

A network for services, not bytes

How does it solve inter-service communication

Traffic management
Resiliency
Security
Observability


What's in the code


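# Assumed leaf services (not shown on the slide), following the same pattern:
# they call no other service.
details = {"name" : "http://details:9080", "endpoint" : "details", "children" : []}
ratings = {"name" : "http://ratings:9080", "endpoint" : "ratings", "children" : []}

# reviews calls ratings.
reviews = {
  "name" : "http://reviews:9080",
  "endpoint" : "reviews",
  "children" : [ratings]
}

# productpage calls details and reviews.
productpage = {
  "name" : "http://productpage:9080",
  "endpoint" : "details",
  "children" : [details, reviews]
}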


Traffic Management


apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
  ...
spec:
  hosts:
  - reviews
  http:
  - match:
    - headers:
        cookie:
          regex: ^(.*?;)?(user=jason)(;.*)?$
    route:
    - destination:
        host: reviews
        subset: v2
  - route:
    - destination:
        host: reviews
        subset: v1
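
To see which requests the cookie rule above sends to v2, here is a minimal Python sketch. It assumes Python's `re` module behaves like the proxy's regex engine for this pattern, which is an assumption; `pick_subset` is a hypothetical helper, not part of Istio.

```python
import re

# Pattern copied from the VirtualService above.
USER_JASON = re.compile(r"^(.*?;)?(user=jason)(;.*)?$")


def pick_subset(cookie_header):
    """Return the reviews subset the two routing rules would select."""
    if cookie_header is not None and USER_JASON.match(cookie_header):
        return "v2"  # first rule: the cookie header matches the regex
    return "v1"      # second rule: default route for everything else


for cookie in [
    "user=jason",
    "theme=dark;user=jason",
    "user=jason;theme=dark",
    "user=jasonx",
    "theme=dark; user=jason",
    None,
]:
    print(repr(cookie), "->", pick_subset(cookie))
```

In this sketch only the first three headers select v2. The header `theme=dark; user=jason` falls through to v1 because the pattern does not allow a space after the semicolon, so it is worth testing with the cookie format your clients actually send.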

Resiliency


apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v2
    retries:
      attempts: 3
      perTryTimeout: 2s
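
What the retry policy above asks the sidecar to do can be sketched in plain Python. This is an illustration, not how the proxy is implemented: it assumes `attempts` counts retries after the initial try, it ignores any back-off between tries, and `call_with_retries` and `flaky_reviews` are hypothetical names.

```python
import concurrent.futures


def call_with_retries(call, attempts=3, per_try_timeout=2.0):
    """Try `call` once, then retry up to `attempts` times.

    Each try is abandoned after `per_try_timeout` seconds.
    """
    last_error = None
    for _ in range(1 + attempts):
        pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        try:
            return pool.submit(call).result(timeout=per_try_timeout)
        except concurrent.futures.TimeoutError:
            last_error = TimeoutError("try exceeded per_try_timeout")
        except Exception as exc:  # any failure of the call counts as a failed try
            last_error = exc
        finally:
            pool.shutdown(wait=False)  # do not block on a try that is still running
    raise last_error


calls = {"count": 0}


def flaky_reviews():
    """Hypothetical upstream: fails twice, then answers."""
    calls["count"] += 1
    if calls["count"] <= 2:
        raise ConnectionError("reviews unavailable")
    return "reviews v2 response"


result = call_with_retries(flaky_reviews, attempts=3, per_try_timeout=2.0)
print(result, "after", calls["count"], "tries")
```

Under these assumptions the caller waits at most (1 + attempts) x perTryTimeout, here 8 seconds, before seeing an error. That upper bound matters when several services in a chain each apply their own retries.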

Security

Namespace-level and service-level policies
Mutual TLS authentication
Role-based access control (RBAC)

Observability

Metrics
Logs
Tracing

DEMO

CHAOS ENGINEERING

Having a child: Chaos Engineering for everything in your life.
— Arnaud Porterie (@icecrime) February 12, 2018

Chaos Engineering is the discipline of experimenting on a distributed system in order to build confidence in the system’s capability to withstand turbulent conditions in production.
— principlesofchaos.org

Thoughtful, planned experiments designed to reveal the weaknesses in our systems.
— Kolton Andrus (cofounder and CEO of Gremlin Inc.)

Usually untested

Graceful shutdown
Health check
Cascading timeouts
Deployments (smoke test)

Types of errors

Unreachable
Delays
Timeout cascading
Circuit breaker
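
The first two failure modes in the list above (unreachable, delays) can be simulated in-process before touching the mesh. A minimal sketch, assuming a hypothetical `inject_faults` wrapper and a made-up `ratings` function; it does not cover timeout cascades or circuit breaking.

```python
import random
import time


class Unreachable(Exception):
    """Raised in place of a real network failure."""


def inject_faults(call, abort_rate=0.0, delay_rate=0.0, delay_seconds=0.0, rng=None):
    """Wrap `call` so that a fraction of invocations is aborted or delayed."""
    rng = rng or random.Random()

    def wrapped(*args, **kwargs):
        if rng.random() < abort_rate:
            raise Unreachable("injected abort")  # simulates an unreachable service
        if rng.random() < delay_rate:
            time.sleep(delay_seconds)            # simulates a slow service
        return call(*args, **kwargs)

    return wrapped


def ratings():
    """Hypothetical downstream service."""
    return {"stars": 5}


always_down = inject_faults(ratings, abort_rate=1.0)
always_slow = inject_faults(ratings, delay_rate=1.0, delay_seconds=0.05)

try:
    always_down()
except Unreachable as exc:
    print("caller saw:", exc)

print("slow call returned:", always_slow())
```

Passing a seeded `random.Random` as `rng` makes a run reproducible, which helps when comparing behavior before and after a fix.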

How to start Chaos Engineering

Set up monitoring!
Identify a measurable output that indicates behavior, define "steady state"
Form a hypothesis
Simulate real-world events
Try to disprove your hypothesis
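
The steps above can be sketched as a tiny experiment loop. Everything here is illustrative: the 99% steady-state threshold, the 20% failure rate, and the helper names are assumptions chosen for the example.

```python
import random


def success_rate(call, requests=1000):
    """Measurable output: the fraction of requests that succeed."""
    ok = 0
    for _ in range(requests):
        try:
            call()
            ok += 1
        except ConnectionError:
            pass
    return ok / requests


def make_service(failure_rate, rng):
    """Simulated real-world event: a service failing part of the time."""
    def call():
        if rng.random() < failure_rate:
            raise ConnectionError("injected failure")
        return "ok"
    return call


def with_retries(call, retries):
    """Mitigation under test: retry a failed call up to `retries` times."""
    def wrapped():
        for attempt in range(retries + 1):
            try:
                return call()
            except ConnectionError:
                if attempt == retries:
                    raise
    return wrapped


rng = random.Random(42)
STEADY_STATE = 0.99  # hypothetical threshold for "the system is healthy"

baseline = success_rate(make_service(0.0, rng))
# Hypothesis: with 3 retries, the success rate stays at or above the
# steady state even when 20% of calls fail.
degraded = make_service(0.2, rng)
unprotected = success_rate(degraded)
protected = success_rate(with_retries(degraded, 3))

print("baseline:", baseline)
print("without retries:", unprotected, "steady state held:", unprotected >= STEADY_STATE)
print("with retries:", protected, "steady state held:", protected >= STEADY_STATE)
```

If the protected run had dropped below the threshold, the hypothesis would be disproved, and that gap is the weakness the experiment was meant to reveal.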

Site Reliability Engineering

Identify weaknesses
Improve resiliency
SLI, SLO, SLA
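
As a rough illustration of how the three terms relate: an SLI is a measured ratio, an SLO is the target set for it, and an SLA is the contract that attaches consequences to missing a target. The sketch below computes an SLI and the error budget an SLO leaves; the function names and the numbers are made up for the example.

```python
def sli(good_events, total_events):
    """Service Level Indicator: the fraction of events that were good."""
    if total_events == 0:
        raise ValueError("no events observed")
    return good_events / total_events


def error_budget_minutes(slo, window_days=30):
    """Downtime allowed by an availability SLO over the window, in minutes."""
    return (1 - slo) * window_days * 24 * 60


def budget_remaining(slo, good_events, total_events):
    """Fraction of the error budget still unspent (negative if overspent)."""
    allowed_bad = (1 - slo) * total_events
    actual_bad = total_events - good_events
    return 1 - actual_bad / allowed_bad


SLO = 0.999  # hypothetical target: 99.9% of requests succeed

print("SLI:", sli(999_500, 1_000_000))
print("budget per 30 days (min):", round(error_budget_minutes(SLO), 1))
print("budget remaining:", round(budget_remaining(SLO, 999_500, 1_000_000), 2))
```

The remaining budget is a convenient limit for chaos experiments: when it approaches zero, pause experiments and spend the time on resiliency work.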

DEMO

Resources

THANK YOU

Julien Bisconti

Github: @veggiemonk
Twitter: @veggiemonk
LinkedIn: julienbisconti

reveal.js by Hakim El Hattab / hakim.se