CHAOS ENGINEERING

with

SERVICE MESH

Questions to the audience

  1. Who knows what a service mesh is?
  2. Who knows what SLIs, SLOs, and SLAs are?
  3. Who knows what Chaos Engineering is?
  4. Who has already done Chaos Engineering?

Outline

  1. Kubernetes networking model
  2. Service mesh: architecture and features
  3. Demo of Istio
  4. Chaos Engineering: concepts & origin
  5. Demo of fault-injection
  6. Q&A

Kubernetes networking model

  1. All containers can reach all other containers without NAT
  2. All nodes can reach all containers, and all containers can reach all nodes, without NAT
  3. The IP that a container sees itself as is the SAME IP that others see it as


VIDEO: Kubernetes Deconstructed

What is a service mesh?

What problems does it solve?


Communication between services


A network for services, not bytes

How does it solve inter-service communication?


  • Traffic management
  • Resiliency
  • Security
  • Observability

What's in the code


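# Leaf services, defined first so the dictionaries below can reference them.
# Assumption: they use the same keys as the entries shown on the slide.
details = {"name": "http://details:9080", "endpoint": "details", "children": []}
ratings = {"name": "http://ratings:9080", "endpoint": "ratings", "children": []}

# Each service is addressed by a plain hostname; the mesh decides where it goes.
reviews = {
  "name" : "http://reviews:9080",
  "endpoint" : "reviews",
  "children" : [ratings]
}

productpage = {
  "name" : "http://productpage:9080",
  "endpoint" : "details",
  "children" : [details, reviews]
}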
  

Traffic Management


apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
  ...
spec:
  hosts:
  - reviews
  http:
  - match:
    - headers:
        cookie:
          regex: ^(.*?;)?(user=jason)(;.*)?$
    route:
    - destination:
        host: reviews
        subset: v2
  - route:
    - destination:
        host: reviews
        subset: v1
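
To make the match rule above concrete, here is a minimal Python sketch of the routing decision: requests whose cookie header matches the regex go to subset v2, everything else falls through to v1. The function name `pick_subset` is hypothetical, and Python's `re` module only approximates the proxy's regex engine, so treat this as an illustration rather than how Istio evaluates the rule.

```python
import re

# Same pattern as in the VirtualService match rule.
USER_JASON = re.compile(r"^(.*?;)?(user=jason)(;.*)?$")


def pick_subset(headers: dict) -> str:
    """Return the subset a request would be routed to (hypothetical helper)."""
    cookie = headers.get("cookie", "")
    # First rule: the whole cookie header must match the regex.
    if USER_JASON.fullmatch(cookie):
        return "v2"
    # Default rule: no match condition, so everything else lands here.
    return "v1"
```

Rules are evaluated top to bottom, so the catch-all route must come last.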
  

Resiliency


apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v2
    retries:
      attempts: 3
      perTryTimeout: 2s
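
What the retry policy above saves you from writing in every service can be sketched in Python as follows. The helper `call_with_retries` is hypothetical, and the sketch assumes `attempts` counts retries after the initial try and that the wrapped call enforces its own timeout.

```python
def call_with_retries(call, attempts=3, per_try_timeout=2.0):
    """Call `call(timeout=...)`, retrying up to `attempts` times on failure.

    Assumption: `call` raises an exception on error or when the timeout expires.
    """
    last_error = None
    # One initial try plus `attempts` retries.
    for _ in range(1 + attempts):
        try:
            return call(timeout=per_try_timeout)
        except Exception as exc:
            last_error = exc
    # Every try failed: surface the last error to the caller.
    raise last_error
```

With a mesh, this loop lives in the sidecar proxy instead of in each application.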
      

Security

  • namespace-level and service-level policies
  • mutual TLS Authentication
  • role-based access control (RBAC)

Observability

  • Metrics
  • Logs
  • Tracing

DEMO

CHAOS ENGINEERING

Chaos Engineering is the discipline of experimenting on a distributed system in order to build confidence in the system’s capability to withstand turbulent conditions in production.

principlesofchaos.org

Thoughtful, planned experiments designed to reveal the weakness in our systems.

Kolton Andrus (cofounder and CEO of Gremlin Inc.)

Usually untested

  1. Graceful shutdown
  2. Health check
  3. Cascading timeouts
  4. Deployments (smoke test)

Types of errors

  • Unreachable
  • Delays
  • Cascading timeouts
  • Circuit breaker
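
For readers unfamiliar with the last item, a circuit breaker can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the class name, thresholds, and the injectable `clock` parameter are all assumptions made for the example.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying it."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock  # injectable so the breaker can be tested without sleeping
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Open: fail fast instead of piling load on a sick service.
                raise CircuitOpenError("circuit open, failing fast")
            # Half-open: let one trial call through.
            self.opened_at = None
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        return result
```

A chaos experiment can then check that the breaker really opens when the downstream service is made unreachable.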

How to start Chaos Engineering

  1. Set up monitoring!!!
  2. Identify a measurable output that indicates behavior, define "steady state"
  3. Form a hypothesis
  4. Simulate real-world events
  5. Try to disprove your hypothesis
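
The steps above can be sketched as a toy experiment in Python. Everything here is simulated and hypothetical: the service is a stand-in that returns 200 unless a fault is injected, the steady-state metric is the success rate, and the 99% threshold is an assumed objective chosen for illustration.

```python
import random


def success_rate(requests: int, abort_fraction: float, rng: random.Random) -> float:
    """Send simulated requests; a fraction of them get an injected HTTP 503."""
    ok = 0
    for _ in range(requests):
        injected = rng.random() < abort_fraction
        status = 503 if injected else 200
        ok += status == 200
    return ok / requests


def run_experiment(slo=0.99, abort_fraction=0.1, requests=1000, seed=42):
    rng = random.Random(seed)  # seeded so the run is repeatable
    # Steady state: measure the metric with no fault injected.
    steady_state = success_rate(requests, 0.0, rng)
    # Hypothesis: the success rate stays at or above `slo` under a 10% abort fault.
    under_fault = success_rate(requests, abort_fraction, rng)
    return {
        "steady_state": steady_state,
        "under_fault": under_fault,
        "hypothesis_holds": under_fault >= slo,
    }
```

In this toy setup nothing retries the failed requests, so the hypothesis is disproved; that result is the point of the exercise, since it shows where resiliency is missing.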

Site Reliability Engineering

DEMO

Resources

THANK YOU


Julien Bisconti

Github: @veggiemonk
Twitter: @veggiemonk
LinkedIn: julienbisconti


reveal.js by Hakim El Hattab / hakim.se

Runs on Kubernetes Presented by: @veggiemonk