Logilab FOSDEM 2016

After describing your infrastructure as code, reuse that to monitor it

image

Introduction

FOSDEM 2016 - Brussels, Belgium

Active supervision and monitoring with Salt, Graphite and Grafana

Arthur Lutz (Logilab) @arthurlutz @logilab

SaltStack Certified Engineer (0x1A5AAB35)

Introduction to Salt

Fast, scalable and flexible software for data center automation, from infrastructure and any cloud, to the entire application stack.

and

SaltStack delivers a dynamic infrastructure communication bus used for orchestration, remote execution, configuration management and much more.

Salt overview

image

Remote execution

image

Salt Components

Architecture

image

Demo!

Demo steps 1/3

  • deploy a pseudo-local DNS using the salt mine and hosts files
  • deploy graphite: install it from debian packages and configure the apache server
  • configure all minions to use the carbon returner
  • use the salt scheduler to run metrics such as status.loadavg and send the result to carbon
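The last two steps boil down to turning a returned dictionary into Carbon's plaintext protocol, one "path value timestamp" line per metric. A minimal Python sketch, assuming the load average comes back as a dictionary keyed by "1-min", "5-min" and "15-min"; the function name to_carbon_lines and the metric prefix are hypothetical and this is not the carbon returner's actual code:

```python
import time


def to_carbon_lines(data, prefix, timestamp=None):
    """Flatten a (possibly nested) dict of numbers into Carbon plaintext lines."""
    if timestamp is None:
        timestamp = int(time.time())
    lines = []
    for key, value in sorted(data.items()):
        path = f"{prefix}.{key}"
        if isinstance(value, dict):
            # Nested results become deeper metric paths.
            lines.extend(to_carbon_lines(value, path, timestamp))
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            lines.append(f"{path} {value} {timestamp}")
    return lines


# Hypothetical sample shaped like a load average result.
sample = {"1-min": 0.42, "5-min": 0.3, "15-min": 0.25}
for line in to_carbon_lines(sample, "minion1.status.loadavg", timestamp=1454198400):
    print(line)
```

Each printed line is what would be written to the Carbon socket; non-numeric values are simply skipped in this sketch.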

Demo steps 2/3

  • deploy grafana to explore metrics and build dashboards
  • use existing plugins: munin plugins as an example
  • deploy munin plugins and deactivate the munin-node (the salt-minion is enough)
  • configure the scheduler to run the execution module munin.run (or munin.run_all) and send back the results to carbon
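Reusing munin plugins works because their output is trivial to parse: each data line is "field.value number". A minimal Python sketch of that parsing step, assuming plain (non-multigraph) plugin output; parse_munin_output is a hypothetical helper, not the code of the munin execution module:

```python
def parse_munin_output(text):
    """Parse 'field.value number' lines as printed by a munin plugin."""
    metrics = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2 or not parts[0].endswith(".value"):
            continue  # skip blank lines, comments and anything unexpected
        field = parts[0][: -len(".value")]
        try:
            metrics[field] = float(parts[1])
        except ValueError:
            continue  # munin prints 'U' when a value is unknown
    return metrics


# Hypothetical output of a cpu-like plugin.
sample = "user.value 12\nsystem.value 3.5\nidle.value U\n"
print(parse_munin_output(sample))
```

The resulting dictionary has the same shape as any other execution module result, so it can be sent to carbon in the same way.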

Demo steps 3/3

  • configure the apache server
  • use the execution module apache.server_status to monitor apache
  • deploy a postgresql database server
  • deploy postgresql specific munin plugins
  • activate them using the salt scheduler
  • deploy the application between the front end and the database
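For the apache step, the metrics come from the server status page. A minimal Python sketch, assuming the machine-readable "Key: value" output of mod_status; parse_server_status and the sample text are hypothetical and this is not how apache.server_status is implemented:

```python
def parse_server_status(text):
    """Keep the numeric 'Key: value' lines of a machine-readable status page."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'Key: value' line
        key = key.strip().replace(" ", "_")
        value = value.strip()
        try:
            stats[key] = float(value) if "." in value else int(value)
        except ValueError:
            continue  # non-numeric entries such as the scoreboard
    return stats


# Hypothetical status page excerpt.
sample = (
    "Total Accesses: 120\n"
    "BusyWorkers: 1\n"
    "IdleWorkers: 9\n"
    "ReqPerSec: .5\n"
    "Scoreboard: _W__\n"
)
print(parse_server_status(sample))
```

Only numeric fields are kept, since those are the ones that can be graphed.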

What we graph / monitor

our in-house probes (usually a dozen lines of python code)

  • backuppc: state of the backups
  • uptime: site monitoring
  • shinken with livestatus: state of the legacy supervision
  • munin: check the data is being collected
  • cyrus: IMAP quotas for our users
  • software forges (cubicweb): number of patches to review, statistics on projects, etc.
  • salt: distance between the infrastructure and its description
  • smokeping equivalent: get ping results in a dictionary
  • sitespeed: performance metrics for websites
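To give an idea of the size of such a probe, here is a minimal Python sketch of the smokeping-like one: it turns the summary printed by ping into a dictionary. It assumes the summary format of Linux iputils ping; parse_ping, the key names and the sample text are hypothetical, not the probe actually used:

```python
import re

# Patterns for the two summary lines printed at the end of a ping run.
RTT_RE = re.compile(r"= ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms")
LOSS_RE = re.compile(r"([\d.]+)% packet loss")


def parse_ping(output):
    """Extract packet loss and round-trip times from ping's summary."""
    result = {}
    loss = LOSS_RE.search(output)
    if loss:
        result["loss_percent"] = float(loss.group(1))
    rtt = RTT_RE.search(output)
    if rtt:
        names = ("min_ms", "avg_ms", "max_ms", "mdev_ms")
        result.update(zip(names, map(float, rtt.groups())))
    return result


# Hypothetical ping summary.
sample = (
    "3 packets transmitted, 3 received, 0% packet loss, time 2003ms\n"
    "rtt min/avg/max/mdev = 0.045/0.052/0.061/0.007 ms\n"
)
print(parse_ping(sample))
```

In a real probe the text would come from running ping on the minion; returning a flat dictionary of numbers is what makes the result easy to graph.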

What next?

  • Improvement of the event returner
  • salt job cache improvements
  • Apply event-driven infrastructure to metrics too: update a metric when the state changes
  • Deploy alerting system

The end