* allow comments (in parentheses) in units, and ignore them when matching against an alarm pattern...
* web interface (AngularJS)
* separate alarm and IRC logic
* monitor inodes
* watchdog on slave and master -> should send WARN notifications
* notifications (text, arbitrary serialized data as attachment, DEBUG/INFO/WARN/ERR/CRIT)
* consider redundancy - can already connect multiple masters through pubsub, but how to deal with checking for duplicate processing?
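
Rough sketch for the unit-comment item above, assuming alarm patterns are plain regular expressions (the notes don't say what format they use). strip_unit_comment and unit_matches are made-up names, not existing code:

```python
import re

# Matches a "(...)" comment plus any whitespace in front of it.
_COMMENT = re.compile(r"\s*\([^()]*\)")


def strip_unit_comment(unit):
    """Return the unit with parenthesized comments removed."""
    return _COMMENT.sub("", unit).strip()


def unit_matches(unit, pattern):
    """Match an alarm pattern (assumed to be a regex) against the bare unit."""
    return re.fullmatch(pattern, strip_unit_comment(unit)) is not None
```

With this, a unit like "MiB (used)" is matched as if it were just "MiB". Nested parentheses are not handled.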
cprocessd:
    -> subscribe to ccollectd
    -> debug switch for outputting everything to the terminal
    -> keep up/down state
    -> keep last-value state (resource usage)
    -> keep track of persistent downtimes (down for longer than X, as configured in the config file)
    -> alarms (move this from the IRC bot to cprocessd)
    -> classify message importance
    -> cprocessd-stream socket, PUB that just streams processed data
    -> cprocessd-query socket, REP that responds to queries
        -> server-status
        -> down-list
        -> last-value
        -> server-list
        -> service-list
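
Possible shape for the state-keeping part of cprocessd (up/down state, last-value, persistent downtimes). Everything here is an assumption: StateTracker, its method names, and keying by (host, service) tuples are invented for illustration, and the ZeroMQ sockets are left out:

```python
import time


class StateTracker:
    """Keeps up/down and last-value state per (host, service) key."""

    def __init__(self, persistent_after):
        # Seconds a service must stay down before it counts as "persistent";
        # in the real daemon this would come from the config file.
        self.persistent_after = persistent_after
        self.down_since = {}   # key -> timestamp of the first "down" report
        self.last_value = {}   # key -> most recent resource-usage value

    def record_status(self, key, up, now=None):
        now = time.time() if now is None else now
        if up:
            self.down_since.pop(key, None)
        else:
            # Keep the original timestamp if the key is already down.
            self.down_since.setdefault(key, now)

    def record_value(self, key, value):
        self.last_value[key] = value

    def down_list(self):
        return sorted(self.down_since)

    def persistent_downtimes(self, now=None):
        now = time.time() if now is None else now
        return sorted(
            key for key, since in self.down_since.items()
            if now - since > self.persistent_after
        )
```

down_list() and last_value would be what the down-list and last-value queries return; the query socket would only need to serialize them.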
cmaild:
    -> use marrow.mailer
    -> receives data from cprocessd-stream
    -> sends e-mails for configured importance levels
cbotd:
    -> currently named 'alert'
    -> receives data from cprocessd-stream
    -> IRC bot
    -> posts alerts to specified IRC channels, depending on the minimum severity level configured for that channel (e.g. INFO for #cryto-network but ERR for #crytocc)
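
Sketch of the per-channel severity filter described for cbotd, assuming the levels are ordered as listed in the notifications item (DEBUG < INFO < WARN < ERR < CRIT). channels_for and the dict layout are hypothetical:

```python
# Severity levels from lowest to highest.
LEVELS = ["DEBUG", "INFO", "WARN", "ERR", "CRIT"]


def channels_for(severity, minimums):
    """Return the channels whose configured minimum is at or below severity.

    minimums maps channel name -> minimum severity level for that channel.
    """
    rank = LEVELS.index(severity)
    return sorted(
        channel for channel, minimum in minimums.items()
        if rank >= LEVELS.index(minimum)
    )
```

With {"#cryto-network": "INFO", "#crytocc": "ERR"}, a WARN alert goes only to #cryto-network and a CRIT alert goes to both channels.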
csmsd:
    -> sends SMS for (critical) alerts
    -> receives data from cprocessd-stream
    -> Twilio? does a provider-neutral API exist? might need an extra abstraction...
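
If no provider-neutral API turns up, the extra abstraction could be as small as this. SmsProvider, RecordingProvider and send_alert are all invented names, and no real provider API (Twilio or otherwise) is modelled here:

```python
from abc import ABC, abstractmethod


class SmsProvider(ABC):
    """Minimal interface that csmsd would code against (hypothetical)."""

    @abstractmethod
    def send(self, recipient, text):
        """Deliver text to a single phone number."""


class RecordingProvider(SmsProvider):
    """Stand-in backend that only records messages, useful for testing."""

    def __init__(self):
        self.sent = []

    def send(self, recipient, text):
        self.sent.append((recipient, text))


def send_alert(provider, recipients, text):
    """Send one alert text to every configured recipient."""
    for recipient in recipients:
        provider.send(recipient, text)
```

A real backend would subclass SmsProvider and wrap whatever client library the chosen provider ships.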
cwebd:
    -> offers web interface with streaming status data
    -> publicly accessible and password-protected
    -> streaming data from cprocessd-stream
    -> on-pageload state from cprocessd-query (including 'current downtimes')
    -> tornado+zmq ioloop, http://zeromq.github.io/pyzmq/eventloop.html
    -> web dashboard
    -> AngularJS
    -> fancy graphs (via AngularJS? idk if a directive exists for this)
    -> show downtimes as well as live per-machine stats
    -> also show an overview of all machines in a grid, color-coded by average load across all resources
    -> historical up/down data
        -> sqlite storage? only a single concurrent writer, so should work
    -> perhaps letting people sign up for e-mail alerts is an option? to-inbox delivery will be tricky here
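
Sketch of what sqlite storage for the historical up/down data might look like, using only the standard-library sqlite3 module. The table layout and function names are guesses, not a decided schema:

```python
import sqlite3


def open_history(path=":memory:"):
    """Open (or create) the history database; one writer process assumed."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS transitions (
               host    TEXT    NOT NULL,
               service TEXT    NOT NULL,
               up      INTEGER NOT NULL,  -- 1 = came up, 0 = went down
               at      REAL    NOT NULL   -- UNIX timestamp
           )"""
    )
    return db


def record_transition(db, host, service, up, at):
    # "with db" commits on success and rolls back on error.
    with db:
        db.execute(
            "INSERT INTO transitions (host, service, up, at) VALUES (?, ?, ?, ?)",
            (host, service, int(up), at),
        )


def history(db, host, service):
    """Return [(up, timestamp), ...] for one service, oldest first."""
    rows = db.execute(
        "SELECT up, at FROM transitions WHERE host = ? AND service = ? ORDER BY at",
        (host, service),
    )
    return [(bool(up), at) for up, at in rows]
```

Storing only state transitions rather than every sample keeps the table small, and downtime durations can be computed from consecutive rows.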