On Mon, Aug 3, 2015 at 5:10 PM, Quentin Hartman
<qhartman@xxxxxxxxxxxxxxxxxxx> wrote:
> The problem with this kind of monitoring is that there are so many possible
> metrics to watch and so many possible ways to watch them. For myself, I'm
> working on implementing a couple of things:
> - Watching error counters on servers
> - Watching error counters on switches
> - Watching performance

I would also check:

- link speed (on both servers and switches)
- link usage (over 80% issue a warning)

.a.

-- 
antonio.messina@xxxxxx
S3IT: Services and Support for Science IT
http://www.s3it.uzh.ch/
University of Zurich
Y12 F 84
Winterthurerstrasse 190
CH-8057 Zurich Switzerland
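
A rough sketch of the link speed and link usage checks suggested above, in Python. The function names, the `expected_mbps` parameter, and the sampling approach are illustrative assumptions, not part of any existing tool. It assumes you already have two samples of an interface byte counter taken `interval_s` seconds apart, plus the negotiated link speed in Mb/s (on Linux these could be read from `/sys/class/net/<iface>/statistics/rx_bytes` and `/sys/class/net/<iface>/speed`; on switches, probably from SNMP). Counter wrap-around is not handled here.

```python
# Hypothetical helper names; only the 80% threshold comes from the thread.
WARN_THRESHOLD = 0.80  # warn when link usage goes over 80%


def link_utilization(bytes_before, bytes_after, interval_s, speed_mbps):
    """Return the fraction of link capacity used between two counter samples."""
    if interval_s <= 0 or speed_mbps <= 0:
        raise ValueError("interval_s and speed_mbps must be positive")
    bits_sent = (bytes_after - bytes_before) * 8
    capacity_bits = speed_mbps * 1_000_000 * interval_s
    return bits_sent / capacity_bits


def check_link(name, bytes_before, bytes_after, interval_s,
               speed_mbps, expected_mbps):
    """Return a list of warning strings for one interface (empty if healthy)."""
    warnings = []
    # Link speed check: a port that negotiated below the expected speed
    # usually points at a cabling, transceiver, or configuration problem.
    if speed_mbps < expected_mbps:
        warnings.append(
            f"{name}: link speed {speed_mbps} Mb/s, expected {expected_mbps} Mb/s"
        )
    # Link usage check: compare measured throughput to the negotiated speed.
    usage = link_utilization(bytes_before, bytes_after, interval_s, speed_mbps)
    if usage > WARN_THRESHOLD:
        warnings.append(f"{name}: link usage {usage:.0%} is over 80%")
    return warnings


if __name__ == "__main__":
    # Made-up sample values: 1.125 GB transferred in 10 s on a 1 Gb/s link
    # that was expected to negotiate at 10 Gb/s.
    for line in check_link("eth0", 0, 1_125_000_000, 10, 1000, 10000):
        print(line)
```

Usage is measured against the negotiated speed rather than the expected one, so a port that came up at the wrong speed triggers the speed warning and, if it is also busy, the usage warning. The same function would have to be run once for the receive counter and once for the transmit counter.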