Re: Check networking first?

The problem with this kind of monitoring is that there are so many possible metrics to watch and so many possible ways to watch them. For my part, I'm working on implementing a few things:
- Watching error counters on servers
- Watching error counters on switches
- Watching performance

My plan is to feed these metrics into Graphite and then use Skyline for anomaly detection on them. The error counts come from simple collectors on every machine, a very light test. Performance is a bit trickier. My intent is to run an iperf test between two semi-randomly selected nodes in the cluster every 30 minutes. Once a node has been tested successfully, it is removed from the pool of candidate nodes. Once the pool is depleted, it gets reset. If that proves too intense, I'll do something lighter.
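
To make the pool rotation concrete, here is a rough sketch in Python of how the pair selection could work. It is not an actual implementation: NodePool, run_round, and the run_test callback are hypothetical names, and the callback stands in for whatever really launches iperf between the two hosts. Falling back to an already-tested partner when only one node is left in the pool is also my assumption.

```python
import random


class NodePool:
    """Tracks which nodes still need a successful test in the current cycle."""

    def __init__(self, nodes, rng=None):
        if len(set(nodes)) < 2:
            raise ValueError("need at least two distinct nodes")
        self.all_nodes = sorted(set(nodes))
        self.pending = set(self.all_nodes)
        self.rng = rng or random.Random()

    def pick_pair(self):
        """Return two distinct nodes, preferring ones not yet tested."""
        # Once the pool is depleted, it gets reset.
        if not self.pending:
            self.pending = set(self.all_nodes)
        first = self.rng.choice(sorted(self.pending))
        # Prefer an untested partner; if none is left (assumption),
        # borrow any other node so the last one still gets tested.
        candidates = sorted(self.pending - {first})
        if not candidates:
            candidates = [n for n in self.all_nodes if n != first]
        second = self.rng.choice(candidates)
        return first, second

    def record(self, pair, ok):
        """Remove both nodes from the pool only after a successful test."""
        if ok:
            self.pending.difference_update(pair)


def run_round(pool, run_test):
    """Run one scheduled test; run_test(src, dst) must return True on success."""
    pair = pool.pick_pair()
    pool.record(pair, run_test(*pair))
    return pair
```

A scheduler (cron, or a loop that sleeps 30 minutes) would call run_round once per interval. A failed test leaves both nodes in the pool, so they remain eligible to be picked again.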

QH

On Mon, Aug 3, 2015 at 6:21 AM, John Spray <jspray@xxxxxxxxxx> wrote:
On Mon, Aug 3, 2015 at 12:30 PM, Stijn De Weirdt
<stijn.deweirdt@xxxxxxxx> wrote:
>> Like a lot of system monitoring stuff, this is the kind of thing that
>> in an ideal world we wouldn't have to worry about, but the experience
>> in practice is that people deploy big distributed storage systems
>> without having really good monitoring in place.  We (people providing
>
> not to become completely off-topic but do you have any suggestions for such
> "really good monitoring" that could help monitor the many-to-many
> communication pattern that is typical for a ceph cluster? especially the
> performance part, not only the functional part.

I guess I'm kind of just assuming that ops people have tools for this
stuff -- I don't run any large systems myself, so can't recommend
anything.

John
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
