Show IOps per VM/client to find heavy users...

Hi,

On 08 Aug 2014, at 15:55, Andrija Panic <andrija.panic at gmail.com> wrote:

Hi Dan,

Thank you very much for the script, will check it out... no throttling so far, but I guess it will have to be done...

This seems to read only gzipped logs?

Well it's pretty simple, and it zcat's each input file. So yes, only gz files in the current script. But you can change that pretty trivially ;)
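
For example (untested), switching it to call zcat -f instead of zcat should handle both, since gzip's -f passes non-gzipped input through unchanged when decompressing:

  # zcat -f prints plain files as-is and decompresses .gz files
  zcat -f ceph-osd.0.log ceph-osd.0.log.1.gz | wc -l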

So since it's read-only, I guess it is safe to run it on a production cluster now? :)

I personally don't do anything new on a Friday just before leaving ;)

But it's just grepping the log files, so start with one, then two, then...

The script will also check multiple OSDs as far as I can understand, not just the osd.0 given in the script comment?


Yup, what I do is gather all of the OSD logs for a single day in a single directory (in CephFS ;), then run that script on all of the OSDs. It takes a while, but it will give you the overall daily totals for the whole cluster.
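
Roughly something like this (host names and paths below are made up, so adjust to your setup):

  # collect yesterday's rotated OSD logs into one directory, then run the script on them
  mkdir -p /cephfs/osd-logs/2014-08-07
  for host in ceph-osd-01 ceph-osd-02; do
    scp "$host":/var/log/ceph/ceph-osd.*.log.1.gz /cephfs/osd-logs/2014-08-07/
  done
  ./rbd-io-stats.pl /cephfs/osd-logs/2014-08-07/*.gz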

If you are only trying to find the top users, then it is sufficient to check a subset of OSDs, since by their nature the client IOs are spread across most/all OSDs.

Cheers, Dan

Thanks a lot.
Andrija




On 8 August 2014 15:44, Dan Van Der Ster <daniel.vanderster at cern.ch> wrote:
Hi,
Here's what we do to identify our top RBD users.

First, enable log level 10 for the filestore so you can see all the IOs coming from the VMs. Then use a script like this (used on a dumpling cluster):

  https://github.com/cernceph/ceph-scripts/blob/master/tools/rbd-io-stats.pl

to summarize the osd logs and identify the top clients.
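
For reference, bumping the filestore log level can either go in ceph.conf or be injected at runtime; double-check the exact syntax on your release:

  # ceph.conf, [osd] section:
  #   debug filestore = 10
  # or at runtime, then back to normal once you have the logs you need:
  ceph tell osd.* injectargs '--debug-filestore 10'
  ceph tell osd.* injectargs '--debug-filestore 0'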

Then it's just a matter of scripting to figure out the ops/sec per volume, but for us at least the main use case has been to identify who is responsible for a new peak in overall ops, and the daily-granularity statistics from the above script tend to suffice.
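
For a very rough average, you can just divide a daily per-volume op count by 86400; the two-column "volume count" layout below is only an assumption about the script's summary output, so adjust the fields to whatever it actually prints:

  # average ops/sec per volume over one day (86400 seconds)
  awk '{ printf "%s %.2f ops/s (avg)\n", $1, $2/86400 }' daily-per-volume-totals.txt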

BTW, do you throttle your clients? We found that it's absolutely necessary, since without a throttle just a few active VMs can eat up the entire IOPS capacity of the cluster.
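
In case it helps, if you're on KVM/libvirt, one way to throttle at the hypervisor is virsh blkdeviotune; the domain name, device and limit below are only examples, so check the option names on your libvirt version:

  # cap one disk of a running guest at 300 IOPS; add --config to persist it
  virsh blkdeviotune i-2-42-VM vda --total-iops-sec 300 --live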

Cheers, Dan

-- Dan van der Ster || Data & Storage Services || CERN IT Department --


On 08 Aug 2014, at 13:51, Andrija Panic <andrija.panic at gmail.com> wrote:

Hi,

We just added some new clients, and have suffered a very big degradation in Ceph performance for some reason (we are using CloudStack).

I'm wondering if there is a way to monitor op/s or similar usage per connected client, so we can isolate the heavy client?

Also, what is the general best practice for monitoring these kinds of changes in Ceph? I'm talking about R/W or op/s changes or similar...

Thanks,
--

Andrija Panić





--
Andrija Panić
--------------------------------------
  http://admintweets.com
--------------------------------------


