Hi Dan,

the script you provided seems not to work on my ceph cluster :( This is ceph version 0.80.3. I get empty results on both debug level 10 and the maximum level of 20...

[root@cs1 ~]# ./rbd-io-stats.pl /var/log/ceph/ceph-osd.0.log-20140811.gz
Writes per OSD:
Writes per pool:
Writes per PG:
Writes per RBD:
Writes per object:
Writes per length:
.
.
.

On 8 August 2014 16:01, Dan Van Der Ster <daniel.vanderster at cern.ch> wrote:

> Hi,
>
> On 08 Aug 2014, at 15:55, Andrija Panic <andrija.panic at gmail.com> wrote:
>
> Hi Dan,
>
> thank you very much for the script, will check it out... no throttling so
> far, but I guess it will have to be done...
>
> This seems to read only gzipped logs?
>
>
> Well it's pretty simple, and it zcat's each input file. So yes, only gz
> files in the current script. But you can change that pretty trivially ;)
>
> so since it's read-only, I guess it is safe to run it on a production
> cluster now?
>
>
> I personally don't do anything new on a Friday just before leaving ;)
>
> But it's just grepping the log files, so start with one, then two, then...
>
> The script will also check for multiple OSDs as far as I can understand,
> not just osd.0 as given in the script comment?
>
>
> Yup, what I do is gather all of the OSD logs for a single day in a
> single directory (in CephFS ;), then run that script on all of the OSDs.
> It takes a while, but it will give you the overall daily totals for the
> whole cluster.
>
> If you are only trying to find the top users, then it is sufficient to
> check a subset of OSDs, since by their nature the client IOs are spread
> across most/all OSDs.
>
> Cheers, Dan
>
> Thanks a lot.
> Andrija
>
>
>
> On 8 August 2014 15:44, Dan Van Der Ster <daniel.vanderster at cern.ch>
> wrote:
>
>> Hi,
>> Here's what we do to identify our top RBD users.
>>
>> First, enable log level 10 for the filestore so you can see all the IOs
>> coming from the VMs. Then use a script like this (used on a dumpling
>> cluster):
>>
>> https://github.com/cernceph/ceph-scripts/blob/master/tools/rbd-io-stats.pl
>>
>> to summarize the OSD logs and identify the top clients.
>>
>> Then it's just a matter of scripting to figure out the ops/sec per
>> volume, but for us at least the main use case has been to identify who is
>> responsible for a new peak in overall ops, and daily-granular statistics
>> from the above script tend to suffice.
>>
>> BTW, do you throttle your clients? We found that it's absolutely
>> necessary, since without a throttle just a few active VMs can eat up the
>> entire iops capacity of the cluster.
>>
>> Cheers, Dan
>>
>> -- Dan van der Ster || Data & Storage Services || CERN IT Department --
>>
>>
>> On 08 Aug 2014, at 13:51, Andrija Panic <andrija.panic at gmail.com>
>> wrote:
>>
>> Hi,
>>
>> we just had some new clients, and have suffered a very big degradation in
>> CEPH performance for some reason (we are using CloudStack).
>>
>> I'm wondering if there is a way to monitor OP/s or similar usage by
>> connected client, so we can isolate the heavy client?
>>
>> Also, what is the general best practice to monitor these kinds of
>> changes in CEPH? I'm talking about R/W or OP/s changes or similar...
>>
>> Thanks,
>> --
>>
>> Andrija Panić
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users at lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>
>
> --
>
> Andrija Panić
> --------------------------------------
> http://admintweets.com
> --------------------------------------
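
P.S. On the point quoted above that the script only zcat's .gz files and that this is trivial to change: below is a minimal, untested sketch of how the input handling could be adapted to accept both plain and gzipped logs. The open_log helper, the zcat pipe, and the filestore-line regex are illustrative assumptions, not code taken from the actual rbd-io-stats.pl.

#!/usr/bin/perl
use strict;
use warnings;

# Open a log file for reading, transparently handling gzip compression.
sub open_log {
    my ($file) = @_;
    my $fh;
    if ($file =~ /\.gz$/) {
        # Gzipped log: stream it through zcat, as the current script does.
        open($fh, '-|', 'zcat', $file) or die "zcat $file: $!";
    } else {
        # Plain-text log: read it directly.
        open($fh, '<', $file) or die "open $file: $!";
    }
    return $fh;
}

# Example usage: count lines that look like filestore writes in each
# file given on the command line (the pattern here is only a placeholder
# for the real parsing done by the script).
for my $file (@ARGV) {
    my $fh    = open_log($file);
    my $count = 0;
    while (my $line = <$fh>) {
        $count++ if $line =~ /filestore.*write/;
    }
    close($fh);
    print "$file: $count write lines\n";
}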

--

Andrija Panić
--------------------------------------
http://admintweets.com
--------------------------------------