Show IOps per VM/client to find heavy users...

That's better :D

Thanks a lot, now I will be able to troubleshoot my problem :)

Thanks Dan,
Andrija


On 11 August 2014 13:21, Dan Van Der Ster <daniel.vanderster at cern.ch> wrote:

>  Hi,
> I changed the script to be a bit more flexible with the osd path. Give
> this a try again:
> https://github.com/cernceph/ceph-scripts/blob/master/tools/rbd-io-stats.pl
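> If it still comes up empty, one quick sanity check (just a sketch; the
> exact pattern may differ on your setup) is whether the rotated log actually
> contains the debug-10 filestore write lines the script parses, e.g.:
>
>   zgrep -c 'filestore(.*) write' /var/log/ceph/ceph-osd.0.log-20140811.gz
>
> A zero count would point at the debug level or the log rotation rather than
> at the script itself.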
> Cheers, Dan
>
> -- Dan van der Ster || Data & Storage Services || CERN IT Department --
>
>
>  On 11 Aug 2014, at 12:48, Andrija Panic <andrija.panic at gmail.com> wrote:
>
>  I apologize, I clicked the Send button too fast...
>
>  Anyway, I can see there are lines in log file:
> 2014-08-11 12:43:25.477693 7f022d257700 10
> filestore(/var/lib/ceph/osd/ceph-0) write
> 3.48_head/14b1ca48/rbd_data.41e16619f5eb6.0000000000001bd1/head//3
> 3641344~4608 = 4608
>  Not sure if I can do anything to fix this... ?
>
>  Thanks,
> Andrija
>
>
>
> On 11 August 2014 12:46, Andrija Panic <andrija.panic at gmail.com> wrote:
>
>> Hi Dan,
>>
>>  the script you provided seems not to work on my Ceph cluster :(
>> This is ceph version 0.80.3
>>
>>  I get empty results, on both debug level 10 and the maximum level of
>> 20...
>>
>>  [root@cs1 ~]# ./rbd-io-stats.pl /var/log/ceph/ceph-osd.0.log-20140811.gz
>> Writes per OSD:
>> Writes per pool:
>>  Writes per PG:
>>  Writes per RBD:
>>  Writes per object:
>>  Writes per length:
>>  .
>>  .
>> .
>>
>>
>>
>>
>> On 8 August 2014 16:01, Dan Van Der Ster <daniel.vanderster at cern.ch>
>> wrote:
>>
>>> Hi,
>>>
>>>  On 08 Aug 2014, at 15:55, Andrija Panic <andrija.panic at gmail.com>
>>> wrote:
>>>
>>>  Hi Dan,
>>>
>>>  Thank you very much for the script, I will check it out... no throttling
>>> so far, but I guess it will have to be done...
>>>
>>>  This seems to read only gzipped logs?
>>>
>>>
>>>  Well it's pretty simple, and it zcat's each input file. So yes, only
>>> gz files in the current script. But you can change that pretty trivially ;)
>>>
>>>  So since it is read-only, I guess it is safe to run it on a production
>>> cluster now?
>>>
>>>
>>>  I personally don't do anything new on a Friday just before leaving ;)
>>>
>>>  But it's just grepping the log files, so start with one, then two,
>>> then...
>>>
>>>   The script will also check for multiple OSDs as far as I can
>>> understand, not just osd.0 as given in the script comment?
>>>
>>>
>>>  Yup, what I do is gather all of the OSD logs for a single day in a
>>> single directory (in CephFS ;), then run that script on all of the OSDs. It
>>> takes a while, but it will give you the overall daily totals for the whole
>>> cluster.
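>>>
>>>  For example (a sketch, assuming the rotated logs for one day are
>>> collected under a single directory; adjust the paths to your layout):
>>>
>>>   ./rbd-io-stats.pl /cephfs/osd-logs/2014-08-10/ceph-osd.*.log-*.gz
>>>
>>> prints the combined totals for all of the logs passed on the command line.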
>>>
>>>  If you are only trying to find the top users, then it is sufficient to
>>> check a subset of OSDs, since by their nature the client IOs are spread
>>> across most/all OSDs.
>>>
>>>  Cheers, Dan
>>>
>>>  Thanks a lot.
>>> Andrija
>>>
>>>
>>>
>>>
>>> On 8 August 2014 15:44, Dan Van Der Ster <daniel.vanderster at cern.ch>
>>> wrote:
>>>
>>>> Hi,
>>>> Here's what we do to identify our top RBD users.
>>>>
>>>>  First, enable log level 10 for the filestore so you can see all the
>>>> IOs coming from the VMs. Then use a script like this (used on a dumpling
>>>> cluster):
>>>>
>>>>
>>>> https://github.com/cernceph/ceph-scripts/blob/master/tools/rbd-io-stats.pl
>>>>
>>>>  to summarize the osd logs and identify the top clients.
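>>>>
>>>>  (One way to raise that level at runtime, as a rough sketch: something
>>>> like
>>>>
>>>>   ceph tell osd.0 injectargs '--debug-filestore 10'
>>>>
>>>> per OSD, or "debug filestore = 10" under [osd] in ceph.conf plus an OSD
>>>> restart. The logs grow quickly at this level, so turn it back down once
>>>> you have what you need.)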
>>>>
>>>>  Then it's just a matter of scripting to figure out the ops/sec per
>>>> volume, but for us at least the main use-case has been to identify who is
>>>> responsible for a new peak in overall ops, and the daily-granular
>>>> statistics from the above script tend to suffice.
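>>>>
>>>>  (For a rough per-volume count straight from a single log, without the
>>>> script, something along these lines can work; a sketch, assuming debug-10
>>>> lines of the form "filestore(...) write .../rbd_data.<id>.<object>/...":
>>>>
>>>>   zcat ceph-osd.0.log-20140811.gz | grep ') write ' \
>>>>     | grep -o 'rbd_data\.[0-9a-f]*' | sort | uniq -c | sort -rn | head
>>>>
>>>> Each rbd_data prefix can then be matched back to an image via the
>>>> block_name_prefix that "rbd info <image>" reports.)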
>>>>
>>>>  BTW, do you throttle your clients? We found that it's absolutely
>>>> necessary, since without a throttle just a few active VMs can eat up the
>>>> entire IOPS capacity of the cluster.
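>>>>
>>>>  (If the hypervisors are KVM/libvirt, one place to throttle, as a sketch
>>>> rather than CloudStack-specific advice, is libvirt's per-disk I/O tuning,
>>>> e.g.:
>>>>
>>>>   virsh blkdeviotune <domain> vda --total-iops-sec 300 --live
>>>>
>>>> where the domain, the target device and the 300 IOPS cap are placeholders
>>>> to adjust; the same limits can be made persistent via the <iotune>
>>>> element of the disk definition.)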
>>>>
>>>>  Cheers, Dan
>>>>
>>>> -- Dan van der Ster || Data & Storage Services || CERN IT Department --
>>>>
>>>>
>>>>   On 08 Aug 2014, at 13:51, Andrija Panic <andrija.panic at gmail.com>
>>>> wrote:
>>>>
>>>>    Hi,
>>>>
>>>>  we just had some new clients, and have suffered a very big degradation
>>>> in Ceph performance for some reason (we are using CloudStack).
>>>>
>>>>  I'm wondering if there is a way to monitor op/s or similar usage per
>>>> connected client, so we can isolate the heavy client?
>>>>
>>>>  Also, what is the general best practice for monitoring these kinds of
>>>> changes in Ceph? I'm talking about R/W or op/s changes or similar...
>>>>
>>>>  Thanks,
>>>> --
>>>>
>>>> Andrija Panić
>>>>
>>>>    _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users at lists.ceph.com
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>
>>>>
>>>>
>>>
>>>
>>>  --
>>>
>>> Andrija Panić
>>> --------------------------------------
>>>   http://admintweets.com
>>> --------------------------------------
>>>
>>>
>>>
>>
>>
>>  --
>>
>> Andrija Panić
>> --------------------------------------
>>   http://admintweets.com
>> --------------------------------------
>>
>
>
>
>  --
>
> Andrija Panić
> --------------------------------------
>   http://admintweets.com
> --------------------------------------
>
>
>


-- 

Andrija Panić
--------------------------------------
  http://admintweets.com
--------------------------------------