Hi Dan,

the script you provided seems not to work on my ceph cluster :( This is ceph version 0.80.3. I get empty results on both debug level 10 and the maximum level of 20...

[root@cs1 ~]# ./rbd-io-stats.pl /var/log/ceph/ceph-osd.0.log-20140811.gz
Writes per OSD:
Writes per pool:
Writes per PG:
Writes per RBD:
Writes per object:
Writes per length:
.
.
.

On 8 August 2014 16:01, Dan Van Der Ster <daniel.vanderster at cern.ch> wrote:

> Hi,
>
> On 08 Aug 2014, at 15:55, Andrija Panic <andrija.panic at gmail.com> wrote:
>
> Hi Dan,
>
> thank you very much for the script, will check it out... no throttling so
> far, but I guess it will have to be done...
>
> This seems to read only gzipped logs?
>
>
> Well it's pretty simple, and it zcat's each input file. So yes, only gz
> files in the current script. But you can change that pretty trivially ;)
>
> so since it's read-only, I guess it is safe to run it on a production
> cluster now?
>
>
> I personally don't do anything new on a Friday just before leaving ;)
>
> But it's just grepping the log files, so start with one, then two, then...
>
> The script will also check for multiple OSDs as far as I can understand,
> not just osd.0 as given in the script comment?
>
>
> Yup, what I do is gather all of the OSD logs for a single day in a
> single directory (in CephFS ;), then run that script on all of the OSDs.
> It takes a while, but it will give you the overall daily totals for the
> whole cluster.
>
> If you are only trying to find the top users, then it is sufficient to
> check a subset of OSDs, since by their nature the client IOs are spread
> across most/all OSDs.
>
> Cheers, Dan
>
> Thanks a lot.
> Andrija
>
>
>
> On 8 August 2014 15:44, Dan Van Der Ster <daniel.vanderster at cern.ch>
> wrote:
>
>> Hi,
>> Here's what we do to identify our top RBD users.
>>
>> First, enable log level 10 for the filestore so you can see all the IOs
>> coming from the VMs. Then use a script like this (used on a dumpling
>> cluster):
>>
>> https://github.com/cernceph/ceph-scripts/blob/master/tools/rbd-io-stats.pl
>>
>> to summarize the OSD logs and identify the top clients.
>>
>> Then it's just a matter of scripting to figure out the ops/sec per
>> volume, but for us at least the main use case has been to identify who is
>> responsible for a new peak in overall ops, and daily-granular statistics
>> from the above script tend to suffice.
>>
>> BTW, do you throttle your clients? We found that it's absolutely
>> necessary, since without a throttle just a few active VMs can eat up the
>> entire iops capacity of the cluster.
>>
>> Cheers, Dan
>>
>> -- Dan van der Ster || Data & Storage Services || CERN IT Department --
>>
>>
>> On 08 Aug 2014, at 13:51, Andrija Panic <andrija.panic at gmail.com>
>> wrote:
>>
>> Hi,
>>
>> we just had some new clients, and have suffered a very big degradation in
>> CEPH performance for some reason (we are using CloudStack).
>>
>> I'm wondering if there is a way to monitor OP/s or similar usage by
>> connected client, so we can isolate the heavy client?
>>
>> Also, what is the general best practice to monitor these kinds of
>> changes in CEPH? I'm talking about R/W or OP/s changes or similar...
>>
>> Thanks,
>> --
>>
>> Andrija Panić
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users at lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>
>
> --
>
> Andrija Panić
> --------------------------------------
> http://admintweets.com
> --------------------------------------
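
P.S. On the point quoted above that the script only zcat's .gz files and that this is trivial to change: below is a minimal, untested sketch of how the input handling could be adapted to accept both plain and gzipped logs. The open_log helper, the zcat pipe, and the filestore-line regex are illustrative assumptions, not code taken from the actual rbd-io-stats.pl.

#!/usr/bin/perl
use strict;
use warnings;

# Open a log file for reading, transparently handling gzip compression.
sub open_log {
    my ($file) = @_;
    my $fh;
    if ($file =~ /\.gz$/) {
        # Gzipped log: stream it through zcat, as the current script does.
        open($fh, '-|', 'zcat', $file) or die "zcat $file: $!";
    } else {
        # Plain-text log: read it directly.
        open($fh, '<', $file) or die "open $file: $!";
    }
    return $fh;
}

# Example usage: count lines that look like filestore writes in each
# file given on the command line (the pattern here is only a placeholder
# for the real parsing done by the script).
for my $file (@ARGV) {
    my $fh    = open_log($file);
    my $count = 0;
    while (my $line = <$fh>) {
        $count++ if $line =~ /filestore.*write/;
    }
    close($fh);
    print "$file: $count write lines\n";
}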

--

Andrija Panić
--------------------------------------
http://admintweets.com
--------------------------------------