Hi Jason,

I uploaded a perf report to the issue (https://tracker.ceph.com/issues/23853):

apt-get install linux-tools-4.13.0-39-generic linux-cloud-tools-4.13.0-39-generic linux-tools-generic linux-cloud-tools-generic
perf record -F 99 -g rbd ls -l -p RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c --id xen_test
perf report -n --stdio > perf.data.txt

I haven't used perf very much in recent years - let me know if you need
additional traces (a possible way to render a flame graph from this
recording is sketched further down).

Regards
Marc

Am 25.04.2018 um 17:34 schrieb Jason Dillaman:
> Since I cannot reproduce your issue, can you generate a perf CPU flame
> graph on this to figure out where the user time is being spent?
>
> On Wed, Apr 25, 2018 at 11:25 AM, Marc Schöchlin <ms@xxxxxxxxxx> wrote:
>> Hello Jason,
>>
>> according to this, latency between client and OSD should not be the
>> problem (given the high amount of user time in the measurements above,
>> network communication should not be the bottleneck).
>>
>> Finding the involved OSD:
>>
>> # ceph osd map RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c rbd_directory
>> osdmap e7570 pool 'RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c' (14) object 'rbd_directory' -> pg 14.30a98c1c (14.1c) -> up ([36,0,38], p36) acting ([36,0,38], p36)
>>
>> # ceph osd find osd.36
>> {
>>     "osd": 36,
>>     "ip": "10.23.27.149:6826/7195",
>>     "crush_location": {
>>         "host": "ceph-ssd-s39",
>>         "root": "default"
>>     }
>> }
>>
>> ssh ceph-ssd-s39
>>
>> # nuttcp -w1m ceph-mon-s43
>> 11186.3391 MB / 10.00 sec = 9381.8890 Mbps 12 %TX 32 %RX 0 retrans 0.15 msRTT
>>
>> # time rbd ls -l -p RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c --rbd_concurrent_management_ops=1 --id xen_test
>> NAME                                          SIZE   PARENT                                                                                             FMT PROT LOCK
>> RBD-0192938e-cb4b-4ee1-9988-b8145704ac81      20480M RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c/RBD-8b2cfe76-44b7-4393-b376-f675366831c3@BASE  2
>> RBD-0192938e-cb4b-4ee1-9988-b8145704ac81@BASE 20480M RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c/RBD-8b2cfe76-44b7-4393-b376-f675366831c3@BASE  2 yes
>> ...
>> RBD-feb32ab0-a5ee-44e6-9089-486e91ee8af3      20480M RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c/RBD-bbbc2ce0-4ad3-44ae-a52f-e57df0441e27@BASE  2
>> __srlock__                                        0                                                                                                     2
>>
>> real    0m23.667s
>> user    0m15.949s
>> sys     0m1.276s
>>
>> # time rbd ls -l -p RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c --rbd_concurrent_management_ops=1 --id xen_test
>> NAME                                          SIZE   PARENT                                                                                             FMT PROT LOCK
>> RBD-0192938e-cb4b-4ee1-9988-b8145704ac81      20480M RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c/RBD-8b2cfe76-44b7-4393-b376-f675366831c3@BASE  2
>> RBD-0192938e-cb4b-4ee1-9988-b8145704ac81@BASE 20480M RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c/RBD-8b2cfe76-44b7-4393-b376-f675366831c3@BASE  2 yes
>> ...
>> RBD-feb32ab0-a5ee-44e6-9089-486e91ee8af3      20480M RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c/RBD-bbbc2ce0-4ad3-44ae-a52f-e57df0441e27@BASE  2
>> ....
>> __srlock__                                        0                                                                                                     2
>>
>> real    0m13.937s
>> user    0m14.404s
>> sys     0m1.089s
>>
>> Regards
>> Marc
>>
>>
>> Am 25.04.2018 um 16:38 schrieb Jason Dillaman:
>>> I'd check the latency between your client and your cluster. On my
>>> development machine w/ only a single OSD running and 200 clones, each
>>> with 1 snapshot, "rbd ls -l" only takes a couple of seconds for me:
>>>
>>> $ time rbd ls -l --rbd_concurrent_management_ops=1 | wc -l
>>> 403
>>>
>>> real    0m1.746s
>>> user    0m1.136s
>>> sys     0m0.169s
>>>
>>> Also, I have to ask, but how often are you expecting to scrape the
>>> images from the pool?
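(Regarding the CPU flame graph requested above: the perf.data written by
the record step at the top of this mail can probably be rendered with
Brendan Gregg's FlameGraph scripts - only a sketch, the ~/FlameGraph
checkout location and the output file names are arbitrary choices:)

# fetch the external FlameGraph helper scripts (checkout path is just an example)
git clone https://github.com/brendangregg/FlameGraph ~/FlameGraph

# resolve the recorded stacks, fold them, and render an interactive SVG
perf script -i perf.data | ~/FlameGraph/stackcollapse-perf.pl > rbd-ls.folded
~/FlameGraph/flamegraph.pl rbd-ls.folded > rbd-ls.svg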
>>> The long directory list involves opening each image
>>> in the pool (which involves numerous round-trips to the OSDs) plus
>>> iterating through each snapshot (which also involves round-trips).
>>>
>>> On Wed, Apr 25, 2018 at 10:13 AM, Marc Schöchlin <ms@xxxxxxxxxx> wrote:
>>>> Hello Piotr,
>>>>
>>>> I updated the issue
>>>> (https://tracker.ceph.com/issues/23853?next_issue_id=23852&prev_issue_id=23854).
>>>>
>>>> # time rbd ls -l --pool RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c --rbd_concurrent_management_ops=1
>>>> NAME                                       SIZE   PARENT
>>>> RBD-feb32ab0-a5ee-44e6-9089-486e91ee8af3   20480M RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c/RBD-bbbc2ce0-4ad3-44ae-a52f-e57df0441e27@BASE  2
>>>> __srlock__                                     0                                                                                                     2
>>>> ....
>>>> real    0m18.562s
>>>> user    0m12.513s
>>>> sys     0m0.793s
>>>>
>>>> I also attached a JSON dump of my pool structure.
>>>>
>>>> Regards
>>>> Marc
>>>>
>>>> Am 25.04.2018 um 14:46 schrieb Piotr Dałek:
>>>>> On 18-04-25 02:29 PM, Marc Schöchlin wrote:
>>>>>> Hello list,
>>>>>>
>>>>>> we are trying to integrate a storage repository in XenServer.
>>>>>> (I also described the problem as an issue in the Ceph bug tracker:
>>>>>> https://tracker.ceph.com/issues/23853)
>>>>>>
>>>>>> Summary:
>>>>>>
>>>>>> The slowness is a real pain for us, because it prevents the Xen
>>>>>> storage repository from working efficiently.
>>>>>> Gathering information for Xen pools with hundreds of virtual machines
>>>>>> (using "--format json") would be a real pain...
>>>>>> The high user-time consumption and the really huge number of threads
>>>>>> suggest that there is something really inefficient in the "rbd"
>>>>>> utility.
>>>>>>
>>>>>> So what can I do to make "rbd ls -l" faster, or to get comparable
>>>>>> information about the snapshot hierarchy?
>>>>> Can you run this command with the extra argument
>>>>> "--rbd_concurrent_management_ops=1" and share the timing of that?
>>>>>
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users@xxxxxxxxxxxxxx
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
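P.S.: until "rbd ls -l" itself gets faster, comparable parent and snapshot
information can also be collected with separate rbd calls - just a sketch
(the per-image opens and round-trips Jason describes above still happen,
they are only spread over individual calls that can be parallelized or
cached); the pool name and client id are the ones from the commands above,
the file names are arbitrary:

POOL=RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c

# cheap part: image names only, no per-image opens
rbd ls -p "$POOL" --id xen_test > images.txt

# expensive part: parent and snapshot details per image, machine-readable
# (output is a stream of JSON documents, one per call)
while read -r IMG; do
    rbd info    -p "$POOL" --id xen_test --format json "$IMG"
    rbd snap ls -p "$POOL" --id xen_test --format json "$IMG"
done < images.txt > image-details.json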