Hi Jason,

I uploaded a perf report to the issue (https://tracker.ceph.com/issues/23853):

apt-get install linux-tools-4.13.0-39-generic linux-cloud-tools-4.13.0-39-generic linux-tools-generic linux-cloud-tools-generic
perf record -F 99 -g rbd ls -l -p RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c --id xen_test
perf report -n --stdio > perf.data.txt

I haven't used perf very much in recent years - let me know if you need
additional traces (a possible way to render a flame graph from this
recording is sketched further down).

Regards
Marc

Am 25.04.2018 um 17:34 schrieb Jason Dillaman:
> Since I cannot reproduce your issue, can you generate a perf CPU flame
> graph on this to figure out where the user time is being spent?
>
> On Wed, Apr 25, 2018 at 11:25 AM, Marc Schöchlin <ms@xxxxxxxxxx> wrote:
>> Hello Jason,
>>
>> according to this, latency between client and OSD should not be the
>> problem (given the high amount of user time in the measurements above,
>> network communication should not be the bottleneck).
>>
>> Finding the involved OSD:
>>
>> # ceph osd map RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c rbd_directory
>> osdmap e7570 pool 'RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c' (14) object 'rbd_directory' -> pg 14.30a98c1c (14.1c) -> up ([36,0,38], p36) acting ([36,0,38], p36)
>>
>> # ceph osd find osd.36
>> {
>>     "osd": 36,
>>     "ip": "10.23.27.149:6826/7195",
>>     "crush_location": {
>>         "host": "ceph-ssd-s39",
>>         "root": "default"
>>     }
>> }
>>
>> ssh ceph-ssd-s39
>>
>> # nuttcp -w1m ceph-mon-s43
>> 11186.3391 MB / 10.00 sec = 9381.8890 Mbps 12 %TX 32 %RX 0 retrans 0.15 msRTT
>>
>> # time rbd ls -l -p RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c --rbd_concurrent_management_ops=1 --id xen_test
>> NAME                                          SIZE   PARENT                                                                                             FMT PROT LOCK
>> RBD-0192938e-cb4b-4ee1-9988-b8145704ac81      20480M RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c/RBD-8b2cfe76-44b7-4393-b376-f675366831c3@BASE  2
>> RBD-0192938e-cb4b-4ee1-9988-b8145704ac81@BASE 20480M RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c/RBD-8b2cfe76-44b7-4393-b376-f675366831c3@BASE  2 yes
>> ...
>> RBD-feb32ab0-a5ee-44e6-9089-486e91ee8af3      20480M RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c/RBD-bbbc2ce0-4ad3-44ae-a52f-e57df0441e27@BASE  2
>> __srlock__                                        0                                                                                                     2
>>
>> real    0m23.667s
>> user    0m15.949s
>> sys     0m1.276s
>>
>> # time rbd ls -l -p RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c --rbd_concurrent_management_ops=1 --id xen_test
>> NAME                                          SIZE   PARENT                                                                                             FMT PROT LOCK
>> RBD-0192938e-cb4b-4ee1-9988-b8145704ac81      20480M RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c/RBD-8b2cfe76-44b7-4393-b376-f675366831c3@BASE  2
>> RBD-0192938e-cb4b-4ee1-9988-b8145704ac81@BASE 20480M RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c/RBD-8b2cfe76-44b7-4393-b376-f675366831c3@BASE  2 yes
>> ...
>> RBD-feb32ab0-a5ee-44e6-9089-486e91ee8af3      20480M RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c/RBD-bbbc2ce0-4ad3-44ae-a52f-e57df0441e27@BASE  2
>> ....
>> __srlock__                                        0                                                                                                     2
>>
>> real    0m13.937s
>> user    0m14.404s
>> sys     0m1.089s
>>
>> Regards
>> Marc
>>
>>
>> Am 25.04.2018 um 16:38 schrieb Jason Dillaman:
>>> I'd check the latency between your client and your cluster. On my
>>> development machine w/ only a single OSD running and 200 clones, each
>>> with 1 snapshot, "rbd ls -l" only takes a couple of seconds for me:
>>>
>>> $ time rbd ls -l --rbd_concurrent_management_ops=1 | wc -l
>>> 403
>>>
>>> real    0m1.746s
>>> user    0m1.136s
>>> sys     0m0.169s
>>>
>>> Also, I have to ask, but how often are you expecting to scrape the
>>> images from the pool?
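(Regarding the CPU flame graph requested above: the perf.data written by
the record step at the top of this mail can probably be rendered with
Brendan Gregg's FlameGraph scripts - only a sketch, the ~/FlameGraph
checkout location and the output file names are arbitrary choices:)

# fetch the external FlameGraph helper scripts (checkout path is just an example)
git clone https://github.com/brendangregg/FlameGraph ~/FlameGraph

# resolve the recorded stacks, fold them, and render an interactive SVG
perf script -i perf.data | ~/FlameGraph/stackcollapse-perf.pl > rbd-ls.folded
~/FlameGraph/flamegraph.pl rbd-ls.folded > rbd-ls.svg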
>>> The long directory list involves opening each image
>>> in the pool (which involves numerous round-trips to the OSDs) plus
>>> iterating through each snapshot (which also involves round-trips).
>>>
>>> On Wed, Apr 25, 2018 at 10:13 AM, Marc Schöchlin <ms@xxxxxxxxxx> wrote:
>>>> Hello Piotr,
>>>>
>>>> I updated the issue
>>>> (https://tracker.ceph.com/issues/23853?next_issue_id=23852&prev_issue_id=23854).
>>>>
>>>> # time rbd ls -l --pool RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c --rbd_concurrent_management_ops=1
>>>> NAME                                       SIZE   PARENT
>>>> RBD-feb32ab0-a5ee-44e6-9089-486e91ee8af3   20480M RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c/RBD-bbbc2ce0-4ad3-44ae-a52f-e57df0441e27@BASE  2
>>>> __srlock__                                     0                                                                                                     2
>>>> ....
>>>> real    0m18.562s
>>>> user    0m12.513s
>>>> sys     0m0.793s
>>>>
>>>> I also attached a JSON dump of my pool structure.
>>>>
>>>> Regards
>>>> Marc
>>>>
>>>> Am 25.04.2018 um 14:46 schrieb Piotr Dałek:
>>>>> On 18-04-25 02:29 PM, Marc Schöchlin wrote:
>>>>>> Hello list,
>>>>>>
>>>>>> we are trying to integrate a storage repository in XenServer.
>>>>>> (I also described the problem as an issue in the Ceph bug tracker:
>>>>>> https://tracker.ceph.com/issues/23853)
>>>>>>
>>>>>> Summary:
>>>>>>
>>>>>> The slowness is a real pain for us, because it prevents the Xen
>>>>>> storage repository from working efficiently.
>>>>>> Gathering information for Xen pools with hundreds of virtual machines
>>>>>> (using "--format json") would be a real pain...
>>>>>> The high user-time consumption and the really huge number of threads
>>>>>> suggest that there is something really inefficient in the "rbd"
>>>>>> utility.
>>>>>>
>>>>>> So what can I do to make "rbd ls -l" faster, or to get comparable
>>>>>> information about the snapshot hierarchy?
>>>>> Can you run this command with the extra argument
>>>>> "--rbd_concurrent_management_ops=1" and share the timing of that?
>>>>>
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users@xxxxxxxxxxxxxx
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
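P.S.: until "rbd ls -l" itself gets faster, comparable parent and snapshot
information can also be collected with separate rbd calls - just a sketch
(the per-image opens and round-trips Jason describes above still happen,
they are only spread over individual calls that can be parallelized or
cached); the pool name and client id are the ones from the commands above,
the file names are arbitrary:

POOL=RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c

# cheap part: image names only, no per-image opens
rbd ls -p "$POOL" --id xen_test > images.txt

# expensive part: parent and snapshot details per image, machine-readable
# (output is a stream of JSON documents, one per call)
while read -r IMG; do
    rbd info    -p "$POOL" --id xen_test --format json "$IMG"
    rbd snap ls -p "$POOL" --id xen_test --format json "$IMG"
done < images.txt > image-details.json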