There appears to be a hole in the documentation. You now have to set a
configuration option to enable tracing:

rbd_tracing = true

This causes librbd.so to dynamically load the tracing module librbd_tp.so
(which has linkage to LTTng-UST). (A rough sketch of the config and
trace-capture steps is appended at the end of this message.)

On Fri, Jul 15, 2016 at 1:47 PM, Vaibhav Bhembre <vaibhav@xxxxxxxxxxxxxxxx> wrote:
> I followed the steps mentioned in [1] but somehow I am unable to see any
> traces to continue with its step 2. There are no errors seen when performing
> the operations mentioned in step 1. In my setup I am running the lttng
> commands on the HV where my VM has the RBD device attached.
>
> My lttng version is as follows:
>
> $ lttng --version
> lttng (LTTng Trace Control) 2.4.0 - Époque Opaque
> $ lttng-sessiond --version
> 2.4.0
>
> My uname -r looks as follows:
> Linux infra1node71 3.13.0-65-generic #106-Ubuntu SMP Fri Oct 2 22:08:27 UTC
> 2015 x86_64 x86_64 x86_64 GNU/Linux
>
> The kern.log is clear of any apparmor denials as well.
>
> Would I need to have my librbd linked with lttng-ust by any chance? I don't
> see it linked, as seen below:
>
> $ ldd /usr/lib/librbd.so.1.0.0 | grep lttng
> $
>
> Any idea what I might be missing here to get lttng running successfully?
>
> [1] http://docs.ceph.com/docs/master/rbd/rbd-replay/
>
>
> On 07/14, Jason Dillaman wrote:
>>
>> I would probably be able to resolve the issue fairly quickly if it
>> would be possible for you to provide an RBD replay trace from a slow
>> and a fast mkfs.xfs test run and attach it to the tracker ticket I just
>> opened for this issue [1]. You can follow the instructions here [2]
>> but would only need to perform steps 1 and 2 (attaching the output from
>> step 2 to the ticket).
>>
>> Thanks,
>>
>> [1] http://tracker.ceph.com/issues/16689
>> [2] http://docs.ceph.com/docs/master/rbd/rbd-replay/
>>
>> On Thu, Jul 14, 2016 at 2:55 PM, Vaibhav Bhembre
>> <vaibhav@xxxxxxxxxxxxxxxx> wrote:
>>>
>>> We have been observing this same behavior. Usually it is the case where
>>> we create a new rbd image, expose it to the guest, and perform an
>>> operation that issues discards to the device.
>>>
>>> A typical command that's first run on a given device is mkfs, usually
>>> with discard on.
>>>
>>> # time mkfs.xfs -s size=4096 -f /dev/sda
>>> meta-data=/dev/sda               isize=256    agcount=4, agsize=6553600 blks
>>>          =                       sectsz=4096  attr=2, projid32bit=0
>>> data     =                       bsize=4096   blocks=26214400, imaxpct=25
>>>          =                       sunit=0      swidth=0 blks
>>> naming   =version 2              bsize=4096   ascii-ci=0
>>> log      =internal log           bsize=4096   blocks=12800, version=2
>>>          =                       sectsz=4096  sunit=1 blks, lazy-count=1
>>> realtime =none                   extsz=4096   blocks=0, rtextents=0
>>>
>>> real    9m10.882s
>>> user    0m0.000s
>>> sys     0m0.012s
>>>
>>> When we issue this same command with the object-map feature disabled on
>>> the image it completes much faster.
>>>
>>> # time mkfs.xfs -s size=4096 -f /dev/sda
>>> meta-data=/dev/sda               isize=256    agcount=4, agsize=6553600 blks
>>>          =                       sectsz=4096  attr=2, projid32bit=0
>>> data     =                       bsize=4096   blocks=26214400, imaxpct=25
>>>          =                       sunit=0      swidth=0 blks
>>> naming   =version 2              bsize=4096   ascii-ci=0
>>> log      =internal log           bsize=4096   blocks=12800, version=2
>>>          =                       sectsz=4096  sunit=1 blks, lazy-count=1
>>> realtime =none                   extsz=4096   blocks=0, rtextents=0
>>>
>>> real    0m1.780s
>>> user    0m0.000s
>>> sys     0m0.012s
>>>
>>> Also, from what I am seeing, the slowness seems to be proportional to the
>>> size of the image rather than the amount of data written to it. Issuing
>>> mkfs without discard doesn't reproduce this issue.
>>> The above values were for a 100G rbd image. The 250G image takes slightly
>>> more than twice the time taken for the 100G one.
>>>
>>> # time mkfs.xfs -s size=4096 -f /dev/sda
>>> meta-data=/dev/sda               isize=256    agcount=4, agsize=16384000 blks
>>>          =                       sectsz=4096  attr=2, projid32bit=0
>>> data     =                       bsize=4096   blocks=65536000, imaxpct=25
>>>          =                       sunit=0      swidth=0 blks
>>> naming   =version 2              bsize=4096   ascii-ci=0
>>> log      =internal log           bsize=4096   blocks=32000, version=2
>>>          =                       sectsz=4096  sunit=1 blks, lazy-count=1
>>> realtime =none                   extsz=4096   blocks=0, rtextents=0
>>>
>>> real    22m58.076s
>>> user    0m0.000s
>>> sys     0m0.024s
>>>
>>> Let me know if you need any more information regarding this. We would
>>> like to enable object-map (and fast-diff) on our images once this gets
>>> resolved.
>>>
>>>
>>> On Wed, Jun 22, 2016 at 5:39 PM, Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
>>>>
>>>>
>>>> I'm not sure why I never received the original list email, so I
>>>> apologize for the delay. Is /dev/sda1, from your example, fresh with
>>>> no data to actually discard or does it actually have lots of data to
>>>> discard?
>>>>
>>>> Thanks,
>>>>
>>>> On Wed, Jun 22, 2016 at 1:56 PM, Brian Andrus <bandrus@xxxxxxxxxx> wrote:
>>>> > I've created a downstream bug for this same issue.
>>>> >
>>>> > https://bugzilla.redhat.com/show_bug.cgi?id=1349116
>>>> >
>>>> > On Wed, Jun 15, 2016 at 6:23 AM, <list@xxxxxxxxxxxxxxx> wrote:
>>>> >>
>>>> >> Hello guys,
>>>> >>
>>>> >> We are currently testing Ceph Jewel with the object-map feature enabled:
>>>> >>
>>>> >> rbd image 'disk-22920':
>>>> >>         size 102400 MB in 25600 objects
>>>> >>         order 22 (4096 kB objects)
>>>> >>         block_name_prefix: rbd_data.7cfa2238e1f29
>>>> >>         format: 2
>>>> >>         features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
>>>> >>         flags:
>>>> >>
>>>> >> We use this RBD as the disk for a KVM virtual machine with virtio-scsi
>>>> >> and discard=unmap. We noticed the following parameters in /sys/block:
>>>> >>
>>>> >> # cat /sys/block/sda/queue/discard_*
>>>> >> 4096
>>>> >> 1073741824
>>>> >> 0 <- discard_zeroes_data
>>>> >>
>>>> >> While trying to do a mkfs.ext4 on the disk in the VM we noticed low
>>>> >> performance when using discard.
>>>> >>
>>>> >> mkfs.ext4 -E nodiscard /dev/sda1 - took 5 seconds to complete
>>>> >> mkfs.ext4 -E discard /dev/sda1 - took around 3 minutes
>>>> >>
>>>> >> When disabling the object-map the mkfs with discard took just 5 seconds.
>>>> >>
>>>> >> Do you have any idea what might cause this issue?
>>>> >>
>>>> >> Kernel: 4.2.0-35-generic #40~14.04.1-Ubuntu
>>>> >> Ceph: 10.2.0
>>>> >> Libvirt: 1.3.1
>>>> >> QEMU: 2.5.0
>>>> >>
>>>> >> Thanks!
>>>> >>
>>>> >> Best regards,
>>>> >> Jonas
>>>> >> _______________________________________________
>>>> >> ceph-users mailing list
>>>> >> ceph-users@xxxxxxxxxxxxxx
>>>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > Brian Andrus
>>>> > Red Hat, Inc.
>>>> > Storage Consultant, Global Storage Practice
>>>> > Mobile +1 (530) 903-8487
>>>> >
>>>> >
>>>> > _______________________________________________
>>>> > ceph-users mailing list
>>>> > ceph-users@xxxxxxxxxxxxxx
>>>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>> >
>>>>
>>>>
>>>>
>>>> --
>>>> Jason
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users@xxxxxxxxxxxxxx
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>>>
>>
>>
>> --
>> Jason
>
>
> --
> Vaibhav Bhembre

--
Jason
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
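
For reference, a minimal sketch of the tracing setup described at the top of
this message, run on the hypervisor hosting the VM. The ceph.conf section,
library paths, and trace directory below are assumptions for illustration;
the lttng commands follow the rbd-replay documentation linked in the thread.

# /etc/ceph/ceph.conf on the librbd client (hypervisor); assumed [client] section
[client]
    rbd_tracing = true

# Sanity-check that the tracing module is present and linked against
# LTTng-UST (path and soname are distro-dependent assumptions)
$ ls /usr/lib/librbd_tp.so*
$ ldd /usr/lib/librbd_tp.so.1.0.0 | grep lttng

# Capture a trace while reproducing the slow mkfs; the librbd client
# (e.g. the qemu process) must be restarted after enabling rbd_tracing
$ mkdir -p traces
$ lttng create -o traces librbd
$ lttng enable-event -u 'librbd:*'
$ lttng add-context -u -t pthread_id
$ lttng start
# ... run the mkfs.xfs / mkfs.ext4 test inside the guest ...
$ lttng stop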
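
Also for reference, since the thread compares mkfs runs with object-map on
and off: a hedged sketch of toggling the feature per image to repeat that
comparison. The pool name "rbd" is an assumption; "disk-22920" is the image
shown earlier in the thread. fast-diff depends on object-map, so it is
disabled first and re-enabled last, and the object map is rebuilt after
re-enabling.

# Disable for the "fast" comparison run (assumed pool "rbd")
$ rbd feature disable rbd/disk-22920 fast-diff
$ rbd feature disable rbd/disk-22920 object-map

# ... re-run the mkfs test with discard inside the guest ...

# Re-enable afterwards and rebuild the object map so it reflects on-disk state
$ rbd feature enable rbd/disk-22920 object-map
$ rbd feature enable rbd/disk-22920 fast-diff
$ rbd object-map rebuild rbd/disk-22920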