Re: Ceph RBD object-map and discard in VM

Vaibhav Bhembre <vaibhav@xxxxxxxxxxxxxxxx> · Fri, 15 Jul 2016 13:47:11 -0400

I followed the steps in [1], but I am unable to see any traces, so I 
cannot continue with its step 2. No errors are reported when performing 
the operations in step 1. In my setup I am running the lttng commands 
on the hypervisor (HV) where my VM has the RBD device attached.

My lttng version is as follows:

$ lttng --version
lttng (LTTng Trace Control) 2.4.0 - Époque Opaque
$ lttng-sessiond --version
2.4.0

My uname -a output is as follows:
Linux infra1node71 3.13.0-65-generic #106-Ubuntu SMP Fri Oct 2 22:08:27 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

The kern.log is clear of any apparmor denials as well.

Would I need to have my librbd linked with lttng-ust by any chance? It 
does not appear to be linked, as shown below:

$ ldd /usr/lib/librbd.so.1.0.0 | grep lttng
$

Any idea what I might be missing here to get lttng running successfully?

[1] http://docs.ceph.com/docs/master/rbd/rbd-replay/

On 07/14, Jason Dillaman wrote:
I would probably be able to resolve the issue fairly quickly if you
could provide an RBD replay trace from a slow and a fast mkfs.xfs test
run and attach it to the tracker ticket I just opened for this issue
[1]. You can follow the instructions here [2], but you would only need
to perform steps 1 and 2 (attaching the output from step 2 to the
ticket).

Thanks,

[1] http://tracker.ceph.com/issues/16689
[2] http://docs.ceph.com/docs/master/rbd/rbd-replay/

On Thu, Jul 14, 2016 at 2:55 PM, Vaibhav Bhembre <vaibhav@xxxxxxxxxxxxxxxx> wrote:
We have been observing similar behavior. It usually shows up when we
create a new rbd image, expose it to the guest, and perform any
operation that issues discards to the device.

A typical command that's first run on a given device is mkfs, usually with
discard on.

# time mkfs.xfs -s size=4096 -f /dev/sda
meta-data=/dev/sda               isize=256    agcount=4, agsize=6553600 blks
         =                       sectsz=4096  attr=2, projid32bit=0
data     =                       bsize=4096   blocks=26214400, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=12800, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

real 9m10.882s
user 0m0.000s
sys 0m0.012s

When we issue the same command with the object-map feature disabled on the
image, it completes much faster.

# time mkfs.xfs -s size=4096 -f /dev/sda
meta-data=/dev/sda               isize=256    agcount=4, agsize=6553600 blks
         =                       sectsz=4096  attr=2, projid32bit=0
data     =                       bsize=4096   blocks=26214400, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=12800, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

real 0m1.780s
user 0m0.000s
sys 0m0.012s

Also, from what I am seeing, the slowness seems to be proportional to the
size of the image rather than the amount of data written to it. Issuing mkfs
without discard doesn't reproduce this issue. The above values were for a
100G rbd image. A 250G image takes slightly more than twice the time taken
for the 100G one.

# time mkfs.xfs -s size=4096 -f /dev/sda
meta-data=/dev/sda               isize=256    agcount=4, agsize=16384000 blks
         =                       sectsz=4096  attr=2, projid32bit=0
data     =                       bsize=4096   blocks=65536000, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=32000, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

real 22m58.076s
user 0m0.000s
sys 0m0.024s
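
As a quick check of the proportionality claim, the two `real` timings above can be compared against the image sizes. This is only a minimal sketch in Python: the helper name `to_seconds` is hypothetical, and the numbers are copied from the 100G and 250G runs shown above (both with object-map enabled).

```python
def to_seconds(minutes, seconds):
    """Convert a 'XmY.ZZZs' timing from `time` into seconds."""
    return minutes * 60 + seconds

# 'real' values reported by `time mkfs.xfs` in the runs above
t_100g = to_seconds(9, 10.882)    # 100G image, object-map enabled
t_250g = to_seconds(22, 58.076)   # 250G image, object-map enabled

time_ratio = t_250g / t_100g      # how much longer the larger image took
size_ratio = 250 / 100            # how much larger the image is

print(f"time ratio: {time_ratio:.2f}, size ratio: {size_ratio:.2f}")
```

The elapsed time grows by roughly the same factor as the image size, which is consistent with the cost scaling with image size rather than with the amount of data written.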

Let me know if you need any more information regarding this. We would like
to enable object-map (and fast-diff) on our images once this gets resolved.

On Wed, Jun 22, 2016 at 5:39 PM, Jason Dillaman <jdillama@xxxxxxxxxx> wrote:

I'm not sure why I never received the original list email, so I
apologize for the delay. Is /dev/sda1, from your example, fresh with
no data to actually discard or does it actually have lots of data to
discard?

Thanks,

On Wed, Jun 22, 2016 at 1:56 PM, Brian Andrus <bandrus@xxxxxxxxxx> wrote:
> I've created a downstream bug for this same issue.
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1349116
>
> On Wed, Jun 15, 2016 at 6:23 AM, <list@xxxxxxxxxxxxxxx> wrote:
>>
>> Hello guys,
>>
>> We are currently testing Ceph Jewel with object-map feature enabled:
>>
>> rbd image 'disk-22920':
>>         size 102400 MB in 25600 objects
>>         order 22 (4096 kB objects)
>>         block_name_prefix: rbd_data.7cfa2238e1f29
>>         format: 2
>>         features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
>>         flags:
>>
>> We use this RBD as disk for a kvm virtual machine with virtio-scsi and
>> discard=unmap. We noticed the following parameters in /sys/block:
>>
>> # cat /sys/block/sda/queue/discard_*
>> 4096
>> 1073741824
>> 0 <- discard_zeroes_data
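
For orientation, the figures quoted above can be related to each other. This is a minimal sketch, not a description of how librbd or the guest kernel actually splits requests. It assumes the three sysfs values are, in glob order, discard_granularity, discard_max_bytes, and discard_zeroes_data (only the last one is labeled in the original), and the variable names are illustrative.

```python
MIB = 1024 * 1024

image_size = 102400 * MIB        # "size 102400 MB" from the rbd info output
object_size = 1 << 22            # "order 22" means 2**22-byte (4 MiB) objects
discard_granularity = 4096       # assumed: first sysfs value
discard_max_bytes = 1073741824   # assumed: second sysfs value (1 GiB)

# Number of backing objects; should agree with "in 25600 objects"
object_count = image_size // object_size

# Lower bound on discard requests needed to cover the whole device
min_discards = -(-image_size // discard_max_bytes)   # ceiling division

# Upper bound on objects touched by a single maximum-size discard
objects_per_discard = discard_max_bytes // object_size

print(object_count, min_discards, objects_per_discard)
```

Under these assumptions, discarding the whole device takes at least 100 requests, each spanning up to 256 objects. Whether that fan-out accounts for the slowdown is not established in this thread.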
>>
>> While trying to run mkfs.ext4 on the disk in the VM, we noticed low
>> performance when using discard.
>>
>> mkfs.ext4 -E nodiscard /dev/sda1 - took 5 seconds to complete
>> mkfs.ext4 -E discard /dev/sda1 - took around 3 minutes
>>
>> With the object-map disabled, mkfs with discard took just 5
>> seconds.
>>
>> Do you have any idea what might cause this issue?
>>
>> Kernel: 4.2.0-35-generic #40~14.04.1-Ubuntu
>> Ceph: 10.2.0
>> Libvirt: 1.3.1
>> QEMU: 2.5.0
>>
>> Thanks!
>>
>> Best regards,
>> Jonas
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> --
> Brian Andrus
> Red Hat, Inc.
> Storage Consultant, Global Storage Practice
> Mobile +1 (530) 903-8487

--
Jason

--
Jason

--
Vaibhav Bhembre