Re: Ceph RBD object-map and discard in VM

Jason Dillaman <jdillama@xxxxxxxxxx> · Thu, 14 Jul 2016 17:40:31 -0400

I would probably be able to resolve the issue fairly quickly if you
could provide an RBD replay trace from a slow and a fast mkfs.xfs test
run and attach both to the tracker ticket I just opened for this
issue [1]. You can follow the instructions here [2], but you would only
need to perform steps 1 and 2 (attaching the output from step 2 to the
ticket).

Thanks,

[1] http://tracker.ceph.com/issues/16689
[2] http://docs.ceph.com/docs/master/rbd/rbd-replay/

On Thu, Jul 14, 2016 at 2:55 PM, Vaibhav Bhembre
<vaibhav@xxxxxxxxxxxxxxxx> wrote:
> We have been observing similar behavior. It usually happens when we
> create a new rbd image, expose it to the guest, and perform any
> operation that issues discards to the device.
>
> A typical command that's first run on a given device is mkfs, usually with
> discard on.
>
> # time mkfs.xfs -s size=4096 -f /dev/sda
> meta-data=/dev/sda               isize=256    agcount=4, agsize=6553600 blks
>          =                       sectsz=4096  attr=2, projid32bit=0
> data     =                       bsize=4096   blocks=26214400, imaxpct=25
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0
> log      =internal log           bsize=4096   blocks=12800, version=2
>          =                       sectsz=4096  sunit=1 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
>
> real 9m10.882s
> user 0m0.000s
> sys 0m0.012s
>
> When we issue this same command with the object-map feature disabled on the
> image, it completes much faster.
>
> # time mkfs.xfs -s size=4096 -f /dev/sda
> meta-data=/dev/sda               isize=256    agcount=4, agsize=6553600 blks
>          =                       sectsz=4096  attr=2, projid32bit=0
> data     =                       bsize=4096   blocks=26214400, imaxpct=25
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0
> log      =internal log           bsize=4096   blocks=12800, version=2
>          =                       sectsz=4096  sunit=1 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
>
> real 0m1.780s
> user 0m0.000s
> sys 0m0.012s
>
> Also, from what I am seeing, the slowness seems to be proportional to the size
> of the image rather than the amount of data written into it. Issuing mkfs
> without discard doesn't reproduce this issue. The above values were for a 100G
> rbd image. A 250G image takes about 2.5 times as long as the 100G
> one.
>
> # time mkfs.xfs -s size=4096 -f /dev/sda
> meta-data=/dev/sda               isize=256    agcount=4, agsize=16384000 blks
>          =                       sectsz=4096  attr=2, projid32bit=0
> data     =                       bsize=4096   blocks=65536000, imaxpct=25
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0
> log      =internal log           bsize=4096   blocks=32000, version=2
>          =                       sectsz=4096  sunit=1 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
>
> real 22m58.076s
> user 0m0.000s
> sys 0m0.024s
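
The three timings above can be checked with a quick calculation. This is only a sketch: the helper name to_seconds and the variable names are mine, not anything from the thread. It converts the "real" values to seconds and compares the 250G/100G time ratio with the ratio of the block counts that mkfs.xfs printed.

```python
def to_seconds(real: str) -> float:
    """Convert a `time` value such as '9m10.882s' to seconds."""
    minutes, seconds = real.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


# "real" times copied from the three mkfs.xfs runs quoted above.
t_100g_objmap = to_seconds("9m10.882s")    # 100G image, object-map enabled
t_100g_plain = to_seconds("0m1.780s")      # 100G image, object-map disabled
t_250g_objmap = to_seconds("22m58.076s")   # 250G image, object-map enabled

# 4 KiB block counts reported by mkfs.xfs for the 250G and 100G images.
size_ratio = 65536000 / 26214400
time_ratio = t_250g_objmap / t_100g_objmap
slowdown = t_100g_objmap / t_100g_plain

print(f"size ratio 250G/100G:      {size_ratio:.3f}")
print(f"time ratio 250G/100G:      {time_ratio:.3f}")
print(f"object-map slowdown, 100G: {slowdown:.0f}x")
```

The time ratio comes out very close to the size ratio, which fits the observation that the cost scales with image size rather than with the amount of data written.
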
>
> Let me know if you need any more information regarding this. We would like
> to enable object-map (and fast-diff) on our images once this gets resolved.
>
>
> On Wed, Jun 22, 2016 at 5:39 PM, Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
>>
>> I'm not sure why I never received the original list email, so I
>> apologize for the delay. Is /dev/sda1, from your example, fresh with
>> no data to actually discard or does it actually have lots of data to
>> discard?
>>
>> Thanks,
>>
>> On Wed, Jun 22, 2016 at 1:56 PM, Brian Andrus <bandrus@xxxxxxxxxx> wrote:
>> > I've created a downstream bug for this same issue.
>> >
>> > https://bugzilla.redhat.com/show_bug.cgi?id=1349116
>> >
>> > On Wed, Jun 15, 2016 at 6:23 AM, <list@xxxxxxxxxxxxxxx> wrote:
>> >>
>> >> Hello guys,
>> >>
>> >> We are currently testing Ceph Jewel with object-map feature enabled:
>> >>
>> >> rbd image 'disk-22920':
>> >>         size 102400 MB in 25600 objects
>> >>         order 22 (4096 kB objects)
>> >>         block_name_prefix: rbd_data.7cfa2238e1f29
>> >>         format: 2
>> >>         features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
>> >>         flags:
>> >>
>> >> We use this RBD as a disk for a kvm virtual machine with virtio-scsi and
>> >> discard=unmap. We noticed the following parameters in /sys/block:
>> >>
>> >> # cat /sys/block/sda/queue/discard_*
>> >> 4096
>> >> 1073741824
>> >> 0 <- discard_zeroes_data
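
For scale, the figures above can be turned into object counts. This is a sketch only: the variable names are hypothetical, and it assumes the three values printed by cat are discard_granularity, discard_max_bytes and discard_zeroes_data, in the order the shell expands the glob.

```python
MIB = 1024 * 1024

image_size = 102400 * MIB          # "size 102400 MB" from the rbd image info
object_size = 1 << 22              # "order 22", i.e. 4096 kB objects
discard_max_bytes = 1073741824     # second value from /sys/block/sda/queue

# Number of RADOS objects that back the image.
objects = image_size // object_size

# Objects spanned by a single maximum-sized, object-aligned discard request.
objects_per_max_discard = discard_max_bytes // object_size

# Fewest discard requests that can cover the whole device (ceiling division).
min_discards = -(-image_size // discard_max_bytes)

print(objects, objects_per_max_discard, min_discards)
```

The object count agrees with the "25600 objects" in the image info. The arithmetic only shows how much address space a full-device discard spans on an empty image; it does not by itself explain the slowdown.
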
>> >>
>> >> While running mkfs.ext4 on the disk in the VM, we noticed low
>> >> performance when using discard.
>> >>
>> >> mkfs.ext4 -E nodiscard /dev/sda1 - took 5 seconds to complete
>> >> mkfs.ext4 -E discard /dev/sda1 - took around 3 minutes
>> >>
>> >> With object-map disabled, mkfs with discard took just 5
>> >> seconds.
>> >>
>> >> Do you have any idea what might cause this issue?
>> >>
>> >> Kernel: 4.2.0-35-generic #40~14.04.1-Ubuntu
>> >> Ceph: 10.2.0
>> >> Libvirt: 1.3.1
>> >> QEMU: 2.5.0
>> >>
>> >> Thanks!
>> >>
>> >> Best regards,
>> >> Jonas
>> >> _______________________________________________
>> >> ceph-users mailing list
>> >> ceph-users@xxxxxxxxxxxxxx
>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>> >
>> >
>> >
>> > --
>> > Brian Andrus
>> > Red Hat, Inc.
>> > Storage Consultant, Global Storage Practice
>> > Mobile +1 (530) 903-8487
>> >
>> >
>> >
>>
>>
>>
>> --
>> Jason
>
>

-- 
Jason