Re: Ceph RBD object-map and discard in VM

Any chance you can zip up the raw LTTng-UST files and attach them to
the ticket? It appears that the rbd-replay-prep tool doesn't translate
discard events.

The change sounds good to me -- but it would also need to be made in
librados and ceph-osd since I'm sure they would have the same issue.
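
For anyone reproducing this, the symlink workaround described in the quoted message below can be sketched as a small shell helper. This is only a sketch under assumptions: the function names are made up for illustration, and the library directory is whichever one holds librbd_tp.so.1 on your system (the thread does not say which).

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative and not part of Ceph.

# link_tracepoint_lib DIR
# Create DIR/librbd_tp.so as a symlink to librbd_tp.so.1, so that a
# dlopen of the unversioned name can find the tracepoint library.
link_tracepoint_lib() {
    libdir=$1
    if [ ! -e "$libdir/librbd_tp.so.1" ]; then
        echo "librbd_tp.so.1 not found in $libdir" >&2
        return 1
    fi
    ln -sf librbd_tp.so.1 "$libdir/librbd_tp.so"
}

# tracepoint_lib_loaded PID
# Succeed if the process has a librbd_tp library mapped. This is the
# same check as looking for it in the output of lsof -p PID.
tracepoint_lib_loaded() {
    grep -q 'librbd_tp\.so' "/proc/$1/maps" 2>/dev/null
}
```

Assuming the library lives next to librbd in /usr/lib, you would run `link_tracepoint_lib /usr/lib` as root, restart the guest, and then check with `tracepoint_lib_loaded <vm-process-id>`.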

On Sat, Jul 16, 2016 at 8:48 PM, Vaibhav Bhembre
<vaibhav@xxxxxxxxxxxxxxxx> wrote:
> I was finally able to complete the trace. Along with setting
> *rbd_tracing = true* as you advised, I had to symlink *librbd_tp.so* to
> point to *librbd_tp.so.1*. Since the SONAME of the library includes the
> version number, I think we might need to update the name where it is
> referenced from librbd.
>
> https://github.com/ceph/ceph/blob/master/src/librbd/librbd.cc#L58
>
> I have uploaded the traces onto the tracker. Please let me know if there
> is anything more I can provide.
>
> Meanwhile, I can also push a fix for the issue with empty traces on
> Ubuntu/Debian if you think that change should be fine.
>
> Thanks!
>
> On 07/15, Vaibhav Bhembre wrote:
>> I enabled rbd_tracing on the HV and restarted the guest so that it would
>> pick up the new configuration. The new value of *rbd_tracing* was confirmed
>> via the admin socket. I am still unable to see any trace.
>>
>> lsof -p <vm-process-id> does not show *librbd_tp.so* loaded despite multiple
>> restarts.  Only *librbd.so* seems to be loaded.
>>
>> No oddities in kern.log are observed.
>>
>> Let me know if I can provide any other information. Thanks!
>>
>> On 07/15, Jason Dillaman wrote:
>> >There appears to be a hole in the documentation.  You now have to set
>> >a configuration option to enable tracing:
>> >
>> >rbd_tracing = true
>> >
>> >This will cause librbd.so to dynamically load the tracing module
>> >librbd_tp.so (which has linkage to LTTng-UST).
>> >
>> >On Fri, Jul 15, 2016 at 1:47 PM, Vaibhav Bhembre
>> ><vaibhav@xxxxxxxxxxxxxxxx> wrote:
>> >>I followed the steps mentioned in [1] but somehow I am unable to see any
>> >>traces to continue with its step 2. There are no errors seen when performing
>> >>operations mentioned in step 1. In my setup I am running lttng commands on
>> >>the HV where my VM has the RBD device attached.
>> >>
>> >>My lttng version is as follows:
>> >>
>> >>$ lttng --version
>> >>lttng (LTTng Trace Control) 2.4.0 - Époque Opaque
>> >>$ lttng-sessiond --version
>> >>2.4.0
>> >>
>> >>My uname -a output is as follows:
>> >>Linux infra1node71 3.13.0-65-generic #106-Ubuntu SMP Fri Oct 2 22:08:27 UTC
>> >>2015 x86_64 x86_64 x86_64 GNU/Linux
>> >>
>> >>The kern.log is clear of any apparmor denials as well.
>> >>
>> >>Would I need to have my librbd linked with lttng-ust by any chance? I don't
>> >>see it linked as seen below:
>> >>
>> >>$ ldd /usr/lib/librbd.so.1.0.0 | grep lttng
>> >>$
>> >>
>> >>Any idea what I might be missing here to get lttng running successfully?
>> >>
>> >>[1] http://docs.ceph.com/docs/master/rbd/rbd-replay/
>> >>
>> >>
>> >>On 07/14, Jason Dillaman wrote:
>> >>>
>> >>>I would probably be able to resolve the issue fairly quickly if it
>> >>>would be possible for you to provide a RBD replay trace from a slow
>> >>>and fast mkfs.xfs test run and attach it to the tracker ticket I just
>> >>>opened for this issue [1]. You can follow the instructions here [2]
>> >>>but would only need to perform steps 1 and 2 (attaching the output from
>> >>>step 2 to the ticket).
>> >>>
>> >>>Thanks,
>> >>>
>> >>>[1] http://tracker.ceph.com/issues/16689
>> >>>[2] http://docs.ceph.com/docs/master/rbd/rbd-replay/
>> >>>
>> >>>On Thu, Jul 14, 2016 at 2:55 PM, Vaibhav Bhembre
>> >>><vaibhav@xxxxxxxxxxxxxxxx> wrote:
>> >>>>
>> >>>>We have been observing similar behavior. It usually happens when we
>> >>>>create a new rbd image, expose it to the guest, and perform any
>> >>>>operation that issues discards to the device.
>> >>>>
>> >>>>A typical command that's first run on a given device is mkfs, usually
>> >>>>with discard on.
>> >>>>
>> >>>># time mkfs.xfs -s size=4096 -f /dev/sda
>> >>>>meta-data=/dev/sda               isize=256    agcount=4, agsize=6553600 blks
>> >>>>         =                       sectsz=4096  attr=2, projid32bit=0
>> >>>>data     =                       bsize=4096   blocks=26214400, imaxpct=25
>> >>>>         =                       sunit=0      swidth=0 blks
>> >>>>naming   =version 2              bsize=4096   ascii-ci=0
>> >>>>log      =internal log           bsize=4096   blocks=12800, version=2
>> >>>>         =                       sectsz=4096  sunit=1 blks, lazy-count=1
>> >>>>realtime =none                   extsz=4096   blocks=0, rtextents=0
>> >>>>
>> >>>>real 9m10.882s
>> >>>>user 0m0.000s
>> >>>>sys 0m0.012s
>> >>>>
>> >>>>When we issue this same command with the object-map feature disabled on
>> >>>>the image, it completes much faster.
>> >>>>
>> >>>># time mkfs.xfs -s size=4096 -f /dev/sda
>> >>>>meta-data=/dev/sda               isize=256    agcount=4, agsize=6553600 blks
>> >>>>         =                       sectsz=4096  attr=2, projid32bit=0
>> >>>>data     =                       bsize=4096   blocks=26214400, imaxpct=25
>> >>>>         =                       sunit=0      swidth=0 blks
>> >>>>naming   =version 2              bsize=4096   ascii-ci=0
>> >>>>log      =internal log           bsize=4096   blocks=12800, version=2
>> >>>>         =                       sectsz=4096  sunit=1 blks, lazy-count=1
>> >>>>realtime =none                   extsz=4096   blocks=0, rtextents=0
>> >>>>
>> >>>>real 0m1.780s
>> >>>>user 0m0.000s
>> >>>>sys 0m0.012s
>> >>>>
>> >>>>Also, from what I am seeing, the slowness seems to be proportional to the
>> >>>>size of the image rather than the amount of data written into it. Issuing
>> >>>>mkfs without discard doesn't reproduce this issue. The above values were
>> >>>>for a 100G rbd image. A 250G image takes about 2.5 times as long as the
>> >>>>100G one (22m58s versus 9m10s).
>> >>>>
>> >>>># time mkfs.xfs -s size=4096 -f /dev/sda
>> >>>>meta-data=/dev/sda               isize=256    agcount=4, agsize=16384000 blks
>> >>>>         =                       sectsz=4096  attr=2, projid32bit=0
>> >>>>data     =                       bsize=4096   blocks=65536000, imaxpct=25
>> >>>>         =                       sunit=0      swidth=0 blks
>> >>>>naming   =version 2              bsize=4096   ascii-ci=0
>> >>>>log      =internal log           bsize=4096   blocks=32000, version=2
>> >>>>         =                       sectsz=4096  sunit=1 blks, lazy-count=1
>> >>>>realtime =none                   extsz=4096   blocks=0, rtextents=0
>> >>>>
>> >>>>real 22m58.076s
>> >>>>user 0m0.000s
>> >>>>sys 0m0.024s
>> >>>>
>> >>>>Let me know if you need any more information regarding this. We would
>> >>>>like to enable object-map (and fast-diff) on our images once this gets
>> >>>>resolved.
>> >>>>
>> >>>>
>> >>>>On Wed, Jun 22, 2016 at 5:39 PM, Jason Dillaman <jdillama@xxxxxxxxxx>
>> >>>>wrote:
>> >>>>>
>> >>>>>
>> >>>>>I'm not sure why I never received the original list email, so I
>> >>>>>apologize for the delay. Is /dev/sda1, from your example, fresh with
>> >>>>>no data to actually discard or does it actually have lots of data to
>> >>>>>discard?
>> >>>>>
>> >>>>>Thanks,
>> >>>>>
>> >>>>>On Wed, Jun 22, 2016 at 1:56 PM, Brian Andrus <bandrus@xxxxxxxxxx>
>> >>>>>wrote:
>> >>>>>> I've created a downstream bug for this same issue.
>> >>>>>>
>> >>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1349116
>> >>>>>>
>> >>>>>> On Wed, Jun 15, 2016 at 6:23 AM, <list@xxxxxxxxxxxxxxx> wrote:
>> >>>>>>>
>> >>>>>>> Hello guys,
>> >>>>>>>
>> >>>>>>> We are currently testing Ceph Jewel with object-map feature enabled:
>> >>>>>>>
>> >>>>>>> rbd image 'disk-22920':
>> >>>>>>>         size 102400 MB in 25600 objects
>> >>>>>>>         order 22 (4096 kB objects)
>> >>>>>>>         block_name_prefix: rbd_data.7cfa2238e1f29
>> >>>>>>>         format: 2
>> >>>>>>>         features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
>> >>>>>>>         flags:
>> >>>>>>>
>> >>>>>>> We use this RBD as the disk for a KVM virtual machine with virtio-scsi
>> >>>>>>> and discard=unmap. We noticed the following parameters in /sys/block:
>> >>>>>>>
>> >>>>>>> # cat /sys/block/sda/queue/discard_*
>> >>>>>>> 4096
>> >>>>>>> 1073741824
>> >>>>>>> 0 <- discard_zeroes_data
>> >>>>>>>
>> >>>>>>> While running mkfs.ext4 on the disk in the VM, we noticed poor
>> >>>>>>> performance when using discard.
>> >>>>>>>
>> >>>>>>> mkfs.ext4 -E nodiscard /dev/sda1 - took 5 seconds to complete
>> >>>>>>> mkfs.ext4 -E discard /dev/sda1 - took around 3 minutes
>> >>>>>>>
>> >>>>>>> With the object-map disabled, the mkfs with discard took just 5
>> >>>>>>> seconds.
>> >>>>>>>
>> >>>>>>> Do you have any idea what might cause this issue?
>> >>>>>>>
>> >>>>>>> Kernel: 4.2.0-35-generic #40~14.04.1-Ubuntu
>> >>>>>>> Ceph: 10.2.0
>> >>>>>>> Libvirt: 1.3.1
>> >>>>>>> QEMU: 2.5.0
>> >>>>>>>
>> >>>>>>> Thanks!
>> >>>>>>>
>> >>>>>>> Best regards,
>> >>>>>>> Jonas
>> >>>>>>> _______________________________________________
>> >>>>>>> ceph-users mailing list
>> >>>>>>> ceph-users@xxxxxxxxxxxxxx
>> >>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> --
>> >>>>>> Brian Andrus
>> >>>>>> Red Hat, Inc.
>> >>>>>> Storage Consultant, Global Storage Practice
>> >>>>>> Mobile +1 (530) 903-8487
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>--
>> >>>>>Jason
>> >>>>
>> >>>>
>> >>>>
>> >>>
>> >>>
>> >>>
>> >>>--
>> >>>Jason
>> >>
>> >>
>> >>--
>> >>Vaibhav Bhembre
>> >
>> >
>> >
>> >--
>> >Jason
>>
>> --
>> Vaibhav Bhembre



-- 
Jason
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



