We found the issue. It was simply the "max open files" limit of the qemu user which was reached. When we run a lot of mkfs operations in serial, a lot of sockets
are opened to the ceph backend and qemu reaches its max open files limit. So we increased max open files in qemu.conf and the problem disappeared.
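
For reference, the change was roughly the following (a minimal sketch of libvirt's /etc/libvirt/qemu.conf; the value 32768 is only an example, size it for the number of RBD-backed disks and OSDs each VM talks to):

# /etc/libvirt/qemu.conf
# Raise the open-file limit applied to each qemu process; every
# librbd-backed disk keeps sockets open to many OSDs, so serial
# mkfs over many volumes can exhaust the default.
max_files = 32768
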
2017-01-16 19:19 GMT+01:00 Jason Dillaman <jdillama@xxxxxxxxxx>:
Can you ensure that you have the "admin socket" configured for your
librbd-backed VM so that you can do the following when you hit that
condition:
ceph --admin-daemon <path to librbd asok file> objecter_requests
That will dump out any hung IO requests between librbd and the OSDs. I
would also check your librbd logs to see if you are seeing an error
like "heartbeat_map is_healthy 'tp_librbd thread tp_librbd' had timed
out after 60" being logged periodically, which would indicate a thread
deadlock within librbd.
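
If the admin socket isn't already enabled, something like this in the client-side ceph.conf should do it (a minimal sketch; the paths are only examples and the directory must be writable by the qemu user):

[client]
    admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok
    log file = /var/log/ceph/qemu-guest-$pid.log

The asok file name is then filled in from those variables, e.g.:

ceph --admin-daemon /var/run/ceph/ceph-client.admin.<pid>.<cctid>.asok objecter_requests
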
On Mon, Jan 16, 2017 at 1:12 PM, Vincent Godin <vince.mlist@xxxxxxxxx> wrote:
> We are using librbd via virtio-blk on a host with CentOS 7.2. This server
> hosts the VMs on which we are doing our tests, but we have exactly the same
> behaviour as #9071. We tried to follow the thread to bug 8818 but we did
> not reproduce the issue with a lot of dd runs. Each time we try with
> mkfs.ext4, one of the 16 processes (we have 16 volumes) always hangs!
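>
> The test is basically the following loop (a minimal sketch; the device
> names /dev/vdb../dev/vdq are only examples and depend on the VM):
>
> for dev in /dev/vd{b..q}; do   # 16 librbd-backed virtio disks
>     mkfs.ext4 "$dev"           # one of the 16 runs eventually hangs
> done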
>
> 2017-01-16 17:45 GMT+01:00 Jason Dillaman <jdillama@xxxxxxxxxx>:
>>
>> Are you using krbd directly within the VM or librbd via
>> virtio-blk/scsi? Ticket #9071 is against krbd.
>>
>> On Mon, Jan 16, 2017 at 11:34 AM, Vincent Godin <vince.mlist@xxxxxxxxx>
>> wrote:
>> > In fact, we can reproduce the problem from VMs with CentOS 6.7, 7.2 or
>> > 7.3. We can reproduce it every time with this config: one VM (here on
>> > CentOS 6.7) with 16 RBD volumes of 100 GB attached. When we launch
>> > mkfs.ext4 in serial on each of these volumes, we always encounter the
>> > problem on one of them. We tried with the option -E nodiscard but we
>> > still have the problem. It looks exactly like bug #9071, with the same
>> > dmesg message:
>> >
>> > vdh: unknown partition table
>> > EXT4-fs (vdf): mounted filesystem with ordered data mode. Opts:
>> > EXT4-fs (vdg): mounted filesystem with ordered data mode. Opts:
>> > INFO: task flush-252:112:2903 blocked for more than 120 seconds.
>> > Not tainted 2.6.32-573.18.1.el6.x86_64 #1
>> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
>> > message.
>> > flush-252:112 D 0000000000000000 0 2903 2 0x00000080
>> > ffff8808328bf6e0 0000000000000046 ffff8808ffffffff 000000003d697f73
>> > 0000000000000000 ffff88082fbd7ec0 0000000000021454 ffffffffa78356ec
>> > 000000002b9db4fe ffffffff81aa6700 ffff88082efc9ad8 ffff8808328bffd8
>> > Call Trace:
>> > [<ffffffff81539673>] io_schedule+0x73/0xc0
>> > [<ffffffff81276598>] get_request_wait+0x108/0x1d0
>> > [<ffffffff810a1460>] ? autoremove_wake_function+0x0/0x40
>> > [<ffffffff812766f9>] blk_queue_bio+0x99/0x610
>> > [<ffffffff81274ec0>] generic_make_request+0x240/0x5a0
>> > [<ffffffff81129cf5>] ? mempool_alloc_slab+0x15/0x20
>> > [<ffffffff81129e93>] ? mempool_alloc+0x63/0x140
>> > [<ffffffff81275290>] submit_bio+0x70/0x120
>> > [<ffffffff811c7dcd>] submit_bh+0x11d/0x1f0
>> > [<ffffffff811ca588>] __block_write_full_page+0x1c8/0x330
>> > [<ffffffff811c9550>] ? end_buffer_async_write+0x0/0x190
>> > [<ffffffff811ce450>] ? blkdev_get_block+0x0/0x20
>> > [<ffffffff811ce450>] ? blkdev_get_block+0x0/0x20
>> > [<ffffffff811ca7d0>] block_write_full_page_endio+0xe0/0x120
>> > [<ffffffff81126ff0>] ? find_get_pages_tag+0x40/0x130
>> > [<ffffffff811ca825>] block_write_full_page+0x15/0x20
>> > [<ffffffff811cf5e8>] blkdev_writepage+0x18/0x20
>> > [<ffffffff8113b387>] __writepage+0x17/0x40
>> > [<ffffffff8113c64d>] write_cache_pages+0x1fd/0x4c0
>> > [<ffffffff8113b370>] ? __writepage+0x0/0x40
>> > [<ffffffff8113c934>] generic_writepages+0x24/0x30
>> > [<ffffffff8113c961>] do_writepages+0x21/0x40
>> > [<ffffffff811bf01d>] writeback_single_inode+0xdd/0x290
>> > [<ffffffff811bf41d>] writeback_sb_inodes+0xbd/0x170
>> > [<ffffffff811bf57b>] writeback_inodes_wb+0xab/0x1b0
>> > [<ffffffff811bf973>] wb_writeback+0x2f3/0x410
>> > [<ffffffff811bfb4b>] wb_do_writeback+0xbb/0x240
>> > [<ffffffff811bfd33>] bdi_writeback_task+0x63/0x1b0
>> > [<ffffffff810a12e7>] ? bit_waitqueue+0x17/0xd0
>> > [<ffffffff8114b760>] ? bdi_start_fn+0x0/0x100
>> > [<ffffffff8114b7e6>] bdi_start_fn+0x86/0x100
>> > [<ffffffff8114b760>] ? bdi_start_fn+0x0/0x100
>> > [<ffffffff810a0fce>] kthread+0x9e/0xc0
>> > [<ffffffff8100c28a>] child_rip+0xa/0x20
>> > [<ffffffff810a0f30>] ? kthread+0x0/0xc0
>> > [<ffffffff8100c280>] ? child_rip+0x0/0x20
>> > INFO: task mkfs.ext4:3040 blocked for more than 120 seconds.
>> > Not tainted 2.6.32-573.18.1.el6.x86_64 #1
>> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
>> > message.
>> > mkfs.ext4 D 0000000000000002 0 3040 3038 0x00000080
>> > ffff88075e79f4d8 0000000000000082 ffff8808ffffffff 000000003d697f73
>> > 0000000000000000 ffff88082fb73130 0000000000021472 ffffffffa78356ec
>> > 000000002b9db4fe ffffffff81aa6700 ffff88082e787068 ffff88075e79ffd8
>> > Call Trace:
>> > [<ffffffff81539673>] io_schedule+0x73/0xc0
>> > [<ffffffff81276598>] get_request_wait+0x108/0x1d0
>> > [<ffffffff810a1460>] ? autoremove_wake_function+0x0/0x40
>> > [<ffffffff812766f9>] blk_queue_bio+0x99/0x610
>> >
>> > Ceph version is Jewel 10.2.3.
>> > Ceph clients, mons and servers have kernel 3.10.0-327.36.3.el7.x86_64
>> > on CentOS 7.2.
>> >
>> > 2017-01-13 20:07 GMT+01:00 Jason Dillaman <jdillama@xxxxxxxxxx>:
>> >>
>> >> You might be hitting this issue [1] where mkfs is issuing lots of
>> >> discard operations. If you get a chance, can you retest w/ the "-E
>> >> nodiscard" option?
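>> >>
>> >> Something like this, with the real device substituted (/dev/vdX below
>> >> is only a placeholder):
>> >>
>> >> mkfs.ext4 -E nodiscard /dev/vdX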
>> >>
>> >> Thanks
>> >>
>> >> [1] http://tracker.ceph.com/issues/16689
>> >>
>> >> On Fri, Jan 13, 2017 at 12:57 PM, Vincent Godin <vince.mlist@xxxxxxxxx>
>> >> wrote:
>> >> > Thanks Jason,
>> >> >
>> >> > We observed a curious behavior: we have some VMs on CentOS 6.x hosted
>> >> > on our Openstack computes, which are on CentOS 7.2. If we try to run
>> >> > mkfs.ext4 on a volume created with the Jewel default features (61),
>> >> > the VM hangs and we have to reboot it to get a responsive system. This
>> >> > is strange because the libvirt process is launched from the host,
>> >> > which is on CentOS 7.2. If we disable some features, the mkfs.ext4
>> >> > succeeds. If the VM is on CentOS 7.x, there is no problem at all.
>> >> > Maybe the kernel of CentOS 6.x is unable to use the exclusive-lock
>> >> > feature?
>> >> > I think we will have to stay with a very conservative
>> >> > rbd_default_features such as 1, because we don't use striping and the
>> >> > other features are not compatible with our old CentOS 6.x VMs.
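>> >> >
>> >> > (i.e. something like the following in the clients' ceph.conf -- a
>> >> > minimal sketch, assuming the usual [client] section:)
>> >> >
>> >> > [client]
>> >> >     # layering only (feature bit 1)
>> >> >     rbd default features = 1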
>> >> >
>> >> > One last question: is the rbd object-map rebuild a long process? In
>> >> > other words, does it take as long as a delete (which has to read every
>> >> > possible block of an image without the object-map feature)? Is it a
>> >> > good idea to enable the object-map feature on an already used image?
>> >> > (I know that during the rebuild process, the VM will have to be
>> >> > stopped.)
>> >> >
>> >> >
>> >> >
>> >> > 2017-01-13 15:09 GMT+01:00 Jason Dillaman <jdillama@xxxxxxxxxx>:
>> >> >>
>> >> >> On Fri, Jan 13, 2017 at 5:11 AM, Vincent Godin
>> >> >> <vince.mlist@xxxxxxxxx>
>> >> >> wrote:
>> >> >> > We are using a production cluster which started on Firefly, then
>> >> >> > moved to Giant, Hammer and finally Jewel. So our images have
>> >> >> > different features, corresponding to the value of
>> >> >> > "rbd_default_features" in the version they were created with.
>> >> >> > We currently have three sets of features activated, i.e. images with:
>> >> >> > - layering ~ 1
>> >> >> > - layering, striping ~ 3
>> >> >> > - layering, exclusive-lock, object-map, fast-diff, deep-flatten ~ 61
>> >> >> >
>> >> >> > 1) Is it a good idea to try to give all images the same features?
>> >> >>
>> >> >> It isn't needed.
>> >> >>
>> >> >> > 2) Is it possible to disable the striping feature on an already
>> >> >> > created image (we never specify any stripe-unit nor stripe-count)?
>> >> >>
>> >> >> Negative -- striping cannot be dynamically disabled because it would
>> >> >> result in potentially altering the structure and placement of the
>> >> >> data
>> >> >> within the image. If your stripe-unit is the object size and the
>> >> >> stripe count is 1, that's a special case where the flag is
>> >> >> essentially
>> >> >> ignored.
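>> >> >>
>> >> >> You can check what an image actually has with something along these
>> >> >> lines (<pool>/<image> being a placeholder):
>> >> >>
>> >> >> rbd info <pool>/<image> | grep -E 'features|stripe'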
>> >> >>
>> >> >> > 3) What is the behaviour of an already created image on which we
>> >> >> > activate the object-map feature? Will a process try to rebuild an
>> >> >> > index of the used blocks? If not, and we later delete the image,
>> >> >> > will ceph try to remove all the possible blocks or only the blocks
>> >> >> > referred to by the object-map index?
>> >> >>
>> >> >> You would need to run "rbd object-map rebuild <image-spec>" to
>> >> >> rebuild
>> >> >> the object map. Until it is rebuilt, it will be considered invalid
>> >> >> and
>> >> >> won't be used for reference. You can determine the object map state
>> >> >> by
>> >> >> running "rbd info <image-spec>"
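>> >> >>
>> >> >> Roughly, for an existing image (<pool>/<image> is a placeholder, and
>> >> >> exclusive-lock must already be enabled since object-map depends on it):
>> >> >>
>> >> >> rbd feature enable <pool>/<image> object-map fast-diff
>> >> >> rbd object-map rebuild <pool>/<image>
>> >> >> rbd info <pool>/<image>   # check the object map state in "flags"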
>> >> >>
>> >> >> > 4) We are on Jewel but with tunables set to hammer (CentOS 7.2).
>> >> >> > What are the best default features to set in that case? (We use
>> >> >> > Ceph under an Openstack for glance, nova and cinder.)
>> >> >>
>> >> >> We feel like the current defaults are a good mix of features for
>> >> >> everyday use of non-shared images or non-krbd images. Most
>> >> >> importantly, all the default features can be dynamically disabled if
>> >> >> your needs for the image change.
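>> >> >>
>> >> >> For example, something along these lines (<pool>/<image> is a
>> >> >> placeholder; this order matters because fast-diff depends on
>> >> >> object-map and object-map depends on exclusive-lock):
>> >> >>
>> >> >> rbd feature disable <pool>/<image> deep-flatten
>> >> >> rbd feature disable <pool>/<image> fast-diff
>> >> >> rbd feature disable <pool>/<image> object-map
>> >> >> rbd feature disable <pool>/<image> exclusive-lock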
>> >> >>
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > _______________________________________________
>> >> >> > ceph-users mailing list
>> >> >> > ceph-users@xxxxxxxxxxxxxx
>> >> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >> >> >
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Jason
>> >> >
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Jason
>> >
>> >
>>
>>
>>
>> --
>> Jason
>
>
--
Jason
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com