Re: mkfs.ext4 hang on RBD volume

In fact, we can reproduce the problem from a VM running CentOS 6.7, 7.2 or 7.3. We can reproduce it every time with this configuration: one VM (here CentOS 6.7) with 16 RBD volumes of 100 GB attached. When we run mkfs.ext4 serially on each of these volumes, we always hit the problem on one of them. We tried the -E nodiscard option but we still have the problem. It looks exactly like bug #9071, with the same dmesg messages:

 vdh: unknown partition table
EXT4-fs (vdf): mounted filesystem with ordered data mode. Opts:
EXT4-fs (vdg): mounted filesystem with ordered data mode. Opts:
INFO: task flush-252:112:2903 blocked for more than 120 seconds.
      Not tainted 2.6.32-573.18.1.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
flush-252:112 D 0000000000000000     0  2903      2 0x00000080
 ffff8808328bf6e0 0000000000000046 ffff8808ffffffff 000000003d697f73
 0000000000000000 ffff88082fbd7ec0 0000000000021454 ffffffffa78356ec
 000000002b9db4fe ffffffff81aa6700 ffff88082efc9ad8 ffff8808328bffd8
Call Trace:
 [<ffffffff81539673>] io_schedule+0x73/0xc0
 [<ffffffff81276598>] get_request_wait+0x108/0x1d0
 [<ffffffff810a1460>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff812766f9>] blk_queue_bio+0x99/0x610
 [<ffffffff81274ec0>] generic_make_request+0x240/0x5a0
 [<ffffffff81129cf5>] ? mempool_alloc_slab+0x15/0x20
 [<ffffffff81129e93>] ? mempool_alloc+0x63/0x140
 [<ffffffff81275290>] submit_bio+0x70/0x120
 [<ffffffff811c7dcd>] submit_bh+0x11d/0x1f0
 [<ffffffff811ca588>] __block_write_full_page+0x1c8/0x330
 [<ffffffff811c9550>] ? end_buffer_async_write+0x0/0x190
 [<ffffffff811ce450>] ? blkdev_get_block+0x0/0x20
 [<ffffffff811ce450>] ? blkdev_get_block+0x0/0x20
 [<ffffffff811ca7d0>] block_write_full_page_endio+0xe0/0x120
 [<ffffffff81126ff0>] ? find_get_pages_tag+0x40/0x130
 [<ffffffff811ca825>] block_write_full_page+0x15/0x20
 [<ffffffff811cf5e8>] blkdev_writepage+0x18/0x20
 [<ffffffff8113b387>] __writepage+0x17/0x40
 [<ffffffff8113c64d>] write_cache_pages+0x1fd/0x4c0
 [<ffffffff8113b370>] ? __writepage+0x0/0x40
 [<ffffffff8113c934>] generic_writepages+0x24/0x30
 [<ffffffff8113c961>] do_writepages+0x21/0x40
 [<ffffffff811bf01d>] writeback_single_inode+0xdd/0x290
 [<ffffffff811bf41d>] writeback_sb_inodes+0xbd/0x170
 [<ffffffff811bf57b>] writeback_inodes_wb+0xab/0x1b0
 [<ffffffff811bf973>] wb_writeback+0x2f3/0x410
 [<ffffffff811bfb4b>] wb_do_writeback+0xbb/0x240
 [<ffffffff811bfd33>] bdi_writeback_task+0x63/0x1b0
 [<ffffffff810a12e7>] ? bit_waitqueue+0x17/0xd0
 [<ffffffff8114b760>] ? bdi_start_fn+0x0/0x100
 [<ffffffff8114b7e6>] bdi_start_fn+0x86/0x100
 [<ffffffff8114b760>] ? bdi_start_fn+0x0/0x100
 [<ffffffff810a0fce>] kthread+0x9e/0xc0
 [<ffffffff8100c28a>] child_rip+0xa/0x20
 [<ffffffff810a0f30>] ? kthread+0x0/0xc0
 [<ffffffff8100c280>] ? child_rip+0x0/0x20
INFO: task mkfs.ext4:3040 blocked for more than 120 seconds.
      Not tainted 2.6.32-573.18.1.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mkfs.ext4     D 0000000000000002     0  3040   3038 0x00000080
 ffff88075e79f4d8 0000000000000082 ffff8808ffffffff 000000003d697f73
 0000000000000000 ffff88082fb73130 0000000000021472 ffffffffa78356ec
 000000002b9db4fe ffffffff81aa6700 ffff88082e787068 ffff88075e79ffd8
Call Trace:
 [<ffffffff81539673>] io_schedule+0x73/0xc0
 [<ffffffff81276598>] get_request_wait+0x108/0x1d0
 [<ffffffff810a1460>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff812766f9>] blk_queue_bio+0x99/0x610

Ceph version is Jewel 10.2.3.
Ceph clients, mons and servers run kernel 3.10.0-327.36.3.el7.x86_64 on CentOS 7.2.
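
For reference, a rough sketch of the reproduction loop inside the guest (the device names are only examples; the 16 volumes show up as virtio disks):

  # 16 RBD volumes of 100 GB attached to the VM, here assumed to be /dev/vdb .. /dev/vdq
  for dev in /dev/vd{b..q}; do
      # -E nodiscard skips the discard pass; the hang still occurs either way
      mkfs.ext4 -E nodiscard "$dev"
  done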

2017-01-13 20:07 GMT+01:00 Jason Dillaman <jdillama@xxxxxxxxxx>:
You might be hitting this issue [1] where mkfs is issuing lots of
discard operations. If you get a chance, can you retest w/ the "-E
nodiscard" option?

Thanks

[1] http://tracker.ceph.com/issues/16689

On Fri, Jan 13, 2017 at 12:57 PM, Vincent Godin <vince.mlist@xxxxxxxxx> wrote:
> Thanks Jason,
>
> We observed a curious behavior: we have some VMs on CentOS 6.x hosted on
> our OpenStack compute nodes, which run CentOS 7.2. If we try to run
> mkfs.ext4 on a volume created with the Jewel default features (61), it hangs
> and we have to reboot the VM to get a responsive system. This is strange
> because the libvirt process is launched from the host, which is on CentOS
> 7.2. If I disable some features, the mkfs.ext4 succeeds. If the VM is on
> CentOS 7.x, there is no problem at all. Maybe the kernel of CentOS 6.x
> is unable to use the exclusive-lock feature?
> I think we will have to stay with a very conservative rbd_default_features
> value such as 1, because we don't use striping and the other features are
> not compatible with our old CentOS 6.x VMs.
>
> A last question: is the rbd object-map rebuild a long process? In other
> words, does it cost about the same time as a delete (which reads all the
> possible blocks of an image without the object-map feature)? Is it a good
> idea to enable the object-map feature on an already used image? (I know
> that during the rebuild process, the VM will have to be stopped.)
>
>
>
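For reference, disabling the extra features on an existing image looks roughly like this (the pool and image names are placeholders, and the features have to be dropped in dependency order):

  # fast-diff depends on object-map, which in turn depends on exclusive-lock
  rbd feature disable mypool/myimage fast-diff
  rbd feature disable mypool/myimage object-map
  rbd feature disable mypool/myimage deep-flatten
  rbd feature disable mypool/myimage exclusive-lock

  # and, to create new volumes with layering only, in the clients' ceph.conf:
  # [client]
  # rbd default features = 1
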
> 2017-01-13 15:09 GMT+01:00 Jason Dillaman <jdillama@xxxxxxxxxx>:
>>
>> On Fri, Jan 13, 2017 at 5:11 AM, Vincent Godin <vince.mlist@xxxxxxxxx>
>> wrote:
>> > We are using a production cluster which started with Firefly, then moved
>> > to Giant, Hammer and finally Jewel. So our images have different features,
>> > corresponding to the value of "rbd_default_features" in the version with
>> > which they were created.
>> > We currently have three sets of features activated:
>> > images with:
>> > - layering ~ 1
>> > - layering, striping ~ 3
>> > - layering, exclusive-lock, object-map, fast-diff, deep-flatten ~ 61
>> >
>> > 1) Is it a good idea to try to give all images the same features?
>>
>> It isn't needed.
>>
>> > 2) Is it possible to disable the striping feature on an already created
>> > image (we never specified any stripe-unit or stripe-count)?
>>
>> Negative -- striping cannot be dynamically disabled because it would
>> result in potentially altering the structure and placement of the data
>> within the image. If your stripe-unit is the object size and the
>> stripe count is 1, that's a special case where the flag is essentially
>> ignored.
>>
>> > 3) What is the behaviour of an already created image on which we
>> > activate the object-map feature? Will a process try to rebuild an index
>> > of used blocks? If not, when we later delete the image, will Ceph try to
>> > remove all the blocks or only the blocks referred to by the object-map
>> > index?
>>
>> You would need to run "rbd object-map rebuild <image-spec>" to rebuild
>> the object map. Until it is rebuilt, it will be considered invalid and
>> won't be used for reference. You can determine the object map state by
>> running "rbd info <image-spec>"
>>
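For completeness, a quick sketch of the rebuild and the state check mentioned above (the image spec is a placeholder):

  # rebuild the object map of an existing image
  rbd object-map rebuild mypool/myimage

  # verify the result: while the map is out of date, rbd info typically
  # reports "object map invalid" in its flags line
  rbd info mypool/myimage
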
>> > 4) We are on Jewel but with tunables set to hammer (CentOS 7.2). What
>> > are the best default features to set in that case? (We use Ceph under
>> > OpenStack for Glance, Nova and Cinder.)
>>
>> We feel like the current defaults are a good mix of features for
>> everyday use of non-shared images or non-krbd images. Most
>> importantly, all the default features can be dynamically disabled if
>> your needs for the image change.
>>
>> >
>> >
>> >
>> > _______________________________________________
>> > ceph-users mailing list
>> > ceph-users@xxxxxxxxxxxxxx
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>>
>>
>>
>> --
>> Jason
>
>



--
Jason

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
