Re: blk-mq request allocation stalls

On 01/12/2015 07:46 AM, Bart Van Assche wrote:
On 01/10/15 04:10, Mike Snitzer wrote:
On Fri, Jan 09 2015 at  8:59pm -0500,
Jens Axboe <axboe@xxxxxxxxx> wrote:
Bart, could you try the patch (the -v4) and your DM hang and see if
it solves it for you?

Yes, I'm interested to hear from Bart on v4 too.

Hello Mike and Jens,

Sorry but even with v4 applied filesystem creation still takes too long.
The kernel I have been testing with was generated as follows:
* Started from Mike's dm-for-3.20-blk-mq branch.
* Merged v3.19-rc4 with this branch.
* Applied Jens' blk-mq tag patch and Mike's debug patch on top.
* Modified Mike's patch to make it print the blk-mq "may_queue" state
   (hctx_may_queue(hctx, bt)).

Here are the results without multipath:

# systemctl disable multipathd
# systemctl stop multipathd
# dmsetup remove_all
# rmmod dm_service_time
# rmmod dm_multipath
# rmmod dm_mod
# time mkfs.xfs -f /dev/sdc >/dev/null
real    0m0.037s
user    0m0.000s
sys     0m0.020s
# time mkfs.xfs -f /dev/sdd >/dev/null
real    0m0.030s
user    0m0.010s
sys     0m0.010s

With multipath:

# ls -l /dev/sd[cd]
brw-rw---- 1 root disk 8, 32 Jan 12 15:09 /dev/sdc
brw-rw---- 1 root disk 8, 48 Jan 12 15:11 /dev/sdd
# systemctl start multipathd
# dmsetup table /dev/dm-0
0 256000 multipath 3 queue_if_no_path pg_init_retries 50 0 1 1
service-time 0 2 2 8:48 1 1 8:32 1 1
# time mkfs.xfs -f /dev/dm-0 >/dev/null
real    0m8.845s
user    0m0.000s
sys     0m0.020s
# time mkfs.xfs -f /dev/dm-0 >/dev/null
real    0m14.905s
user    0m0.000s
sys     0m0.020s

What is remarkable is that Mike's debug patch started to report
"bt_get() returned -1" as soon as multipathd was started. The first of
many identical call traces printed by this debug patch was as follows:

bt_get: __bt_get() returned -1
queue_num=2, nr_tags=62, reserved_tags=0, bits_per_word=3
nr_free=62, nr_reserved=0, may_queue=0
active_queues=8

Can you add dumping of hctx->nr_active when this fails? Your case is that the may_queue logic says no-can-do, so it smells like the nr_active accounting is wonky: you supposedly have no allocated tags, yet it clearly thinks that you do.
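
To make the arithmetic behind that suspicion concrete, here is a minimal userspace sketch of the fair-share check, modelled on my reading of hctx_may_queue() in block/blk-mq-tag.c around v3.19. The struct tag_state type and the helper names fair_share_depth() and may_queue() are hypothetical stand-ins for illustration, not kernel definitions, and the real function may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the state the check reads. */
struct tag_state {
    bool shared;            /* tag set is shared between queues */
    bool active;            /* this hardware queue is marked active */
    unsigned int depth;     /* size of the tag bitmap (nr_tags) */
    unsigned int users;     /* number of active queues sharing the tags */
    unsigned int nr_active; /* requests this queue believes are in flight */
};

/* Per-queue share of the tag space: ceil(depth / users), but at least 4. */
static unsigned int fair_share_depth(unsigned int depth, unsigned int users)
{
    assert(users > 0);
    unsigned int share = (depth + users - 1) / users;
    return share < 4 ? 4 : share;
}

/* Returns true when the queue is still allowed to allocate a tag. */
static bool may_queue(const struct tag_state *s)
{
    /* Unshared or inactive queues are never throttled. */
    if (!s->shared || !s->active)
        return true;
    /* Nothing to divide with a single tag or with no active users. */
    if (s->depth == 1 || s->users == 0)
        return true;
    return s->nr_active < fair_share_depth(s->depth, s->users);
}
```

Plugging in the values from the trace (nr_tags=62, active_queues=8) gives a per-queue share of 8. If the sketch matches the kernel logic, may_queue=0 would therefore require nr_active to be at least 8, which is hard to reconcile with nr_free=62.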

--
Jens Axboe

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel


