Re: [PATCH RFC 0/3] blk-mq/nvme: use blk_mq_alloc_request() for NVMe's connect request

On 11/16/19 8:12 PM, Ming Lei wrote:
On Sat, Nov 16, 2019 at 05:24:05PM -0800, Bart Van Assche wrote:
On 2019-11-15 23:17, Ming Lei wrote:
blk-mq currently uses a static queue mapping between CPUs and hw queues. Since CPU hotplug
may happen at any time, the specified hw queue may become inactive at any time.

I can trigger a race between blk_mq_alloc_request_hctx() and
CPU-hotplugging by running blktests. The patch below fixes that race
on my setup. Does this patch also fix the race(s) that you ran into?

The following problem has been triggered by my regular tests for years.
Is it the same as what you are seeing?

[ 2248.751675] nvme nvme1: creating 2 I/O queues.
[ 2248.752351] BUG: unable to handle page fault for address: 0000607d064434a8
[ 2248.753348] #PF: supervisor write access in kernel mode
[ 2248.754106] #PF: error_code(0x0002) - not-present page
[ 2248.754846] PGD 0 P4D 0
[ 2248.755230] Oops: 0002 [#1] PREEMPT SMP PTI
[ 2248.755838] CPU: 7 PID: 16293 Comm: kworker/u18:3 Not tainted 5.4.0-rc7_96b95eff4a59_master+ #1
[ 2248.757089] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS ?-20180724_192412-buildhw-07.phx2.fedoraproject.org-1.fc29 04/01/2014
[ 2248.758863] Workqueue: nvme-reset-wq nvme_loop_reset_ctrl_work [nvme_loop]
[ 2248.759857] RIP: 0010:blk_mq_get_request+0x2a8/0x31c
[ 2248.760654] Code: c7 83 08 01 00 00 00 00 00 00 48 c7 83 10 01 00 00 00 00 00 00 48 8b 55 18 45 84 ed 74 0c 31 c0 41 81 e5 00 08 06 00 0f 95 c0 <48> ff 44 c2 68 c7 83 d4 00 00 00 01 00 00 00 f7 45 10 00 00 06 00
[ 2248.763375] RSP: 0018:ffffc900012dbc80 EFLAGS: 00010246
[ 2248.764130] RAX: 0000000000000000 RBX: ffff888170d70000 RCX: 0000000000000017
[ 2248.765156] RDX: 0000607d06443440 RSI: 0000020bb36c554e RDI: 0000020bb3837c3f
[ 2248.766034] RBP: ffffc900012dbcc0 R08: 00000000f461df07 R09: 00000000000000a8
[ 2248.767084] R10: ffffc900012dbe50 R11: 0000000000000002 R12: 0000000000000000
[ 2248.768109] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 2248.769134] FS:  0000000000000000(0000) GS:ffff88827bd80000(0000) knlGS:0000000000000000
[ 2248.770294] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2248.771125] CR2: 0000607d064434a8 CR3: 0000000272866001 CR4: 0000000000760ee0
[ 2248.772152] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2248.773179] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2248.774204] PKRU: 55555554
[ 2248.774603] Call Trace:
[ 2248.774983]  blk_mq_alloc_request_hctx+0xc5/0x10e
[ 2248.775674]  nvme_alloc_request+0x42/0x71
[ 2248.776263]  __nvme_submit_sync_cmd+0x49/0x1b2
[ 2248.776910]  nvmf_connect_io_queue+0x12c/0x195 [nvme_fabrics]
[ 2248.777663]  ? nvme_loop_connect_io_queues+0x2f/0x54 [nvme_loop]
[ 2248.778481]  nvme_loop_connect_io_queues+0x2f/0x54 [nvme_loop]
[ 2248.779325]  nvme_loop_reset_ctrl_work+0x62/0xd4 [nvme_loop]
[ 2248.780144]  process_one_work+0x1a8/0x2a1
[ 2248.780727]  ? process_scheduled_works+0x2c/0x2c
[ 2248.781398]  process_scheduled_works+0x27/0x2c
[ 2248.782046]  worker_thread+0x1b1/0x23f
[ 2248.782594]  kthread+0xf5/0xfa
[ 2248.783048]  ? kthread_unpark+0x62/0x62
[ 2248.783608]  ret_from_fork+0x35/0x40

Hi Ming,

Thanks for sharing this call stack. What made me look at blk_mq_alloc_request_hctx() is a different issue, namely the following kernel warnings, reported from inside blk_mq_run_hw_queue():

WARNING: CPU: 0 PID: 6123 at include/linux/cpumask.h:137 cpumask_next_and+0x16/0x30
WARNING: CPU: 0 PID: 323 at include/linux/cpumask.h:137 __blk_mq_run_hw_queue+0x152/0x1b0
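
Just to make sure we are looking at the same failure mode, here is a minimal user-space
sketch of how I understand the crash above can happen. It is not the actual kernel code;
the names first_online_cpu_in_hctx(), per_cpu_ctx[] and struct ctx are made up for
illustration. The idea: the caller of blk_mq_alloc_request_hctx() picks the hw queue, but
every CPU mapped to that queue may have gone offline in the meantime, so the "first online
CPU in hctx->cpumask" lookup can come back empty (nr_cpu_ids), and using that value to
index the per-CPU software contexts would be an out-of-bounds access consistent with the
page fault in blk_mq_get_request() above.

/*
 * Minimal user-space model of the failure mode I suspect in
 * blk_mq_alloc_request_hctx(); this is NOT the kernel code, and all
 * names below (first_online_cpu_in_hctx(), per_cpu_ctx[], ...) are
 * made up for illustration.
 */
#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 8	/* stands in for nr_cpu_ids */

struct ctx {
	unsigned long rq_dispatched;
};

static bool cpu_online[NR_CPUS];	/* models cpu_online_mask */
static bool hctx_cpumask[NR_CPUS];	/* models hctx->cpumask   */
static struct ctx per_cpu_ctx[NR_CPUS];	/* models the per-CPU software contexts */

/* Models "first CPU that is both in hctx->cpumask and online". */
static unsigned int first_online_cpu_in_hctx(void)
{
	for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++)
		if (hctx_cpumask[cpu] && cpu_online[cpu])
			return cpu;
	return NR_CPUS;	/* no online CPU left for this hw queue */
}

int main(void)
{
	/* The hw queue was mapped to CPUs 6 and 7 while they were online. */
	hctx_cpumask[6] = hctx_cpumask[7] = true;
	for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++)
		cpu_online[cpu] = true;

	/* CPU hotplug takes CPUs 6 and 7 offline; the mapping stays static. */
	cpu_online[6] = cpu_online[7] = false;

	unsigned int cpu = first_online_cpu_in_hctx();
	if (cpu >= NR_CPUS) {
		/* The kernel path has no such check and would index past the array. */
		fprintf(stderr, "no online CPU for this hw queue; "
			"the per-CPU ctx lookup would be out of bounds\n");
		return 1;
	}
	per_cpu_ctx[cpu].rq_dispatched++;
	return 0;
}

If that is indeed what happens, either the allocation has to fail cleanly when no mapped
CPU is online, or the hctx-directed allocation has to be avoided entirely for the connect
request, which is what this series does if I understand it correctly.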

Bart.


