On 10/22/18 9:21 AM, Benjamin Block wrote:
> On Mon, Oct 22, 2018 at 06:38:36AM -0600, Jens Axboe wrote:
>> On 10/22/18 4:03 AM, Benjamin Block wrote:
>>> On Fri, Oct 19, 2018 at 09:50:53AM -0600, Jens Axboe wrote:
>>>> On 10/19/18 9:01 AM, Benjamin Block wrote:
>>>>> On Wed, Oct 17, 2018 at 10:01:16AM -0600, Jens Axboe wrote:
>>>>>> On 10/17/18 9:55 AM, Benjamin Block wrote:
>>>>>>> On Tue, Oct 16, 2018 at 08:43:01AM -0600, Jens Axboe wrote:
>>>>>>>> Requires a few changes to the FC transport class as well.
>>>>>>>>
>>>>>>>> Cc: Johannes Thumshirn <jthumshirn@xxxxxxx>
>>>>>>>> Cc: Benjamin Block <bblock@xxxxxxxxxxxxxxxxxx>
>>>>>>>> Cc: linux-scsi@xxxxxxxxxxxxxxx
>>>>>>>> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
>>>>>>>> ---
>>>>>>>>  block/bsg-lib.c                  | 102 +++++++++++++++++--------------
>>>>>>>>  drivers/scsi/scsi_transport_fc.c | 61 ++++++++++--------
>>>>>>>>  2 files changed, 91 insertions(+), 72 deletions(-)
>>>>>>>>
>>>>
>>>> but that's not going to apply cleanly... Can you just try and run my
>>>> mq-conversions branch? That has everything, and it also has that
>>>> alloc failure fixed.
>>>>
>>>> git://git.kernel.dk/linux-block mq-conversions
>>>>
>>>
>>> Ok so, that gets past the stage where we initialize the queues. Simple
>>> SCSI-I/O also seems to work, that is for example an INQUIRY(10), but
>>> transport commands that get passed to the driver break. Tried to send
>>> a FibreChannel GPN_FT (remote port discovery).
>>>
>>> As the BSG interface goes. This is a bidirectional command, that has
>>> both a buffer for the request and for the reply. AFAIR BSG will create a
>>> struct request for each of them. Protocol is BSG_PROTOCOL_SCSI,
>>> Subprotocol BSG_SUB_PROTOCOL_SCSI_TRANSPORT. The rest should be
>>> transparent till we get into the driver.
>>>
>>> First got this:
>>>
>>> [ 566.531100] BUG: sleeping function called from invalid context at mm/slab.h:421
>>> [ 566.531452] in_atomic(): 1, irqs_disabled(): 0, pid: 3104, name: bsg_api_test
>>> [ 566.531460] 1 lock held by bsg_api_test/3104:
>>> [ 566.531466]  #0: 00000000cb4b58e8 (rcu_read_lock){....}, at: hctx_lock+0x30/0x118
>>> [ 566.531498] Preemption disabled at:
>>> [ 566.531503] [<00000000008175d0>] __blk_mq_delay_run_hw_queue+0x50/0x218
>>> [ 566.531519] CPU: 3 PID: 3104 Comm: bsg_api_test Tainted: G W 4.19.0-rc6-bb-next+ #1
>>> [ 566.531527] Hardware name: IBM 3906 M03 704 (LPAR)
>>> [ 566.531533] Call Trace:
>>> [ 566.531544] ([<00000000001167fa>] show_stack+0x8a/0xd8)
>>> [ 566.531555]  [<0000000000bcc6d2>] dump_stack+0x9a/0xd8
>>> [ 566.531565]  [<0000000000196410>] ___might_sleep+0x280/0x298
>>> [ 566.531576]  [<00000000003e528c>] __kmalloc+0xbc/0x560
>>> [ 566.531584]  [<000000000083186a>] bsg_map_buffer+0x5a/0xb0
>>> [ 566.531591]  [<0000000000831948>] bsg_queue_rq+0x88/0x118
>>> [ 566.531599]  [<000000000081ab56>] blk_mq_dispatch_rq_list+0x37e/0x670
>>> [ 566.531607]  [<000000000082050e>] blk_mq_do_dispatch_sched+0x11e/0x130
>>> [ 566.531615]  [<0000000000820dfe>] blk_mq_sched_dispatch_requests+0x156/0x1a0
>>> [ 566.531622]  [<0000000000817564>] __blk_mq_run_hw_queue+0x144/0x160
>>> [ 566.531630]  [<0000000000817614>] __blk_mq_delay_run_hw_queue+0x94/0x218
>>> [ 566.531638]  [<00000000008178b2>] blk_mq_run_hw_queue+0xda/0xf0
>>> [ 566.531645]  [<00000000008211d8>] blk_mq_sched_insert_request+0x1a8/0x1e8
>>> [ 566.531653]  [<0000000000811ee2>] blk_execute_rq_nowait+0x72/0x80
>>> [ 566.531660]  [<0000000000811f66>] blk_execute_rq+0x76/0xb8
>>> [ 566.531778]  [<0000000000830d0e>] bsg_ioctl+0x426/0x500
>>> [ 566.531787]  [<0000000000440cb4>] do_vfs_ioctl+0x68c/0x710
>>> [ 566.531794]  [<0000000000440dac>] ksys_ioctl+0x74/0xa0
>>> [ 566.531801]  [<0000000000440e0a>] sys_ioctl+0x32/0x40
>>> [ 566.531808]  [<0000000000bf1dd0>] system_call+0xd8/0x2d0
>>> [ 566.531815] 1 lock held by bsg_api_test/3104:
>>> [ 566.531821]  #0: 00000000cb4b58e8 (rcu_read_lock){....}, at: hctx_lock+0x30/0x118
>>>
>>> And then it dies completely:
>>>
>>> [ 566.531854] Unable to handle kernel pointer dereference in virtual kernel address space
>>> [ 566.531861] Failing address: 0000000000000000 TEID: 0000000000000483
>>> [ 566.531867] Fault in home space mode while using kernel ASCE.
>>> [ 566.531885] AS:00000000013ec007 R3:00000000effc8007 S:00000000effce000 P:000000000000013d
>>> [ 566.531927] Oops: 0004 ilc:3 [#1] PREEMPT SMP DEBUG_PAGEALLOC
>>> [ 566.531938] Modules linked in: ...
>>> [ 566.532042] CPU: 3 PID: 3104 Comm: bsg_api_test Tainted: G W 4.19.0-rc6-bb-next+ #1
>>> [ 566.532047] Hardware name: IBM 3906 M03 704 (LPAR)
>>> [ 566.532051] Krnl PSW : 00000000d16c67b2 00000000e4a74b5c (zfcp_fc_exec_bsg_job+0x116/0x2c0 [zfcp])
>>> [ 566.532071]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 RI:0 EA:3
>>> [ 566.532077] Krnl GPRS: 0000000000001000 00000000bfb84178 0000000000000001 0000000080000004
>>> [ 566.532082]            0000000000001000 00000000a6625108 0000000000000000 0000000000000001
>>> [ 566.532086]            00000000bfb870e8 0000000000000000 00000000b6276930 00000000bb3a6fc8
>>> [ 566.532091]            00000000b6276800 000003ff80306228 000003ff802fc048 00000000a7313830
>>> [ 566.532104] Krnl Code: 000003ff802fc090: a7740004 brc 7,3ff802fc098
>>> [ 566.532104]            000003ff802fc094: a7f4002e brc 15,3ff802fc0f0
>>> [ 566.532104]           #000003ff802fc098: e310a0300004 lg %r1,48(%r10)
>>> [ 566.532104]           >000003ff802fc09e: e31090000024 stg %r1,0(%r9)
>>> [ 566.532104]            000003ff802fc0a4: e310a0400004 lg %r1,64(%r10)
>>> [ 566.532104]            000003ff802fc0aa: e3a090180024 stg %r10,24(%r9)
>>> [ 566.532104]            000003ff802fc0b0: e31090080024 stg %r1,8(%r9)
>>> [ 566.532104]            000003ff802fc0b6: 58108000 l %r1,0(%r8)
>>> [ 566.532143] Call Trace:
>>> [ 566.532149] ([<00000000be459dd8>] 0xbe459dd8)
>>> [ 566.532160]  [<000003ff802bba00>] fc_bsg_dispatch+0x1d0/0x248 [scsi_transport_fc]
>>> [ 566.532164]  [<00000000008319a4>] bsg_queue_rq+0xe4/0x118
>>> [ 566.532169]  [<000000000081ab56>] blk_mq_dispatch_rq_list+0x37e/0x670
>>> [ 566.532174]  [<000000000082050e>] blk_mq_do_dispatch_sched+0x11e/0x130
>>> [ 566.532178]  [<0000000000820dfe>] blk_mq_sched_dispatch_requests+0x156/0x1a0
>>> [ 566.532183]  [<0000000000817564>] __blk_mq_run_hw_queue+0x144/0x160
>>> [ 566.532188]  [<0000000000817614>] __blk_mq_delay_run_hw_queue+0x94/0x218
>>> [ 566.532193]  [<00000000008178b2>] blk_mq_run_hw_queue+0xda/0xf0
>>> [ 566.532197]  [<00000000008211d8>] blk_mq_sched_insert_request+0x1a8/0x1e8
>>> [ 566.532202]  [<0000000000811ee2>] blk_execute_rq_nowait+0x72/0x80
>>> [ 566.532207]  [<0000000000811f66>] blk_execute_rq+0x76/0xb8
>>> [ 566.532211]  [<0000000000830d0e>] bsg_ioctl+0x426/0x500
>>> [ 566.532215]  [<0000000000440cb4>] do_vfs_ioctl+0x68c/0x710
>>> [ 566.532220]  [<0000000000440dac>] ksys_ioctl+0x74/0xa0
>>> [ 566.532224]  [<0000000000440e0a>] sys_ioctl+0x32/0x40
>>> [ 566.532228]  [<0000000000bf1dd0>] system_call+0xd8/0x2d0
>>> [ 566.532231] INFO: lockdep is turned off.
>>> [ 566.532234] Last Breaking-Event-Address:
>>> [ 566.532243]  [<000003ff802fc090>] zfcp_fc_exec_bsg_job+0x108/0x2c0 [zfcp]
>>> [ 566.532247]
>>> [ 566.532250] Kernel panic - not syncing: Fatal exception: panic_on_oops
>>>
>>> This is the state of your branch from an hour ago or so.
>>
>> The first one is an easy fix, not sure how I missed that. The other
>> one I have no idea, any chance you could try with this one:
>>
>> http://git.kernel.dk/cgit/linux-block/commit/?h=mq-conversions&id=142dc9f36e3113b6a76d472978c33c8c2a2b702c
>>
>> which fixes the first one, and also corrects a wrong end_io call,
>> but I don't think that's the cause of the above.
>>
>> If it crashes, can you figure out where in the source that is?
>> Basically just do
>>
>> gdb vmlinux
>> l *zfcp_fc_exec_bsg_job+0x116
>>
>> assuming that works fine on s390 :-)
>>
>
> Sry, I am a bit split between several things I should do right now
> (nothing new, right? :) ), I'll continue on this tomorrow.

I'm just happy that you're able to test, since this is something I
cannot test on my own. FWIW, I did run the latest and did
bidirectional commands with scsi_debug through the BSG interface,
and it works for me. But the private bsg setup queues are a bit
different, so... Let me know how I can help you.

For the record, the old bsg interface that dropped the queue lock
(and enabled IRQs) from the request_fn was buggy, that's not a
valid thing to do. I'm hopeful that once we work through the kinks
of this issue, we'll be better off for it in the long run.

--
Jens Axboe