Re: [PATCH] bsg: convert to use blk-mq

Jens Axboe <axboe@xxxxxxxxx> · Thu, 25 Oct 2018 17:12:59 -0600

On 10/23/18 12:07 PM, Jens Axboe wrote:
> On 10/23/18 11:40 AM, Benjamin Block wrote:
>> On Mon, Oct 22, 2018 at 06:38:36AM -0600, Jens Axboe wrote:
>>> On 10/22/18 4:03 AM, Benjamin Block wrote:
>>>> On Fri, Oct 19, 2018 at 09:50:53AM -0600, Jens Axboe wrote:
>>>>
>>>> Ok so, that gets past the stage where we initialize the queues. Simple
>>>> SCSI-I/O also seems to work, that is for example an INQUIRY(10), but
>>>> transport commands that get passed to the driver break. Tried to send
>>>> a FibreChannel GPN_FT (remote port discovery).
>>>>
>>>> As the BSG interface goes. This is a bidirectional command, that has
>>>> both a buffer for the request and for the reply. AFAIR BSG will create a
>>>> struct request for each of them. Protocol is BSG_PROTOCOL_SCSI,
>>>> Subprotocol BSG_SUB_PROTOCOL_SCSI_TRANSPORT. The rest should be
>>>> transparent till we get into the driver.
>>>>
>>>> First got this:
>>>>
>>>> [  566.531100] BUG: sleeping function called from invalid context at mm/slab.h:421
>>>> [  566.531452] in_atomic(): 1, irqs_disabled(): 0, pid: 3104, name: bsg_api_test
>>>> [  566.531460] 1 lock held by bsg_api_test/3104:
>>>> [  566.531466]  #0: 00000000cb4b58e8 (rcu_read_lock){....}, at: hctx_lock+0x30/0x118
>>>> [  566.531498] Preemption disabled at:
>>>> [  566.531503] [<00000000008175d0>] __blk_mq_delay_run_hw_queue+0x50/0x218
>>>> [  566.531519] CPU: 3 PID: 3104 Comm: bsg_api_test Tainted: G        W         4.19.0-rc6-bb-next+ #1
>>>> [  566.531527] Hardware name: IBM 3906 M03 704 (LPAR)
>>>> [  566.531533] Call Trace:
>>>> [  566.531544] ([<00000000001167fa>] show_stack+0x8a/0xd8)
>>>> [  566.531555]  [<0000000000bcc6d2>] dump_stack+0x9a/0xd8
>>>> [  566.531565]  [<0000000000196410>] ___might_sleep+0x280/0x298
>>>> [  566.531576]  [<00000000003e528c>] __kmalloc+0xbc/0x560
>>>> [  566.531584]  [<000000000083186a>] bsg_map_buffer+0x5a/0xb0
>>>> [  566.531591]  [<0000000000831948>] bsg_queue_rq+0x88/0x118
>>>> [  566.531599]  [<000000000081ab56>] blk_mq_dispatch_rq_list+0x37e/0x670
>>>> [  566.531607]  [<000000000082050e>] blk_mq_do_dispatch_sched+0x11e/0x130
>>>> [  566.531615]  [<0000000000820dfe>] blk_mq_sched_dispatch_requests+0x156/0x1a0
>>>> [  566.531622]  [<0000000000817564>] __blk_mq_run_hw_queue+0x144/0x160
>>>> [  566.531630]  [<0000000000817614>] __blk_mq_delay_run_hw_queue+0x94/0x218
>>>> [  566.531638]  [<00000000008178b2>] blk_mq_run_hw_queue+0xda/0xf0
>>>> [  566.531645]  [<00000000008211d8>] blk_mq_sched_insert_request+0x1a8/0x1e8
>>>> [  566.531653]  [<0000000000811ee2>] blk_execute_rq_nowait+0x72/0x80
>>>> [  566.531660]  [<0000000000811f66>] blk_execute_rq+0x76/0xb8
>>>> [  566.531778]  [<0000000000830d0e>] bsg_ioctl+0x426/0x500
>>>> [  566.531787]  [<0000000000440cb4>] do_vfs_ioctl+0x68c/0x710
>>>> [  566.531794]  [<0000000000440dac>] ksys_ioctl+0x74/0xa0
>>>> [  566.531801]  [<0000000000440e0a>] sys_ioctl+0x32/0x40
>>>> [  566.531808]  [<0000000000bf1dd0>] system_call+0xd8/0x2d0
>>>> [  566.531815] 1 lock held by bsg_api_test/3104:
>>>> [  566.531821]  #0: 00000000cb4b58e8 (rcu_read_lock){....}, at: hctx_lock+0x30/0x118
>>>>
>>>
>>> The first one is an easy fix, not sure how I missed that. The other
>>> one I have no idea, any chance you could try with this one:
>>>
>>> http://git.kernel.dk/cgit/linux-block/commit/?h=mq-conversions&id=142dc9f36e3113b6a76d472978c33c8c2a2b702c
>>>
>>> which fixes the first one, and also corrects a wrong end_io call,
>>> but I don't think that's the cause of the above.
>>>
>>> If it crashes, can you figure out where in the source that is?
>>> Basically just do
>>>
>>> gdb vmlinux
>>> l *zfcp_fc_exec_bsg_job+0x116
>>>
>>> assuming that works fine on s390 :-)
>>>
>>
>> So I tried 4.19.0 with only the two patches from you:
>> http://git.kernel.dk/cgit/linux-block/commit/?h=mq-conversions&id=2b2ffa16193e9a69a076595ed64429b8cc9b42aa
>> http://git.kernel.dk/cgit/linux-block/commit/?h=mq-conversions&id=142dc9f36e3113b6a76d472978c33c8c2a2b702c
>>
>> This fixed the first warning from before, as you suggested, but it still
>> crash like this:
>>
>> [ ] Unable to handle kernel pointer dereference in virtual kernel address space
>> [ ] Failing address: 0000000000000000 TEID: 0000000000000483
>> [ ] Fault in home space mode while using kernel ASCE.
>> [ ] AS:00000000025f0007 R3:00000000dffb8007 S:00000000dffbf000 P:000000000000013d
>> [ ] Oops: 0004 ilc:3 [#1] PREEMPT SMP DEBUG_PAGEALLOC
>> [ ] Modules linked in: ....
>> [ ] CPU: 2 PID: 609 Comm: kworker/2:1H Kdump: loaded Tainted: G        W         4.19.0-bb-next+ #1
>> [ ] Hardware name: IBM 3906 M03 704 (LPAR)
>> [ ] Workqueue: kblockd blk_mq_run_work_fn
>> [ ] Krnl PSW : 0704e00180000000 000003ff806a6b40 (zfcp_fc_exec_bsg_job+0x1c0/0x440 [zfcp])
>> [ ]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
>> [ ] Krnl GPRS: 0000000000000000 0000000083e0f3c0 0000000000000000 0000030000000000
>> [ ]            0000030000000000 000003ff806a6b3a 00000000a86b5948 00000000a86b5988
>> [ ]            0000000083e0f3f0 0000000000000000 00000000a86b5938 00000000984aee80
>> [ ]            00000000a86b5800 000003ff806ba950 000003ff806a6b3a 0000000098a5ed88
>> [ ] Krnl Code: 000003ff806a6b30: b9040029            lgr     %r2,%r9
>>                000003ff806a6b34: c0e5ffff8b7e        brasl   %r14,3ff80698230
>>               #000003ff806a6b3a: e310f0a00004        lg      %r1,160(%r15)
>>               >000003ff806a6b40: e31090000024        stg     %r1,0(%r9)
>>                000003ff806a6b46: 4120a040            la      %r2,64(%r10)
>>                000003ff806a6b4a: c0e5ffff8ae7        brasl   %r14,3ff80698118
>>                000003ff806a6b50: e310a0400004        lg      %r1,64(%r10)
>>                000003ff806a6b56: e310f0a00024        stg     %r1,160(%r15)
>> [ ] Call Trace:
>> [ ] ([<000003ff806a6aa2>] zfcp_fc_exec_bsg_job+0x122/0x440 [zfcp])
>> [ ]  [<000003ff8061e928>] fc_bsg_dispatch+0x310/0x398 [scsi_transport_fc]
>> [ ]  [<0000000000d2995a>] bsg_queue_rq+0x15a/0x198
>> [ ]  [<0000000000d03566>] blk_mq_dispatch_rq_list+0x966/0xf90
>> [ ]  [<0000000000d0e37a>] blk_mq_sched_dispatch_requests+0x3fa/0x410
>> [ ]  [<0000000000cfc230>] __blk_mq_run_hw_queue+0x218/0x248
>> [ ]  [<00000000001cb124>] process_one_work+0x8c4/0xe90
>> [ ]  [<00000000001cbe58>] worker_thread+0x768/0xbb0
>> [ ]  [<00000000001dc67a>] kthread+0x22a/0x248
>> [ ]  [<000000000137b812>] kernel_thread_starter+0x6/0xc
>> [ ]  [<000000000137b80c>] kernel_thread_starter+0x0/0xc
>> [ ] INFO: lockdep is turned off.
>> [ ] Last Breaking-Event-Address:
>> [ ]  [<00000000005d9b08>] __asan_store8+0x98/0xa0
>> [ ]
>> [ ] Kernel panic - not syncing: Fatal exception: panic_on_oops
>>
>> zfcp_fc_exec_bsg_job+0x1c0 is here:
>>
>>     int zfcp_fc_exec_bsg_job(struct bsg_job *job)
>>     {
>>             struct Scsi_Host *shost;
>>             struct zfcp_adapter *adapter;
>>             struct zfcp_fsf_ct_els *ct_els = job->dd_data;
>>             struct fc_bsg_request *bsg_request = job->request;
>>             struct fc_rport *rport = fc_bsg_to_rport(job);
>>
>>             shost = rport ? rport_to_shost(rport) : fc_bsg_to_shost(job);
>>             adapter = (struct zfcp_adapter *)shost->hostdata[0];
>>
>>             if (!(atomic_read(&adapter->status) & ZFCP_STATUS_COMMON_OPEN))
>>                     return -EINVAL;
>>
>> ==>         ct_els->req = job->request_payload.sg_list;
>>             ct_els->resp = job->reply_payload.sg_list;
>>             ct_els->handler_data = job;
>>
>>             switch (bsg_request->msgcode) {
>>             case FC_BSG_RPT_ELS:
>>             case FC_BSG_HST_ELS_NOLOGIN:
>>                     return zfcp_fc_exec_els_job(job, adapter);
>>             case FC_BSG_RPT_CT:
>>             case FC_BSG_HST_CT:
>>                     return zfcp_fc_exec_ct_job(job, adapter);
>>             default:
>>                     return -EINVAL;
>>             }
>>     }
>>
>> I took a dump to find out how struct bsg_job looks like when it crashes
>> (register clobbering isn't as bad here and I verified that job->dev is valid).
>>
>>     crash> print *((struct bsg_job *)0x00000000a86b5938)
>>     $5 = {
>>       dev = 0x9af10358,
>>       kref = {
>>         refcount = {
>>           refs = {
>>             counter = 0x1
>>           }
>>         }
>>       },
>>       timeout = 0x384,
>>       request = 0x83e0f3f0,
>>       reply = 0x9559d500,
>>       request_len = 0x14,
>>       reply_len = 0x0,
>>       request_payload = {
>>         payload_len = 0x14,
>>         sg_cnt = 0x1,
>>         sg_list = 0x83e0f3c0
>>       },
>>       reply_payload = {
>>         payload_len = 0x1000,
>>         sg_cnt = 0x1,
>>         sg_list = 0x83e0f1e0
>>       },
>>       result = 0x0,
>>       reply_payload_rcv_len = 0x0,
>> ==>   dd_data = 0x0
>>     }
>>
>> This is where it breaks, job->dd_data is 0x0, so when it wants to write
>> to it, it fails.
>>
>> So We set
>>
>>     struct fc_function_template zfcp_transport_functions = {
>>             ....
>>             .dd_bsg_size = sizeof(struct zfcp_fsf_ct_els);
>>             ....
>>     }
>>
>> We get to zfcp_fc_exec_bsg_job() via fc_bsg_host_dispatch(). Previously
>> this was initially allocated in bsg_setup_queue() with:
>>
>>             q->cmd_size = sizeof(struct bsg_job) + dd_job_size;
>>
>> And then in bsg_initialize_rq() with:
>>
>>             job->dd_data = job + 1;
>>
>> Sry, I haven't check the rest of your patch yet. But just to
>> double-check, I ran some tests against 4.18.15, and there stuff still
>> works, so it didn't break w/o me noticing in between (always possible
> 
> Doh, yeah it's obvious now. Try this one on top. I missed it due to the
> similar naming, I'll kill off q->initalize_rq_fn as well.
> 
> 
> diff --git a/block/bsg-lib.c b/block/bsg-lib.c
> index 58c9886394d8..198cf42b74a3 100644
> --- a/block/bsg-lib.c
> +++ b/block/bsg-lib.c
> @@ -293,6 +293,7 @@ static const struct blk_mq_ops bsg_mq_ops = {
>  	.queue_rq	= bsg_queue_rq,
>  	.init_request	= bsg_init_rq,
>  	.exit_request	= bsg_exit_rq,
> +	.initialize_rq_fn = bsg_initialize_rq,
>  	.complete	= bsg_complete,
>  };
>  
> @@ -329,7 +330,6 @@ struct request_queue *bsg_setup_queue(struct device *dev, const char *name,
>  		goto out_queue;
>  	}
>  
> -	q->initialize_rq_fn = bsg_initialize_rq;
>  	q->queuedata = dev;
>  	q->bsg_job_fn = job_fn;
>  	blk_queue_flag_set(QUEUE_FLAG_BIDI, q);

Did you get a chance to test with the above?

-- 
Jens Axboe