On Fri, Oct 26, 2018 at 10:08:30AM -0600, Jens Axboe wrote: > On 10/26/18 10:06 AM, Benjamin Block wrote: > > On Thu, Oct 25, 2018 at 05:12:59PM -0600, Jens Axboe wrote: > >> On 10/23/18 12:07 PM, Jens Axboe wrote: > >>> On 10/23/18 11:40 AM, Benjamin Block wrote: > >>>> > >>>> So I tried 4.19.0 with only the two patches from you: > >>>> http://git.kernel.dk/cgit/linux-block/commit/?h=mq-conversions&id=2b2ffa16193e9a69a076595ed64429b8cc9b42aa > >>>> http://git.kernel.dk/cgit/linux-block/commit/?h=mq-conversions&id=142dc9f36e3113b6a76d472978c33c8c2a2b702c > >>>> > >>>> This fixed the first warning from before, as you suggested, but it still > >>>> crash like this: > >>>> > >>>> [ ] Unable to handle kernel pointer dereference in virtual kernel address space > >>>> [ ] Failing address: 0000000000000000 TEID: 0000000000000483 > >>>> [ ] Fault in home space mode while using kernel ASCE. > >>>> [ ] AS:00000000025f0007 R3:00000000dffb8007 S:00000000dffbf000 P:000000000000013d > >>>> [ ] Oops: 0004 ilc:3 [#1] PREEMPT SMP DEBUG_PAGEALLOC > >>>> [ ] Modules linked in: .... > >>>> [ ] CPU: 2 PID: 609 Comm: kworker/2:1H Kdump: loaded Tainted: G W 4.19.0-bb-next+ #1 > >>>> [ ] Hardware name: IBM 3906 M03 704 (LPAR) > >>>> [ ] Workqueue: kblockd blk_mq_run_work_fn > >>>> [ ] Krnl PSW : 0704e00180000000 000003ff806a6b40 (zfcp_fc_exec_bsg_job+0x1c0/0x440 [zfcp]) > >>>> [ ] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3 > >>>> [ ] Krnl GPRS: 0000000000000000 0000000083e0f3c0 0000000000000000 0000030000000000 > >>>> [ ] 0000030000000000 000003ff806a6b3a 00000000a86b5948 00000000a86b5988 > >>>> [ ] 0000000083e0f3f0 0000000000000000 00000000a86b5938 00000000984aee80 > >>>> [ ] 00000000a86b5800 000003ff806ba950 000003ff806a6b3a 0000000098a5ed88 > >>>> [ ] Krnl Code: 000003ff806a6b30: b9040029 lgr %r2,%r9 > >>>> 000003ff806a6b34: c0e5ffff8b7e brasl %r14,3ff80698230 > >>>> #000003ff806a6b3a: e310f0a00004 lg %r1,160(%r15) > >>>> >000003ff806a6b40: e31090000024 stg %r1,0(%r9) > >>>> 000003ff806a6b46: 4120a040 la %r2,64(%r10) > >>>> 000003ff806a6b4a: c0e5ffff8ae7 brasl %r14,3ff80698118 > >>>> 000003ff806a6b50: e310a0400004 lg %r1,64(%r10) > >>>> 000003ff806a6b56: e310f0a00024 stg %r1,160(%r15) > >>>> [ ] Call Trace: > >>>> [ ] ([<000003ff806a6aa2>] zfcp_fc_exec_bsg_job+0x122/0x440 [zfcp]) > >>>> [ ] [<000003ff8061e928>] fc_bsg_dispatch+0x310/0x398 [scsi_transport_fc] > >>>> [ ] [<0000000000d2995a>] bsg_queue_rq+0x15a/0x198 > >>>> [ ] [<0000000000d03566>] blk_mq_dispatch_rq_list+0x966/0xf90 > >>>> [ ] [<0000000000d0e37a>] blk_mq_sched_dispatch_requests+0x3fa/0x410 > >>>> [ ] [<0000000000cfc230>] __blk_mq_run_hw_queue+0x218/0x248 > >>>> [ ] [<00000000001cb124>] process_one_work+0x8c4/0xe90 > >>>> [ ] [<00000000001cbe58>] worker_thread+0x768/0xbb0 > >>>> [ ] [<00000000001dc67a>] kthread+0x22a/0x248 > >>>> [ ] [<000000000137b812>] kernel_thread_starter+0x6/0xc > >>>> [ ] [<000000000137b80c>] kernel_thread_starter+0x0/0xc > >>>> [ ] INFO: lockdep is turned off. > >>>> [ ] Last Breaking-Event-Address: > >>>> [ ] [<00000000005d9b08>] __asan_store8+0x98/0xa0 > >>>> [ ] > >>>> [ ] Kernel panic - not syncing: Fatal exception: panic_on_oops > >>>> > >>>> zfcp_fc_exec_bsg_job+0x1c0 is here: > >>>> > >>>> int zfcp_fc_exec_bsg_job(struct bsg_job *job) > >>>> { > >>>> struct Scsi_Host *shost; > >>>> struct zfcp_adapter *adapter; > >>>> struct zfcp_fsf_ct_els *ct_els = job->dd_data; > >>>> struct fc_bsg_request *bsg_request = job->request; > >>>> struct fc_rport *rport = fc_bsg_to_rport(job); > >>>> > >>>> shost = rport ? rport_to_shost(rport) : fc_bsg_to_shost(job); > >>>> adapter = (struct zfcp_adapter *)shost->hostdata[0]; > >>>> > >>>> if (!(atomic_read(&adapter->status) & ZFCP_STATUS_COMMON_OPEN)) > >>>> return -EINVAL; > >>>> > >>>> ==> ct_els->req = job->request_payload.sg_list; > >>>> ct_els->resp = job->reply_payload.sg_list; > >>>> ct_els->handler_data = job; > >>>> > >>>> switch (bsg_request->msgcode) { > >>>> case FC_BSG_RPT_ELS: > >>>> case FC_BSG_HST_ELS_NOLOGIN: > >>>> return zfcp_fc_exec_els_job(job, adapter); > >>>> case FC_BSG_RPT_CT: > >>>> case FC_BSG_HST_CT: > >>>> return zfcp_fc_exec_ct_job(job, adapter); > >>>> default: > >>>> return -EINVAL; > >>>> } > >>>> } > >>>> > >>>> I took a dump to find out how struct bsg_job looks like when it crashes > >>>> (register clobbering isn't as bad here and I verified that job->dev is valid). > >>>> > >>>> crash> print *((struct bsg_job *)0x00000000a86b5938) > >>>> $5 = { > >>>> dev = 0x9af10358, > >>>> kref = { > >>>> refcount = { > >>>> refs = { > >>>> counter = 0x1 > >>>> } > >>>> } > >>>> }, > >>>> timeout = 0x384, > >>>> request = 0x83e0f3f0, > >>>> reply = 0x9559d500, > >>>> request_len = 0x14, > >>>> reply_len = 0x0, > >>>> request_payload = { > >>>> payload_len = 0x14, > >>>> sg_cnt = 0x1, > >>>> sg_list = 0x83e0f3c0 > >>>> }, > >>>> reply_payload = { > >>>> payload_len = 0x1000, > >>>> sg_cnt = 0x1, > >>>> sg_list = 0x83e0f1e0 > >>>> }, > >>>> result = 0x0, > >>>> reply_payload_rcv_len = 0x0, > >>>> ==> dd_data = 0x0 > >>>> } > >>>> > >>>> This is where it breaks, job->dd_data is 0x0, so when it wants to write > >>>> to it, it fails. > >>>> > >>>> So We set > >>>> > >>>> struct fc_function_template zfcp_transport_functions = { > >>>> .... > >>>> .dd_bsg_size = sizeof(struct zfcp_fsf_ct_els); > >>>> .... > >>>> } > >>>> > >>>> We get to zfcp_fc_exec_bsg_job() via fc_bsg_host_dispatch(). Previously > >>>> this was initially allocated in bsg_setup_queue() with: > >>>> > >>>> q->cmd_size = sizeof(struct bsg_job) + dd_job_size; > >>>> > >>>> And then in bsg_initialize_rq() with: > >>>> > >>>> job->dd_data = job + 1; > >>>> > >>>> Sry, I haven't check the rest of your patch yet. But just to > >>>> double-check, I ran some tests against 4.18.15, and there stuff still > >>>> works, so it didn't break w/o me noticing in between (always possible > >>> > >>> Doh, yeah it's obvious now. Try this one on top. I missed it due to the > >>> similar naming, I'll kill off q->initalize_rq_fn as well. > >>> > >>> > >>> diff --git a/block/bsg-lib.c b/block/bsg-lib.c > >>> index 58c9886394d8..198cf42b74a3 100644 > >>> --- a/block/bsg-lib.c > >>> +++ b/block/bsg-lib.c > >>> @@ -293,6 +293,7 @@ static const struct blk_mq_ops bsg_mq_ops = { > >>> .queue_rq = bsg_queue_rq, > >>> .init_request = bsg_init_rq, > >>> .exit_request = bsg_exit_rq, > >>> + .initialize_rq_fn = bsg_initialize_rq, > >>> .complete = bsg_complete, > >>> }; > >>> > >>> @@ -329,7 +330,6 @@ struct request_queue *bsg_setup_queue(struct device *dev, const char *name, > >>> goto out_queue; > >>> } > >>> > >>> - q->initialize_rq_fn = bsg_initialize_rq; > >>> q->queuedata = dev; > >>> q->bsg_job_fn = job_fn; > >>> blk_queue_flag_set(QUEUE_FLAG_BIDI, q); > >> > >> Did you get a chance to test with the above? > >> > > > > Yes, found time today. This works now. I extended my small test-suite to > > test both sending commands against FC-Host and FC-Rport bsg-devices. > > Both seem to work with zFCP as far as my tests go (but I only have like > > 3 commands in that "test-suite", so its not extensive). > > Excellent! Can I add your tested-by to the patch? > Sure. -- With Best Regards, Benjamin Block / Linux on IBM Z Kernel Development IBM Systems & Technology Group / IBM Deutschland Research & Development GmbH Vorsitz. AufsR.: Martina Koederitz / Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294