Wakko Warner wrote:
> Bart Van Assche wrote:
> > On Sun, 2018-04-01 at 14:27 -0400, Wakko Warner wrote:
> > > Wakko Warner wrote:
> > > > Wakko Warner wrote:
> > > > > I tested 4.14.32 last night with the same oops. 4.9.91 works fine.
> > > > > From the initiator, if I do cat /dev/sr1 > /dev/null it works. If I mount
> > > > > /dev/sr1 and then do find -type f | xargs cat > /dev/null the target
> > > > > crashes. I'm using the builtin iscsi target with pscsi. I can burn from
> > > > > the initiator without problems. I'll test other kernels between 4.9 and
> > > > > 4.14.
> > > >
> > > > So I've tested 4.x.y where x is one of 10 11 12 14 15 and y is the latest patch
> > > > (except for 4.15 which was 1 behind)
> > > > Each of these kernels crashes within seconds or immediately of doing find -type
> > > > f | xargs cat > /dev/null from the initiator.
> > >
> > > I tried 4.10.0. It doesn't completely lock up the system, but the device
> > > that was used hangs. So from the initiator, it's /dev/sr1 and from the
> > > target it's /dev/sr0. Attempting to read /dev/sr0 after the oops causes the
> > > process to hang in D state.
> >
> > Hello Wakko,
> >
> > Thank you for having narrowed down this further. I think that you encountered
> > a regression either in the block layer core or in the SCSI core. Unfortunately
> > the number of changes between kernel versions v4.9 and v4.10 in these two
> > subsystems is huge. I see two possible ways forward:
> > - Either that you perform a bisect to identify the patch that introduced this
> >   regression. However, I'm not sure whether you are familiar with the bisect
> >   process.
> > - Or that you identify the command that triggers this crash such that others
> >   can reproduce this issue without needing access to your setup.
> >
> > How about reproducing this crash with the below patch applied on top of
> > kernel v4.15.x? The additional output sent by this patch to the system log
> > should allow us to reproduce this issue by submitting the same SCSI command
> > with sg_raw.
>
> Ok, so I tried this, but scsi_print_command doesn't print anything. I added
> a check for !rq and the same thing that blk_rq_nr_phys_segments does in an
> if statement above this thinking it might have crashed during WARN_ON_ONCE.
> It still didn't print anything. My printk shows this:
> [ 36.263193] sr 3:0:0:0: cmd->request->nr_phys_segments is 0
>
> I also had scsi_print_command in the same if block which again didn't print
> anything. Is there some debug option I need to turn on to make it print? I
> tried looking through the code for this and following some of the function
> calls but didn't see any config options.

I know now why scsi_print_command isn't doing anything: cmd->cmnd is NULL.
I added a dev_printk in scsi_print_command where the 2 if statements return.
Logs:
[ 29.866415] sr 3:0:0:0: cmd->cmnd is NULL

> > Subject: [PATCH] Report commands with no physical segments in the system log
> >
> > ---
> >  drivers/scsi/scsi_lib.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > index 6b6a6705f6e5..74a39db57d49 100644
> > --- a/drivers/scsi/scsi_lib.c
> > +++ b/drivers/scsi/scsi_lib.c
> > @@ -1093,8 +1093,10 @@ int scsi_init_io(struct scsi_cmnd *cmd)
> >  	bool is_mq = (rq->mq_ctx != NULL);
> >  	int error = BLKPREP_KILL;
> >  
> > -	if (WARN_ON_ONCE(!blk_rq_nr_phys_segments(rq)))
> > +	if (WARN_ON_ONCE(!blk_rq_nr_phys_segments(rq))) {
> > +		scsi_print_command(cmd);
> >  		goto err_exit;
> > +	}
> >  
> >  	error = scsi_init_sgtable(rq, &cmd->sdb);
> >  	if (error)
> --
> Microsoft has beaten Volkswagen's world record. Volkswagen only created 22
> million bugs.

--
Microsoft has beaten Volkswagen's world record. Volkswagen only created 22
million bugs.