On Mon, 2020-03-09 at 12:36 -0700, Bart Van Assche wrote:
> On 3/9/20 11:14 AM, Ewan D. Milne wrote:
> > Large queues of I/O to offline devices that are eventually
> > submitted when devices are unblocked result in a many repeated
> > "rejecting I/O to offline device" messages.  These messages
> > can fill up the dmesg buffer in crash dumps so no useful
> > prior messages remain.  In addition, if a serial console
> > is used, the flood of messages can cause a hard lockup in
> > the console code.
> >
> > Introduce a flag indicating the message has already been logged
> > for the device, and reset the flag when scsi_device_set_state()
> > changes the device state.
> >
> > Signed-off-by: Ewan D. Milne <emilne@xxxxxxxxxx>
> > ---
> >  drivers/scsi/scsi_lib.c    | 8 ++++++--
> >  include/scsi/scsi_device.h | 2 ++
> >  2 files changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > index 610ee41..d3a6d97 100644
> > --- a/drivers/scsi/scsi_lib.c
> > +++ b/drivers/scsi/scsi_lib.c
> > @@ -1240,8 +1240,11 @@ scsi_prep_state_check(struct scsi_device *sdev, struct request *req)
> >  		 * commands.  The device must be brought online
> >  		 * before trying any recovery commands.
> >  		 */
> > -		sdev_printk(KERN_ERR, sdev,
> > -			    "rejecting I/O to offline device\n");
> > +		if (!sdev->offline_already) {
> > +			sdev->offline_already = 1;
> > +			sdev_printk(KERN_ERR, sdev,
> > +				    "rejecting I/O to offline device\n");
> > +		}
> >  		return BLK_STS_IOERR;
> >  	case SDEV_DEL:
> >  		/*
> > @@ -2340,6 +2343,7 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
> >  		break;
> >
> >  	}
> > +	sdev->offline_already = 0;
> >  	sdev->sdev_state = state;
> >  	return 0;
> >
> > diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
> > index f8312a3..72987a0 100644
> > --- a/include/scsi/scsi_device.h
> > +++ b/include/scsi/scsi_device.h
> > @@ -204,6 +204,8 @@ struct scsi_device {
> >  	unsigned unmap_limit_for_ws:1;	/* Use the UNMAP limit for WRITE SAME */
> >  	unsigned rpm_autosuspend:1;	/* Enable runtime autosuspend at device
> >  					 * creation time */
> > +	unsigned offline_already:1;	/* Device offline message logged */
> > +
> >  	atomic_t disk_events_disable_depth; /* disable depth for disk events */
> >
> >  	DECLARE_BITMAP(supported_events, SDEV_EVT_MAXBITS); /* supported events */
>
> Bitfields are troublesome in multithreaded software. Has it been
> considered to use rate-limiting instead of introducing a new bitfield
> member?
>
> Thanks,
>
> Bart.
>

I did, but printk_ratelimited() does not do what is desired here.
What we want is a single message per device.  If we rate-limit the
message itself, we lose information in the log about which devices
were affected, which makes debugging issues with multipath I/O much
harder.

The only purpose of the flag is to suppress duplicate messages.  In
the common case a single thread submits the queued I/O that is going
to get rejected.  If multiple threads submit I/O there might be
duplicated messages, but that is not critical, hence the lack of
locking on the flag.

-Ewan
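
For illustration only, the per-device suppression argued for above can be
modeled in userspace roughly as follows.  This is a sketch, not the kernel
code: struct fake_dev, fake_dev_set_state(), fake_dev_submit() and the
messages counter are hypothetical stand-ins for struct scsi_device,
scsi_device_set_state() and scsi_prep_state_check(), and the model leaves
out the state-transition validity checks and any locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel definitions. */
enum dev_state { DEV_RUNNING, DEV_OFFLINE };

struct fake_dev {
	const char *name;
	enum dev_state state;
	bool offline_logged;	/* plays the role of offline_already */
	unsigned int messages;	/* messages emitted; exists only for the demo */
};

/*
 * Models the hunk in scsi_device_set_state(): a state change re-arms the
 * message.  Assumption: every call is treated as a valid transition.
 */
static void fake_dev_set_state(struct fake_dev *dev, enum dev_state state)
{
	assert(dev != NULL);
	dev->offline_logged = false;
	dev->state = state;
}

/*
 * Models the SDEV_OFFLINE case in scsi_prep_state_check(): the request is
 * always rejected while offline, but only the first rejection is logged.
 * Returns 0 if the request is accepted, -1 if it is rejected.
 */
static int fake_dev_submit(struct fake_dev *dev)
{
	assert(dev != NULL);
	if (dev->state != DEV_OFFLINE)
		return 0;

	if (!dev->offline_logged) {
		/* Unlocked on purpose: a race only costs a duplicate message. */
		dev->offline_logged = true;
		dev->messages++;
		fprintf(stderr, "%s: rejecting I/O to offline device\n",
			dev->name);
	}
	return -1;
}
```

With two devices taken offline and a large queue of requests submitted to
each, this model logs one line per device, so both device names appear in
the log.  Taking a device offline again after it has been running logs one
more line for that device only.  A single shared rate limit on the message
gives no such per-device guarantee.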