Re: [PATCH] scsi: avoid repetitive logging of device offline messages

On Mon, 2020-03-09 at 12:36 -0700, Bart Van Assche wrote:
> On 3/9/20 11:14 AM, Ewan D. Milne wrote:
> > Large queues of I/O to offline devices that are eventually
> > submitted when devices are unblocked result in many repeated
> > "rejecting I/O to offline device" messages.  These messages
> > can fill up the dmesg buffer in crash dumps so no useful
> > prior messages remain.  In addition, if a serial console
> > is used, the flood of messages can cause a hard lockup in
> > the console code.
> > 
> > Introduce a flag indicating the message has already been logged
> > for the device, and reset the flag when scsi_device_set_state()
> > changes the device state.
> > 
> > Signed-off-by: Ewan D. Milne <emilne@xxxxxxxxxx>
> > ---
> >   drivers/scsi/scsi_lib.c    | 8 ++++++--
> >   include/scsi/scsi_device.h | 2 ++
> >   2 files changed, 8 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > index 610ee41..d3a6d97 100644
> > --- a/drivers/scsi/scsi_lib.c
> > +++ b/drivers/scsi/scsi_lib.c
> > @@ -1240,8 +1240,11 @@ scsi_prep_state_check(struct scsi_device *sdev, struct request *req)
> >   		 * commands.  The device must be brought online
> >   		 * before trying any recovery commands.
> >   		 */
> > -		sdev_printk(KERN_ERR, sdev,
> > -			    "rejecting I/O to offline device\n");
> > +		if (!sdev->offline_already) {
> > +			sdev->offline_already = 1;
> > +			sdev_printk(KERN_ERR, sdev,
> > +				    "rejecting I/O to offline device\n");
> > +		}
> >   		return BLK_STS_IOERR;
> >   	case SDEV_DEL:
> >   		/*
> > @@ -2340,6 +2343,7 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
> >   		break;
> >   
> >   	}
> > +	sdev->offline_already = 0;
> >   	sdev->sdev_state = state;
> >   	return 0;
> >   
> > diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
> > index f8312a3..72987a0 100644
> > --- a/include/scsi/scsi_device.h
> > +++ b/include/scsi/scsi_device.h
> > @@ -204,6 +204,8 @@ struct scsi_device {
> >   	unsigned unmap_limit_for_ws:1;	/* Use the UNMAP limit for WRITE SAME */
> >   	unsigned rpm_autosuspend:1;	/* Enable runtime autosuspend at device
> >   					 * creation time */
> > +	unsigned offline_already:1;	/* Device offline message logged */
> > +
> >   	atomic_t disk_events_disable_depth; /* disable depth for disk events */
> >   
> >   	DECLARE_BITMAP(supported_events, SDEV_EVT_MAXBITS); /* supported events */
> 
> Bitfields are troublesome in multithreaded software. Has it been 
> considered to use rate-limiting instead of introducing a new bitfield 
> member?
> 
> Thanks,
> 
> Bart.
> 

I did, but printk_ratelimited() does not do what is desired here.
What we want is a single message per device.  If we rate-limit the
message itself, we lose information in the log about which devices
were affected, which makes debugging issues with multipath I/O much
harder.
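
To illustrate the difference, here is a small userspace sketch, not
kernel code.  It assumes three devices with 100 queued I/Os each, all
rejected within a single rate-limit interval, and a shared budget of 10
messages for the call site (I believe that matches the default burst,
but treat the number as an assumption).  All model_* names are
hypothetical and time is not modelled at all.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NDEV   3		/* hypothetical number of offline devices */
#define MODEL_QUEUED 100	/* hypothetical queued I/Os per device */
#define MODEL_BURST  10		/* assumed message budget per interval */

/*
 * One budget shared by every device that reaches the call site.
 * msgs[dev] receives the number of messages logged for that device.
 */
static void model_ratelimited(unsigned int msgs[MODEL_NDEV])
{
	unsigned int budget = MODEL_BURST;

	assert(msgs != NULL);
	for (int dev = 0; dev < MODEL_NDEV; dev++) {
		msgs[dev] = 0;
		for (int io = 0; io < MODEL_QUEUED; io++) {
			if (budget > 0) {
				budget--;
				msgs[dev]++;
			}
		}
	}
}

/* One flag per device: the first rejection logs, the rest stay silent. */
static void model_once_per_device(unsigned int msgs[MODEL_NDEV])
{
	assert(msgs != NULL);
	for (int dev = 0; dev < MODEL_NDEV; dev++) {
		unsigned int logged = 0;

		msgs[dev] = 0;
		for (int io = 0; io < MODEL_QUEUED; io++) {
			if (!logged) {
				logged = 1;
				msgs[dev]++;
			}
		}
	}
}
```

In this model the shared budget is used up entirely by the first
device, so the other two never appear in the log, while the per-device
flag yields exactly one message for each device.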

The only purpose of the flag is to suppress duplicate messages.  In
the common case a single thread submits the queued I/O that is going
to be rejected.  If multiple threads submit I/O there might be some
duplicated messages, but that is not critical.  Hence the lack of
locking on the flag.
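
The intended behavior can be sketched as a single-threaded userspace
model.  This is only an illustration: the model_* types and functions
are hypothetical stand-ins for struct scsi_device,
scsi_device_set_state() and the offline case in
scsi_prep_state_check(), and the logged counter exists only so the
result can be inspected.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel state enum and device struct. */
enum model_state { MODEL_RUNNING, MODEL_OFFLINE };

struct model_dev {
	const char *name;
	enum model_state state;
	unsigned offline_already:1;	/* offline message already logged */
	unsigned int logged;		/* messages emitted, for inspection */
};

/* Like the patch: every state change re-arms the message. */
static void model_set_state(struct model_dev *dev, enum model_state state)
{
	assert(dev != NULL);
	dev->offline_already = 0;
	dev->state = state;
}

/* Returns 0 if the I/O is accepted, -1 if it is rejected. */
static int model_submit(struct model_dev *dev)
{
	assert(dev != NULL);
	if (dev->state != MODEL_OFFLINE)
		return 0;

	/* No locking: a race here can only produce an extra message. */
	if (!dev->offline_already) {
		dev->offline_already = 1;
		dev->logged++;
		fprintf(stderr, "%s: rejecting I/O to offline device\n",
			dev->name);
	}
	return -1;
}
```

With this model, a thousand rejected submissions against an offline
device produce one message, and taking the device through running and
back to offline produces one more on the next rejection.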

-Ewan



