Re: [PATCH 1/7] print eh activation

On Wednesday 03 December 2008 16:16:30 James Bottomley wrote:
> On Wed, 2008-12-03 at 12:19 +0100, Bernd Schubert wrote:
> > On Wednesday 26 November 2008 19:47:02 James Bottomley wrote:
> > > On Wed, 2008-11-26 at 18:44 +0100, Bernd Schubert wrote:
> > > > Print activation of the scsi error handler to let the user know why
> > > > the error handler was activated. This information is
> > > > essential to diagnose hardware issues.
> > >
> > > But it can be turned on already with SCSI logging ... at least the
> > > activation message.  I don't think we want this to be printed all the
> > > time, because the error handler can be activated in non-error
> > > situations for some HBAs (like sense collection for non-ACA emulating
> > > drivers).
> >
> > Sorry for the late reply, I didn't have access to my mails for a few
> > days.
> >
> > Actually I entirely disagree. Activating the error handler should be an
> > exception, and as an exception it should print that it was activated and
> > also the reason why. Without this information we quite often see
> > something like this in our logs:
> >
> > [12165690.357905] mptscsih: ioc1: attempting task abort! (sc=ffff81012a957500)
> > [12165690.357966] sd 3:0:1:0:
> > [12165690.358018]         command: cdb[0]=0x28: 28 00 37 10 e9 4f 00 00 08 00
> > [12165690.732712] mptbase: ioc1: IOCStatus(0x0048): SCSI Task Terminated
> > [12165690.733699] mptscsih: ioc1: task abort: SUCCESS (sc=ffff81012a957500)
> >
> > But this gives you no chance to see where it comes from. After adding
> > the additional printks from my patch, we recognized the error handler was
> > activated mostly due to command timeouts. So increasing the timeouts to
> > >90s already solved 2/3rds of our problems. Please also see patch nr. 6,
> > the additional printks helped me to recognize that it is always only one
> > particular scsi command that fails.
>
> But surely what you're arguing for then, is a printk on command timeout?

I'm arguing that calling the error handler is a rare exception and that the
admin wants to know what has caused this exception. This is also nothing
you want to enable with scsi logging, since errors mostly happen after
weeks, when the system is already in production and nobody has error
logging active.

Another example for the timeouts patch:

sd 6:0:2:2: [sdk] Result: hostbyte=DID_OK driverbyte=DRIVER_OK,SUGGEST_OK
sd 6:0:2:2: [sdk] CDB: Synchronize Cache(10): 35 00 00 00 00 00 00 00 00 00
sd 6:0:2:2: Activating scsi error recovery (1)
sd 6:0:2:2: trying to abort command
qla2xxx 0000:07:02.0: scsi(6:2:2): Abort command issued -- 1 36e2df2 20


So without the printk patch you would see many messages like these:

qla2xxx 0000:07:02.0: scsi(6:2:2): Abort command issued -- 1 36e2df2 20
qla2xxx 0000:07:02.0: scsi(6:2:2): Abort command issued -- 1 36e2df2 20
qla2xxx 0000:07:02.0: scsi(6:2:2): Abort command issued -- 1 36e2df2 20
qla2xxx 0000:07:02.0: scsi(6:2:2): Abort command issued -- 1 36e2df2 20
qla2xxx 0000:07:02.0: scsi(6:2:2): Abort command issued -- 1 36e2df2 20

But you wouldn't have any idea why, or which command was aborted. Eventually
this will cause a severe failure of the qla2xxx driver, but you would never
figure out the underlying reason. Actually, I wouldn't mind suppressing these
driver messages, but the eh activation printks are essential to understand
what is going on.
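
As a side note, the CDB bytes printed in the log excerpts above can be
decoded by hand to see exactly which command was affected. Below is a minimal
userspace sketch in C, not part of the patch series. The names cdb10,
parse_hex_bytes(), decode_cdb10() and opcode_name() are made up for
illustration, and the sketch assumes the 10-byte layout shared by READ(10),
WRITE(10) and SYNCHRONIZE CACHE(10) (LBA in bytes 2..5, block count in
bytes 7..8, both big-endian).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper type: fields of a 10-byte CDB (READ(10)-style layout). */
struct cdb10 {
	uint8_t  opcode;	/* byte 0 */
	uint32_t lba;		/* bytes 2..5, big-endian */
	uint16_t length;	/* bytes 7..8, big-endian, in blocks */
};

/*
 * Parse a string of space-separated hex bytes ("28 00 37 ...") into buf.
 * Returns the number of bytes parsed, or -1 on malformed or too long input.
 */
static int parse_hex_bytes(const char *s, uint8_t *buf, size_t max)
{
	size_t n = 0;

	while (*s) {
		char *end;
		unsigned long v;

		while (*s == ' ')
			s++;
		if (!*s)
			break;
		v = strtoul(s, &end, 16);
		if (end == s || v > 0xff || n == max)
			return -1;
		buf[n++] = (uint8_t)v;
		s = end;
	}
	return (int)n;
}

/* Decode exactly ten hex bytes into *out. Returns 0 on success, -1 on error. */
static int decode_cdb10(const char *hex, struct cdb10 *out)
{
	uint8_t b[10];

	assert(hex != NULL && out != NULL);
	memset(out, 0, sizeof(*out));
	if (parse_hex_bytes(hex, b, sizeof(b)) != 10)
		return -1;
	out->opcode = b[0];
	out->lba = ((uint32_t)b[2] << 24) | ((uint32_t)b[3] << 16) |
		   ((uint32_t)b[4] << 8) | (uint32_t)b[5];
	out->length = (uint16_t)((b[7] << 8) | b[8]);
	return 0;
}

/* Name the few opcodes that appear in this thread; anything else is unknown. */
static const char *opcode_name(uint8_t op)
{
	switch (op) {
	case 0x28: return "READ(10)";
	case 0x2a: return "WRITE(10)";
	case 0x35: return "SYNCHRONIZE CACHE(10)";
	default:   return "unknown";
	}
}
```

Calling decode_cdb10() on "28 00 37 10 e9 4f 00 00 08 00" from the mptscsih
excerpt yields opcode 0x28 (READ(10)), LBA 0x3710e94f and a length of 8
blocks. The "35 00 ..." line from the qla2xxx example decodes to
SYNCHRONIZE CACHE(10) with LBA 0 and length 0.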

>
> > In my opinion, if a driver needs the error handler for specific actions,
> > we should create another interface for that. Could you please point me to
> > such a non-ACA driver?
> > I also see only two callers of scsi_eh_scmd_add(), namely
> > scsi_times_out() and scsi_softirq_done(), and only for these calls will
> > the additional printks be done (since scmd is required to do the
> > printks).
>
> Mostly we converted the in-use drivers, but things like the parallel
> port drivers still use this mechanism.

Thanks, going to check these now.


Cheers,
Bernd

-- 
Bernd Schubert
Q-Leap Networks GmbH