Re: [PATCH 5/10] convert st to use scsi_execute_async

On Sun, 13 Nov 2005, Doug Ledford wrote:

> Kai Makisara wrote:
> > On Sat, 12 Nov 2005, Mike Christie wrote:
> 
> I noticed that these patches still have the same bug that the 2.4 kernel st
> driver has, namely the holding of the st's SCSI request struct until
> write_behind_check is called.  This behavior is responsible for at least two
> bugs with tape systems under 2.4 that we've fixed.  The first bug is that if
> you perform a write to a tape device that involves an async write behind
> request, then attempt to access the device via the sg mechanism without
> performing any intervening read or ioctl commands on the st device, the sg
> access will hang.  This only happens on SCSI controllers that set the
> cmd_per_lun value == 1 (e.g. mptscsih).  In order to replicate this problem you
> need one application writing to the tape device, then pausing, then something
> as simple as attempting to do an INQUIRY to the tape while the writer is
> paused causes the hang.  This happens at least with NetBackup, possibly with
> others as well.  The second bug is related to multiple tape usage on the same
> system.  It only happens on x86_64, not i686, but with multiple tapes in use
> the system eventually attempts to dma map a null pointer resulting in a BUG().
> I didn't root cause the dma mapping issue, but I did verify that once the
> initial bug was fixed, the dma mapping bug went away as well (either because
> whatever race window existed became so small that we no longer hit it
> or the problem was in fact fixed).  The patch we used to solve the problem is
> attached.  As a side note, holding on to a command without any upper bound on
> when it will be released is simply a *bad* idea.  Get the information you need
> from the command and free it.
> 

You are complaining about one feature and reporting a possible bug without 
much information. It seems that you (Red Hat) have been sitting on this 
report for a long time and have shipped the fix to your own customers only. 
Not very nice!

Originally there was a reason why the SCSI request struct was held until 
write_behind_check: to execute as little code as possible in interrupt 
context. For a very long time, scsi_done has been called outside 
interrupt context, so this reason is no longer valid. The behavior has 
not changed simply because nobody has asked for it.

I don't see any reason not to make the change you suggest. Does 
anyone else? If nobody complains, I will make the change for 2.6.16.
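
To make the difference concrete, here is a minimal userspace sketch of 
the two schemes. It is not st code: struct lun_queue, struct cmd, 
struct tape_dev and every function except the name write_behind_check 
are hypothetical stand-ins, and the queue's depth field only plays the 
role of cmd_per_lun. The sketch assumes a second user of the device 
(such as sg) cannot proceed when no command slot is free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-LUN command pool; depth plays the role of cmd_per_lun. */
struct lun_queue {
    int depth;
    int in_use;
};

/* Hypothetical command: only the fields the tape driver looks at later. */
struct cmd {
    int status;
    int residual;
};

struct tape_dev {
    struct lun_queue *q;
    struct cmd slot;       /* storage for the one async write in flight */
    struct cmd *pending;   /* non-NULL while the command is still held */
    struct cmd saved;      /* copy of the result, taken when it is released */
    bool saved_valid;
};

static bool queue_get(struct lun_queue *q)
{
    if (q->in_use >= q->depth)
        return false;      /* a real submitter would block here */
    q->in_use++;
    return true;
}

static void queue_put(struct lun_queue *q)
{
    assert(q->in_use > 0);
    q->in_use--;
}

static bool start_async_write(struct tape_dev *t)
{
    if (!queue_get(t->q))
        return false;
    t->pending = &t->slot;
    t->saved_valid = false;
    return true;
}

/* Current scheme: completion records the result but keeps the command. */
static void complete_and_hold(struct tape_dev *t, int status, int residual)
{
    t->pending->status = status;
    t->pending->residual = residual;
}

/* Suggested scheme: copy what is needed, then release the command at once. */
static void complete_and_release(struct tape_dev *t, int status, int residual)
{
    t->saved.status = status;
    t->saved.residual = residual;
    t->saved_valid = true;
    t->pending = NULL;
    queue_put(t->q);
}

/* Runs on the next st operation; releases a command that is still held. */
static int write_behind_check(struct tape_dev *t)
{
    if (t->pending) {
        t->saved = *t->pending;
        t->saved_valid = true;
        t->pending = NULL;
        queue_put(t->q);
    }
    return t->saved_valid ? t->saved.status : 0;
}

/* A second user of the same LUN, e.g. an INQUIRY sent through sg. */
static bool try_inquiry(struct lun_queue *q)
{
    if (!queue_get(q))
        return false;
    queue_put(q);
    return true;
}
```

With depth set to 1, try_inquiry fails after complete_and_hold until 
write_behind_check runs, which corresponds to the reported hang. After 
complete_and_release it succeeds immediately, and write_behind_check 
still returns the saved status.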

The DMA bug you are talking about is interesting, but I have no idea 
why it is happening. Releasing the SCSI request earlier should not have 
anything to do with it.

Mixing sg access with ULD operation is almost always a bad idea.

Thanks for the report and fix.

-- 
Kai