On 03/02/2012 05:01 PM, scameron@xxxxxxxxxxxxxxxxxx wrote:
> On Fri, Mar 02, 2012 at 03:10:02PM -0600, Mike Christie wrote:
>> On 03/02/2012 09:44 AM, scameron@xxxxxxxxxxxxxxxxxx wrote:
>>>
>>> What should the LLD do if an abort request comes into the
>>> abort error handler from the midlayer for a command which is
>>> not known to the LLD?
>>>
>>> I see aic7xxx_osm.c handles it in this way in ahc_linux_queue_recovery_cmd():
>>>
>>> no_cmd:
>>> 	/*
>>> 	 * Our assumption is that if we don't have the command, no
>>> 	 * recovery action was required, so we return success. Again,
>>> 	 * the semantics of the mid-layer recovery engine are not
>>> 	 * well defined, so this may change in time.
>>> 	 */
>>> 	retval = SUCCESS;
>>>
>>> Is that the right thing to do? Seems a bit weird, but if that's
>>> the right thing to do, I can do that too.
>>>
>>
>> How do you hit this case?
>
> I'm not quite sure. I haven't hit it, but have a report of it on RHEL5u5
> with XFS filesystem under heavy load. As a guess, I'd say a race between
> driver completing the command and a timeout in the mid layer. In any
> case, it'd be nice to know what the kernel expects a driver to do if
> it should encounter that situation.
>
>>
>> I think it is ok. The reason I have seen drivers hit this is that
>> race where the driver is completing a command while the timer code is
>> starting to go off, or the cmd has timed out then the driver completes
>> the command before the abort code is run.
>>
>> In those cases the driver has cleaned up its internal accounting because
>> the command has completed. At that point there is not much it can do
>> even if it wanted to. It does not have a way to look up things like
>> internal tags/ids for the command.
>
> Right, but it just seems weird for the driver to effectively say, "Sure,
> I aborted that command", when it did no such thing. If the driver tells
> the kernel that a write got aborted when really it was completed, that
> seems like it could be kind of bad.
I see what you are saying. Yeah, it would be better if we had a new error code for this, so we could return it from the abort handler. Then scsi_error.c could also skip retrying it.