On Sun, 18 Oct 2009, Stefan Richter wrote:

> > It seems to me that the restart always fails if the "rediscovered
> > device fw1" resp. "firewire_sbp2: fw1.0: reconnected to LUN 0000"
> > message comes after the "[sdb] Starting disk" message. That would
> > sound like an actual bug to me.
>
> It is not a bug. IEEE 1394 rediscovery and SBP-2 reconnect can become
> necessary anytime (and they do become necessary at /least/ once during
> PM resume), in no particular order with respect to SCSI request
> submission. Our drivers (firewire-sbp2 mainly) need to be able to
> handle any order of such events.

Is it possible to delay returning from the device resume routine until
the rediscovery/reconnect has completed? This is more or less how the
USB stack works.

> Interesting findings.
>
> There are two independent places of the code that could possibly be
> improved to fix this issue:
>
> a.) sd's PM resume method:
>
> 1.a) sd_resume could gain this retry loop which you implemented.

This wouldn't be necessary if the transport was working before
sd_resume got called.

> 1.b) sd_resume (but probably not sd_suspend) could optimistically
> ignore any error return from sd_start_stop_device. If the motor cannot
> be started immediately at resume, the SCSI core would try to start it
> later on when the disk is normally accessed.

This is probably a worthwhile idea in any case.

> My assumption here is that an error return from sd_resume causes the
> disk to become inaccessible (taken offline?).

No. All it does is cause an error message to be printed in the system
log. But it's possible that a failure lower down in the SCSI stack has
this effect.

Alan Stern
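
For illustration only, the two resume orderings discussed above can be sketched as a small userspace simulation. This is not kernel code: struct fake_transport, fake_transport_poll(), fake_start_disk(), resume_after_reconnect() and resume_ignore_start_error() are all hypothetical names, and fake_start_disk() merely stands in for sd_start_stop_device() under the assumption that starting the disk fails while the transport has not yet reconnected.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a transport that still has to reconnect. */
struct fake_transport {
    int polls_until_ready;  /* reconnect completes when this reaches 0 */
};

/* Simulate waiting one step for rediscovery/reconnect to make progress. */
static int fake_transport_poll(struct fake_transport *t)
{
    assert(t != NULL);
    if (t->polls_until_ready > 0)
        t->polls_until_ready--;
    return t->polls_until_ready == 0;
}

/* Stand-in for sd_start_stop_device(): fails unless the transport is up. */
static int fake_start_disk(const struct fake_transport *t)
{
    assert(t != NULL);
    return t->polls_until_ready == 0 ? 0 : -1;
}

/*
 * Variant 1: do not start the disk until the transport has reconnected,
 * giving up after max_polls steps. Returns the start result.
 */
static int resume_after_reconnect(struct fake_transport *t, int max_polls)
{
    int i;

    for (i = 0; i < max_polls && t->polls_until_ready > 0; i++)
        fake_transport_poll(t);
    return fake_start_disk(t);
}

/*
 * Variant 2 (idea 1.b): try to start the disk once, record the result
 * for logging, and report success from resume regardless.
 */
static int resume_ignore_start_error(const struct fake_transport *t,
                                     int *start_result)
{
    *start_result = fake_start_disk(t);
    return 0;
}
```

With a transport that needs three polls, resume_after_reconnect() succeeds when given enough polls and fails when capped at two, while resume_ignore_start_error() reports success either way and leaves the failed start to be retried on a later access.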