>On Tue, 2005-09-27 at 13:10 -0400, Bagalkote, Sreenivas wrote:
>> What do you mean by "actually do a reset"? I see that firmware doesn't
>> have any pending commands. So I simply return success from reset routine.
>> Do you see any problem in this? After a hundred or so such cycles, the
>> system is frozen. I should also tell you that if I introduce abort
>> handler and return success for all the completed commands, I don't see the OS hang.
>
>Well, yes, for two reasons
>
>1. you do clustering, so a reset request could be from a
>reservation breaking protocol
>

I don't have a clustering setup, so this is definitely not the reason.

>2. The fact that the eh activated indicates something went
>wrong. If you take no corrective action and the test unit
>ready that follows the reset fails or times out then the
>device will be taken offline.
>

Heavy I/O is going on in the FW while it is rebuilding RAID arrays, so we
expect some of the commands to time out. The key is to recover gracefully.
I see that the FW completes _all_ the commands, albeit after they have
timed out. When the reset handler is called after all the commands are out
of the door, I simply return success. Can this potentially cause any issues?

Thanks for your quick responses.

Sreenivas
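To make the question concrete, here is a small user-space C model of the reset-handler policy described above. It is only a sketch: `struct adapter`, `hw_reset()`, `reset_handler()` and the `EH_SUCCESS`/`EH_FAILED` values are hypothetical stand-ins, not the actual driver's code or the kernel's SCSI API, and the model does not simulate the midlayer's follow-up TEST UNIT READY.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the SCSI midlayer's SUCCESS/FAILED
 * dispositions; the values here are NOT the kernel's. */
enum eh_result { EH_SUCCESS, EH_FAILED };

/* Hypothetical adapter state, not a real driver structure. */
struct adapter {
    int pending_cmds; /* commands the firmware has not completed yet */
    int reset_count;  /* number of hardware resets issued */
};

/* Hypothetical hardware reset: in this model it always succeeds and
 * leaves no commands outstanding. */
static int hw_reset(struct adapter *a)
{
    a->reset_count++;
    a->pending_cmds = 0;
    return 0;
}

/* If the firmware has already completed everything, report success
 * without touching the hardware; otherwise take corrective action
 * by resetting. */
static enum eh_result reset_handler(struct adapter *a)
{
    assert(a != NULL);
    if (a->pending_cmds == 0)
        return EH_SUCCESS; /* no corrective action taken */
    if (hw_reset(a) != 0)
        return EH_FAILED;
    return EH_SUCCESS;
}
```

The first branch is the case in question: it reports success while taking no corrective action, so whether the device stays online depends entirely on the test unit ready that follows the reset.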