Hello, James. On Mon, Apr 18, 2005 at 10:33:21AM -0500, James Bottomley wrote: > On Mon, 2005-04-11 at 03:45 +0900, Tejun Heo wrote: > > scmd->eh_timeout is used to resolve the race between command > > completion and timeout. However, during error handling, > > scsi_send_eh_cmnd uses scmd->eh_timeout. This creates a race > > condition between eh and normal completion for a request which > > has timed out and in the process of error handling. If the > > request completes while scmd->eh_timeout is being used by eh, > > eh timeout is lost and the command will be handled by both eh > > and completion path. This patch fixes the race by making > > scsi_send_eh_cmnd() use its own timer. > > > > Signed-off-by: Tejun Heo <htejun@xxxxxxxxx> > > The logic is wrong in there. > > The problem is you cannot rely on the timer being pending as a signal > that the command completed normally. The kernel doesn't define the > elapsed time between the eh_action semaphore going up and the process > waiting for it being scheduled. If the timer fires within that > undefined interval, you'll think the command timed out when it, in fact, > completed normally. The original code also uses timer pending status as a signal that command completed normally in scsi_eh_done() function, and the same race also exists in the original code, no matter what we do, unless we make timer expiration and removal of the command atomic, there will be a window in which command completes normally but considered to have timed out as long as we use timer pending status as tie breaker. The patch moves the test out of scsi_eh_done() into scsi_send_eh_cmnd() and this does widen the window by delaying removal of timer until after the original thread gets scheduled, but usually not by much and that's how timers are done in many cases (through out the kernel, timer removals are done with intervening scheduling and no one considers those incorrect). So... * If you're worried about the race itself, it was there before, it shouldn't cause any problem, and we really can't help it. * If you're worried about the widening of the window, practically, it wouldn't cause problem, and it's how timers are done in many other places. But if you still don't like it, I can rework it (maybe I'll need to add a field to Scsi_Host or scsi_cmnd). So, please let me know. Thanks a lot. -- tejun - : send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html