James Bottomley wrote: > On Tue, 2005-04-19 at 07:31 +0900, Tejun Heo wrote: > >> The original code also uses timer pending status as a signal that >>command completed normally in scsi_eh_done() function, and the same >>race also exists in the original code, no matter what we do, unless we >>make timer expiration and removal of the command atomic, there will be >>a window in which command completes normally but considered to have >>timed out as long as we use timer pending status as tie breaker. > > > True enough; it's a race between the driver calling scsi_done() and the > timer expiring. However, that's an acceptable race, since the timer > values are usually in the order of a few seconds and the command usually > completes in milliseconds. the done function is called in interrupt > context after command completion, so it's as close as possible to the > actual command completion > > >> The patch moves the test out of scsi_eh_done() into >>scsi_send_eh_cmnd() and this does widen the window by delaying removal >>of timer until after the original thread gets scheduled, but usually >>not by much and that's how timers are done in many cases (through out >>the kernel, timer removals are done with intervening scheduling and no >>one considers those incorrect). So... > > > The time between a thread being marked ready to run and actually running > has been measured in seconds on a heavily loaded system. That makes the > race window potentially as wide as the timer, which is unacceptable. Hmm, okay, agreed. I'll rework it. I think I can do it by just adding a private struct scsi_eh_timer_arg to pass along the timer and the cmd pointer. I'll repost the patchset soon. Thanks a lot. -- tejun - : send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html