On Wed, 09 Jan 2008 17:51:43 -0600 James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote: > > diff -urp linux-ref/drivers/scsi/sym53c8xx_2/sym_glue.c linux-new/drivers/scsi/sym53c8xx_2/sym_glue.c > > --- linux-ref/drivers/scsi/sym53c8xx_2/sym_glue.c 2007-12-23 20:39:44.000000000 +0100 > > +++ linux-new/drivers/scsi/sym53c8xx_2/sym_glue.c 2008-01-09 22:22:30.000000000 +0100 > > @@ -609,22 +609,22 @@ static int sym_eh_handler(int op, char * > > */ > > #define WAIT_FOR_PCI_RECOVERY 35 > > if (pci_channel_offline(pdev)) { > > - struct completion *io_reset; > > int finished_reset = 0; > > init_completion(&eh_done); > > spin_lock_irq(shost->host_lock); > > /* Make sure we didn't race */ > > if (pci_channel_offline(pdev)) { > > - if (!sym_data->io_reset) > > - sym_data->io_reset = &eh_done; > > - io_reset = sym_data->io_reset; > > + BUG_ON(!sym_data->io_reset); > > + sym_data->io_reset = &eh_done; > > } else { > > finished_reset = 1; > > } > > spin_unlock_irq(shost->host_lock); > > if (!finished_reset) > > - finished_reset = wait_for_completion_timeout(io_reset, > > + finished_reset = wait_for_completion_timeout > > + (sym_data->io_reset, > > WAIT_FOR_PCI_RECOVERY*HZ); > > + sym_data->io_reset = NULL; > > This has to be cleared under the host_lock to forestall the (tiny) race > where the pci recovery code checks the value of sym_data->io_reset, we > change it to null and then the pci recovery code completes a NULL > pointer. > It is impossible as the io_reset value is not NULL before and during wait completion. The case above would happen only if one thread checked the io_reset value (under lock) and it was NULL and before setting it (inside locked section) another thread checked the io_reset value (still NULL and also inside locked section = impossible). Otherwise, the BUG_ON() kicked in (the value is already not NULL). Another case is if you consider changing the io_reset value after the locked section but before wait_for_completion_timeout(). In this case, putting spinlock around the io_reset clearing does not change anything. There is still a chance of race if the io_reset is cleared after one thread leaves the locked section then another one clearing the io_reset (under lock) then completion by the first thread happens with NULL pointer. Am I right? I understand that you asked for change like this: + spin_lock_irq(shost->host_lock); + sym_data->io_reset = NULL; + spin_unlock_irq(shost->host_lock); Maybe, better solution is to clear the io_reset field inside the sym2_io_resume() (as it was done)? It would be cleared only after the completion and before it is cleared the BUG_ON() guards against race. The 2nd race case described above is impossible in such solution. However, another version of the patch is needed as the BUG_ON condition should be BUG_ON(sym_data->io_reset) and not BUG_ON(!sym_data->io_reset). So, I wait for your opinion to the lock issue. If you still consider version with the change presented above better, I'll add it. Kind regards, Krzysztof ---------------------------------------------------------------------- Rozdajemy nagrody! Sprawdz >> http://link.interia.pl/f1cbf - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html