Re: [PATCH] sym53_8xx_2: fixes two bugs related to chip reset

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 2008-01-10 at 23:31 +0100, Krzysztof Helt wrote:
> On Wed, 09 Jan 2008 17:51:43 -0600
> James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
> 
> > > diff -urp linux-ref/drivers/scsi/sym53c8xx_2/sym_glue.c linux-new/drivers/scsi/sym53c8xx_2/sym_glue.c
> > > --- linux-ref/drivers/scsi/sym53c8xx_2/sym_glue.c	2007-12-23 20:39:44.000000000 +0100
> > > +++ linux-new/drivers/scsi/sym53c8xx_2/sym_glue.c	2008-01-09 22:22:30.000000000 +0100
> > > @@ -609,22 +609,22 @@ static int sym_eh_handler(int op, char *
> > >  	 */
> > >  #define WAIT_FOR_PCI_RECOVERY	35
> > >  	if (pci_channel_offline(pdev)) {
> > > -		struct completion *io_reset;
> > >  		int finished_reset = 0;
> > >  		init_completion(&eh_done);
> > >  		spin_lock_irq(shost->host_lock);
> > >  		/* Make sure we didn't race */
> > >  		if (pci_channel_offline(pdev)) {
> > > -			if (!sym_data->io_reset)
> > > -				sym_data->io_reset = &eh_done;
> > > -			io_reset = sym_data->io_reset;
> > > +			BUG_ON(!sym_data->io_reset);
> > > +			sym_data->io_reset = &eh_done;
> > >  		} else {
> > >  			finished_reset = 1;
> > >  		}
> > >  		spin_unlock_irq(shost->host_lock);
> > >  		if (!finished_reset)
> > > -			finished_reset = wait_for_completion_timeout(io_reset,
> > > +			finished_reset = wait_for_completion_timeout
> > > +						(sym_data->io_reset,
> > >  						WAIT_FOR_PCI_RECOVERY*HZ);
> > > +		sym_data->io_reset = NULL;
> > 
> > This has to be cleared under the host_lock to forestall the (tiny) race
> > where the pci recovery code checks the value of sym_data->io_reset, we
> > change it to null and then the pci recovery code completes a NULL
> > pointer.
> > 
> 
> It is impossible as the io_reset value is not NULL before and during wait
> completion. The case above would happen only if one thread checked the
> io_reset value (under lock) and it was NULL and before setting it (inside
> locked section) another thread checked the io_reset value (still NULL 
> and also inside locked section = impossible). Otherwise, the BUG_ON() 
> kicked in (the value is already not NULL).

Er, no.  Let me be clearer about the sequence

sym2_io_resume() is racing to complete sym_data->io_reset with
sym_eh_handler()s timeout.  Because you don't set it to NULL under the
host_lock, you can get the sequence

sym2_io_resume() acquires the host lock and checks the value of
sym_data->io_reset and finds it to be not NULL.

sym_eh_handler() NULLs sym_data->io_resume *without* acquiring the host
lock. probably because the wait__for_completion_timeout() times out.

sym2_io_resume() calls complete() on sym_data->io_resume which is now a
NULL pointer and boom.

We never get multiple threads into sym_eh_handler() because it's single
threaded from the error handler thread.

James


-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]
  Powered by Linux