Re: oops during scsi scanning disk setup

On Fri, 2009-08-21 at 15:51 +0100, Chris Webb wrote:
> James Bottomley <James.Bottomley@xxxxxxx> writes:
> 
> > On Fri, 2009-08-21 at 10:23 +0100, Chris Webb wrote:
> >
> > > Sorry to follow up a third time, but I can now confirm this. I slipped -g into
> > > CFLAGS in the kernel Makefile and rebuilt genhd.o and then the entire vmlinux.
> > 
> > I suppose it makes sense:  That was the only dereference at offset 16 I
> > could find in the code.  The thing which doesn't quite make sense is
> > that disk_part_iter_init() also dereferences the same pointer
> > successfully ... I suppose this could be a race with another thread to
> > null out the gendisk part_tbl ... I'll have to think about it some more.
> 
> Thanks! If it helps, I've only ever seen it following an iscsi login to a
> target machine which is heavily loaded (e.g. RAID resync in this case),
> presumably meaning that everything (including disk reads) happens a bit
> slowly. Perhaps this increases the window for a race in some way?
> 
> I've spent some time over the past week trying to reproduce it in a VM with
> magic sysrq enabled so I could find out a bit more, but it stubbornly refuses
> to happen except on machines in a busy production cluster.

Actually, for that particular pointer to be NULL'd, I think the race
must be between add_disk and del_gendisk, implying that your iSCSI
cluster somehow shut down the link while it was busy.

I think that's an artifact of the fact that we don't take a reference to
the disk in these operations, and the race window is much longer now that
we do async sd scanning.

Can you try this as a partial fix?  (It should prevent the oops, but
you'll still lose the disk).

James

---

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index b7b9fec..a89c421 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2021,6 +2021,7 @@ static void sd_probe_async(void *data, async_cookie_t cookie)
 
 	sd_printk(KERN_NOTICE, sdkp, "Attached SCSI %sdisk\n",
 		  sdp->removable ? "removable " : "");
+	put_device(&sdkp->dev);
 }
 
 /**
@@ -2106,6 +2107,7 @@ static int sd_probe(struct device *dev)
 
 	get_device(&sdp->sdev_gendev);
 
+	get_device(&sdkp->dev);	/* prevent release before async_schedule */
 	async_schedule(sd_probe_async, sdkp);
 
 	return 0;
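
For readers unfamiliar with the pattern: the patch above takes a
reference before scheduling the async work and drops it when that work
finishes, so the object cannot be released in between. What follows is
only a rough user-space sketch of that idea, not kernel code. The names
struct obj, obj_get(), obj_put(), probe_async() and run_demo() are
hypothetical stand-ins (for get_device()/put_device() and
sd_probe_async()), and the "async" work is simply called later in the
same thread, so the ordering is deterministic.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted object such as struct device. */
struct obj {
	int refcount;
	int *released;	/* set to 1 when the last reference is dropped */
};

static struct obj *obj_get(struct obj *o)
{
	assert(o->refcount > 0);	/* must already be held by someone */
	o->refcount++;
	return o;
}

static void obj_put(struct obj *o)
{
	assert(o->refcount > 0);
	if (--o->refcount == 0) {
		*o->released = 1;	/* record the release for the demo */
		free(o);
	}
}

/* Deferred work: it runs later, so it relies on a reference of its own. */
static void probe_async(struct obj *o)
{
	/* o is safe to use here: the reference taken at schedule time pins it */
	obj_put(o);	/* drop the reference taken before scheduling */
}

/*
 * Returns 1 if the object was still alive when the deferred work started
 * and was freed only after that work dropped its reference.
 */
static int run_demo(void)
{
	int released = 0;
	int alive_in_work;
	struct obj *o = malloc(sizeof(*o));

	if (!o)
		return 0;
	o->refcount = 1;		/* owner's reference */
	o->released = &released;

	obj_get(o);			/* taken before "scheduling" the work */
	obj_put(o);			/* teardown drops the owner's reference */

	alive_in_work = !released;	/* object must still exist at this point */
	probe_async(o);			/* deferred work runs, then drops its ref */

	return alive_in_work && released;
}
```

Without the obj_get() call before the teardown, the owner's obj_put()
would free the object and probe_async() would touch freed memory, which
is the same shape as the race described above.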

