在 2011年11月26日 下午10:00,James Bottomley <jbottomley@xxxxxxxxxxxxx> 写道: > On Sat, 2011-11-26 at 10:51 +0100, Stefan Richter wrote: >> On Nov 24 Huajun Li wrote: >> > While unplugging usb disk, scsi_disk(disk)->device may be released >> > before sd_revalidate_disk() is called, then there will occur Oops as >> > shown below: >> [...] >> > --- a/drivers/scsi/sd.c >> > +++ b/drivers/scsi/sd.c >> > @@ -2354,10 +2354,15 @@ static int sd_try_extended_inquiry(struct >> > scsi_device *sdp) >> > static int sd_revalidate_disk(struct gendisk *disk) >> > { >> > struct scsi_disk *sdkp = scsi_disk(disk); >> > - struct scsi_device *sdp = sdkp->device; >> > + struct scsi_device *sdp; >> > unsigned char *buffer; >> > unsigned flush = 0; >> > >> > + if (sdkp) >> > + sdp = sdkp->device; >> > + else >> > + goto out; >> > + >> > SCSI_LOG_HLQUEUE(3, sd_printk(KERN_INFO, sdkp, >> > "sd_revalidate_disk\n")); >> > >> >> Shouldn't rather the [block -- command-set-driver -- scsi-core -- lld] >> stack be structured along the lines that lower-level device instances live >> as long as higher levels rely on them? > > Not really, no. The problem is the lifetime rules are complex. The > lowest level objects actually live the longest because they're first to > be discovered before we know what higher level functions to attach. > > That's why you see complex paired gets in things like scsi_disk_get > because we try to get references both to the sdkp and the underlying > sdp ... so the sdkp can be torn down by last reference release from > either above or below (this is the menace of hot unplug). > >> For about a year now or so, I am seeing patches floating by that add NULL >> pointer checks here and there (or patches that clear pointers), and every >> time I wonder >> - where else such NULL pointer checks might be needed, >> - in what way (if at all) it is ensured that a function which is made to >> check for a valid pointer at the top does not race with pointer >> invalidation further down the road. > > I agree. The patch is clearly wrong because sdkp is a refcounted object > that never actually sets sdkp->device to NULL. If we find a NULL in > there it must be because the sdkp object is wrong. The implication from > the trace seems to be that something allowed blkid to open a non > existent device. > > James > > Thanks for your response. Yes, in this case, actually sdkp is NULL rather than sdkp->device, and the patch indicates this. However, there is typo in my comments, maybe it misleads you, sorry! In fact, the comment should be: " While unplugging usb disk, scsi_disk(disk) may be released before sd_revalidate_disk() is called, then there will occur Oops." -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html