在 2011年11月27日 上午9:28,Huajun Li <huajun.li.lee@xxxxxxxxx> 写道: > 在 2011年11月26日 下午10:00,James Bottomley <jbottomley@xxxxxxxxxxxxx> 写道: >> On Sat, 2011-11-26 at 10:51 +0100, Stefan Richter wrote: >>> On Nov 24 Huajun Li wrote: >>> > While unplugging usb disk, scsi_disk(disk)->device may be released >>> > before sd_revalidate_disk() is called, then there will occur Oops as >>> > shown below: >>> [...] >>> > --- a/drivers/scsi/sd.c >>> > +++ b/drivers/scsi/sd.c >>> > @@ -2354,10 +2354,15 @@ static int sd_try_extended_inquiry(struct >>> > scsi_device *sdp) >>> > static int sd_revalidate_disk(struct gendisk *disk) >>> > { >>> > struct scsi_disk *sdkp = scsi_disk(disk); >>> > - struct scsi_device *sdp = sdkp->device; >>> > + struct scsi_device *sdp; >>> > unsigned char *buffer; >>> > unsigned flush = 0; >>> > >>> > + if (sdkp) >>> > + sdp = sdkp->device; >>> > + else >>> > + goto out; >>> > + >>> > SCSI_LOG_HLQUEUE(3, sd_printk(KERN_INFO, sdkp, >>> > "sd_revalidate_disk\n")); >>> > >>> >>> Shouldn't rather the [block -- command-set-driver -- scsi-core -- lld] >>> stack be structured along the lines that lower-level device instances live >>> as long as higher levels rely on them? >> >> Not really, no. The problem is the lifetime rules are complex. The >> lowest level objects actually live the longest because they're first to >> be discovered before we know what higher level functions to attach. >> >> That's why you see complex paired gets in things like scsi_disk_get >> because we try to get references both to the sdkp and the underlying >> sdp ... so the sdkp can be torn down by last reference release from >> either above or below (this is the menace of hot unplug). >> >>> For about a year now or so, I am seeing patches floating by that add NULL >>> pointer checks here and there (or patches that clear pointers), and every >>> time I wonder >>> - where else such NULL pointer checks might be needed, >>> - in what way (if at all) it is ensured that a function which is made to >>> check for a valid pointer at the top does not race with pointer >>> invalidation further down the road. >> >> I agree. The patch is clearly wrong because sdkp is a refcounted object >> that never actually sets sdkp->device to NULL. If we find a NULL in >> there it must be because the sdkp object is wrong. The implication from >> the trace seems to be that something allowed blkid to open a non >> existent device. >> >> James >> >> > > Thanks for your response. > > Yes, in this case, actually sdkp is NULL rather than sdkp->device, and > the patch indicates this. > However, there is typo in my comments, maybe it misleads you, sorry! > In fact, the comment should be: > > " While unplugging usb disk, scsi_disk(disk) may be released > before sd_revalidate_disk() is called, then there will occur Oops." > BTW, after applied the patch and repeatedly plug/unplug the USB stick, I did not see this crash. However, the other concern is, need we validate scsi_disk(disk) in some other functions specified in sd_fops ? since they may be also called from other layer after scsi_disk(disk) is released. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html