On Sat, 2011-11-26 at 10:51 +0100, Stefan Richter wrote: > On Nov 24 Huajun Li wrote: > > While unplugging usb disk, scsi_disk(disk)->device may be released > > before sd_revalidate_disk() is called, then there will occur Oops as > > shown below: > [...] > > --- a/drivers/scsi/sd.c > > +++ b/drivers/scsi/sd.c > > @@ -2354,10 +2354,15 @@ static int sd_try_extended_inquiry(struct > > scsi_device *sdp) > > static int sd_revalidate_disk(struct gendisk *disk) > > { > > struct scsi_disk *sdkp = scsi_disk(disk); > > - struct scsi_device *sdp = sdkp->device; > > + struct scsi_device *sdp; > > unsigned char *buffer; > > unsigned flush = 0; > > > > + if (sdkp) > > + sdp = sdkp->device; > > + else > > + goto out; > > + > > SCSI_LOG_HLQUEUE(3, sd_printk(KERN_INFO, sdkp, > > "sd_revalidate_disk\n")); > > > > Shouldn't rather the [block -- command-set-driver -- scsi-core -- lld] > stack be structured along the lines that lower-level device instances live > as long as higher levels rely on them? Not really, no. The problem is the lifetime rules are complex. The lowest level objects actually live the longest because they're first to be discovered before we know what higher level functions to attach. That's why you see complex paired gets in things like scsi_disk_get because we try to get references both to the sdkp and the underlying sdp ... so the sdkp can be torn down by last reference release from either above or below (this is the menace of hot unplug). > For about a year now or so, I am seeing patches floating by that add NULL > pointer checks here and there (or patches that clear pointers), and every > time I wonder > - where else such NULL pointer checks might be needed, > - in what way (if at all) it is ensured that a function which is made to > check for a valid pointer at the top does not race with pointer > invalidation further down the road. I agree. The patch is clearly wrong because sdkp is a refcounted object that never actually sets sdkp->device to NULL. If we find a NULL in there it must be because the sdkp object is wrong. The implication from the trace seems to be that something allowed blkid to open a non existent device. James ��.n��������+%������w��{.n�����{������ܨ}���Ơz�j:+v�����w����ޙ��&�)ߡ�a����z�ޗ���ݢj��w�f