Re: [PATCH] SCSI/sd: Fix NULL dereference in sd_revalidate_disk

Huajun Li <huajun.li.lee@xxxxxxxxx> · Sun, 27 Nov 2011 10:20:07 +0800

在 2011年11月27日 上午9:28，Huajun Li <huajun.li.lee@xxxxxxxxx> 写道：
> 在 2011年11月26日 下午10:00，James Bottomley <jbottomley@xxxxxxxxxxxxx> 写道：
>> On Sat, 2011-11-26 at 10:51 +0100, Stefan Richter wrote:
>>> On Nov 24 Huajun Li wrote:
>>> >   While unplugging usb disk, scsi_disk(disk)->device  may be released
>>> > before sd_revalidate_disk() is called, then there will occur Oops as
>>> > shown below:
>>> [...]
>>> > --- a/drivers/scsi/sd.c
>>> > +++ b/drivers/scsi/sd.c
>>> > @@ -2354,10 +2354,15 @@ static int sd_try_extended_inquiry(struct
>>> > scsi_device *sdp)
>>> >  static int sd_revalidate_disk(struct gendisk *disk)
>>> >  {
>>> >     struct scsi_disk *sdkp = scsi_disk(disk);
>>> > -   struct scsi_device *sdp = sdkp->device;
>>> > +   struct scsi_device *sdp;
>>> >     unsigned char *buffer;
>>> >     unsigned flush = 0;
>>> >
>>> > +   if (sdkp)
>>> > +           sdp = sdkp->device;
>>> > +   else
>>> > +           goto out;
>>> > +
>>> >     SCSI_LOG_HLQUEUE(3, sd_printk(KERN_INFO, sdkp,
>>> >                                   "sd_revalidate_disk\n"));
>>> >
>>>
>>> Shouldn't rather the [block -- command-set-driver -- scsi-core -- lld]
>>> stack be structured along the lines that lower-level device instances live
>>> as long as higher levels rely on them?
>>
>> Not really, no.  The problem is the lifetime rules are complex.  The
>> lowest level objects actually live the longest because they're first to
>> be discovered before we know what higher level functions to attach.
>>
>> That's why you see complex paired gets in things like scsi_disk_get
>> because we try to get references both to the sdkp and the underlying
>> sdp ... so the sdkp can be torn down by last reference release from
>> either above or below (this is the menace of hot unplug).
>>
>>> For about a year now or so, I am seeing patches floating by that add NULL
>>> pointer checks here and there (or patches that clear pointers), and every
>>> time I wonder
>>>   - where else such NULL pointer checks might be needed,
>>>   - in what way (if at all) it is ensured that a function which is made to
>>>     check for a valid pointer at the top does not race with pointer
>>>     invalidation further down the road.
>>
>> I agree.  The patch is clearly wrong because sdkp is a refcounted object
>> that never actually sets sdkp->device to NULL.  If we find a NULL in
>> there it must be because the sdkp object is wrong.  The implication from
>> the trace seems to be that something allowed blkid to open a non
>> existent device.
>>
>> James
>>
>>
>
> Thanks for your response.
>
> Yes, in this case, actually sdkp is NULL rather than sdkp->device, and
> the patch indicates this.
> However, there is typo in my comments, maybe it misleads you, sorry!
> In fact, the comment should be:
>
> " While unplugging usb disk, scsi_disk(disk)  may be released
>  before sd_revalidate_disk() is called, then there will occur Oops."
>

BTW, after applied the patch and repeatedly plug/unplug the USB stick,
I did not see this crash.

However, the other concern is, need we validate scsi_disk(disk) in
some other functions specified in sd_fops ?  since they may be also
called from other layer after  scsi_disk(disk) is released.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html