Re: [PATCH] SCSI/sd: Fix NULL dereference in sd_revalidate_disk

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



在 2011年11月27日 上午10:38,James Bottomley
<James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> 写道:
> On Sun, 2011-11-27 at 10:20 +0800, Huajun Li wrote:
>> 在 2011年11月27日 上午9:28,Huajun Li <huajun.li.lee@xxxxxxxxx> 写道:
>> > 在 2011年11月26日 下午10:00,James Bottomley <jbottomley@xxxxxxxxxxxxx> 写道:
>> >> On Sat, 2011-11-26 at 10:51 +0100, Stefan Richter wrote:
>> >>> On Nov 24 Huajun Li wrote:
>> >>> >   While unplugging usb disk, scsi_disk(disk)->device  may be released
>> >>> > before sd_revalidate_disk() is called, then there will occur Oops as
>> >>> > shown below:
>> >>> [...]
>> >>> > --- a/drivers/scsi/sd.c
>> >>> > +++ b/drivers/scsi/sd.c
>> >>> > @@ -2354,10 +2354,15 @@ static int sd_try_extended_inquiry(struct
>> >>> > scsi_device *sdp)
>> >>> >  static int sd_revalidate_disk(struct gendisk *disk)
>> >>> >  {
>> >>> >     struct scsi_disk *sdkp = scsi_disk(disk);
>> >>> > -   struct scsi_device *sdp = sdkp->device;
>> >>> > +   struct scsi_device *sdp;
>> >>> >     unsigned char *buffer;
>> >>> >     unsigned flush = 0;
>> >>> >
>> >>> > +   if (sdkp)
>> >>> > +           sdp = sdkp->device;
>> >>> > +   else
>> >>> > +           goto out;
>> >>> > +
>> >>> >     SCSI_LOG_HLQUEUE(3, sd_printk(KERN_INFO, sdkp,
>> >>> >                                   "sd_revalidate_disk\n"));
>> >>> >
>> >>>
>> >>> Shouldn't rather the [block -- command-set-driver -- scsi-core -- lld]
>> >>> stack be structured along the lines that lower-level device instances live
>> >>> as long as higher levels rely on them?
>> >>
>> >> Not really, no.  The problem is the lifetime rules are complex.  The
>> >> lowest level objects actually live the longest because they're first to
>> >> be discovered before we know what higher level functions to attach.
>> >>
>> >> That's why you see complex paired gets in things like scsi_disk_get
>> >> because we try to get references both to the sdkp and the underlying
>> >> sdp ... so the sdkp can be torn down by last reference release from
>> >> either above or below (this is the menace of hot unplug).
>> >>
>> >>> For about a year now or so, I am seeing patches floating by that add NULL
>> >>> pointer checks here and there (or patches that clear pointers), and every
>> >>> time I wonder
>> >>>   - where else such NULL pointer checks might be needed,
>> >>>   - in what way (if at all) it is ensured that a function which is made to
>> >>>     check for a valid pointer at the top does not race with pointer
>> >>>     invalidation further down the road.
>> >>
>> >> I agree.  The patch is clearly wrong because sdkp is a refcounted object
>> >> that never actually sets sdkp->device to NULL.  If we find a NULL in
>> >> there it must be because the sdkp object is wrong.  The implication from
>> >> the trace seems to be that something allowed blkid to open a non
>> >> existent device.
>> >>
>> >> James
>> >>
>> >>
>> >
>> > Thanks for your response.
>> >
>> > Yes, in this case, actually sdkp is NULL rather than sdkp->device, and
>> > the patch indicates this.
>> > However, there is typo in my comments, maybe it misleads you, sorry!
>> > In fact, the comment should be:
>
> No, it's not that ... it's how did the open get far enough to do the
> revalidation?
>
>> > " While unplugging usb disk, scsi_disk(disk)  may be released
>> >  before sd_revalidate_disk() is called, then there will occur Oops."
>> >
>>
>> BTW, after applied the patch and repeatedly plug/unplug the USB stick,
>> I did not see this crash.
>>
>> However, the other concern is, need we validate scsi_disk(disk) in
>> some other functions specified in sd_fops ?  since they may be also
>> called from other layer after  scsi_disk(disk) is released.
>
> No, it's nothing to do with that.  To call any of the block operations,
> you need to call ->open successfully first.  since ->open either gets a
> ref to the sdkp or returns an error, how did we happen to be in
> sd_revalidate with a NULL sdkp?
>

Hi James,
Thanks for your hints.

Here is a case may cause sd_revalidate_disk() triggered with a NULL sdkp:
In sd_open(), if the removable medium does not present, it will call
scsi_disk_put(sdkp) and return -ENOMEDIUM,  then scsi_disk_put() may
cause scsi_disk_release() called and sdkp get freed.

In consequence, while __blkdev_get()  calling disk->fops->open(), it
will trigger rescan_partitions() if  ->open returns -ENOMEDIUM,
finally sd_revalidate_disk() will be called with a NULL sdkp.

BTW, I tried above path, it really could happen on my host. :)

> The call trace says we came through rescan_partitions() which only
> happens if ->open returns zero (or -ERESTARTSYS).
>

Not exactly. It happens if  ->open returns zero or -ENOMEDIUM, please
see this commit:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=1196f8b814f32cd04df334abf47648c2a9fd8324


Thanks,
--Huajun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]
  Powered by Linux