Re: [PATCH v3 3/5] rbd: fix use-after free of rbd_dev->disk

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 09/09/2013 11:15 AM, Josh Durgin wrote:
> On 09/09/2013 06:37 AM, Alex Elder wrote:
>> On 09/09/2013 02:17 AM, Josh Durgin wrote:
>>> Removing a device deallocates the disk, unschedules the watch, and
>>> finally cleans up the rbd_dev structure. rbd_dev_refresh(), called
>>> from the watch callback, updates the disk size and rbd_dev
>>> structure. With no locking between them, rbd_dev_refresh() may use the
>>> device or rbd_dev after they've been freed.

. . .

>>> +    if (!removing) {
>>
>> Here's the problem.  It is the same one faced by the open path.
>>
>> You determined above whether or not the device was getting removed.
>> But you haven't left any state that indicates that the image is
>> getting refreshed (or more specifically, getting its size updated).
>>
>> So, if the device is mapped but isn't actually opened by anybody
>> there's nothing stopping the code in rbd_remove() from going
>> ahead anyway.  So the possibility of the structures getting freed
>> remains (though the window is a little smaller now).
> 
> I think this is safe with the use of ceph_osdc_flush_notifies()
> before rbd_bus_del_dev(). Since the watch is already unregistered,
> flushing the notifies waits for all calls to rbd_watch_cb() to
> complete. This means that rbd_remove() can't actually free
> any of the rbd_dev structures until after the last revalidate_disk()
> etc. is done. Does this make sense, or am I missing something?

Yes, now I remember.  You're right.  That was the point of
doing the flush_notifies call before release anyway...

So my "nothing stopping the code in rbd_remove" was wrong.
In fact, that ceph_osdc_flush_notifies() call is stopping
it from freeing things that may still be in use.

Thanks for explaining it to me (again).

> This should probably go in a comment in rbd_remove(), since
> it's not obvious from the code there why the ordering of
> the last few calls makes sense.

Yes.  I think so.

Reviewed-by: Alex Elder <elder@xxxxxxxxxx>

> Josh
> 
>> I think you can resolve this problem by using the open count.
>> That is, use the same interlock between the open count and the
>> removing flag that's used for rbd_open() and rbd_remove().
>> The open count in some sense represents someone still needing
>> the data structures for the image.  So treat the refresh activity
>> as one of those "somebodies" by claiming one of those opens
>> while the size update is going on.
>>
>> If you encapsulated some code now in place for this purpose,
>> something like the following might be helpful.  (I've renamed
>> "open_count" to be "inuse_count" here.)
>>
>> static inline int rbd_request_access(struct rbd_device *rbd_dev)
>> {
>>     int ret = 0;
>>
>>     spin_lock_irq(&rbd_dev->lock);
>>     if (!test_bit(RBD_DEV_FLAG_REMOVING, &rbd_dev->flags))
>>         rbd_dev->inuse_count++;
>>     else
>>         ret = -ENOENT;
>>     spin_unlock_irq(&rbd_dev->lock);
>>
>>     return ret;
>> }
>>
>> static inline int rbd_release_access(struct rbd_device *rbd_dev)
>> {
>>     int ret = 0;
>>
>>     spin_lock_irq(&rbd_dev->lock);
>>     if (rbd_dev->inuse_count)
>>         rbd_dev->inuse_count--;
>>     else
>>         ret = -EINVAL;
>>     spin_unlock_irq(&rbd_dev->lock);
>>
>>     return ret;
>> }
>>
>> static inline int rbd_prevent_access(struct rbd_device *rbd_dev)
>> {
>>     int ret = 0;
>>
>>     spin_lock_irq(&rbd_dev->lock);
>>     if (rbd_dev->inuse_count)
>>         ret = -EBUSY;
>>     else if (test_and_set_bit(RBD_DEV_FLAG_REMOVING, &rbd_dev->flags))
>>         ret = -EINPROGRESS;
>>     spin_unlock_irq(&rbd_dev->lock);
>>
>>     return ret;
>> }
>>
>> I'll assume you know what I'm talking about at this point and
>> won't go into exactly where and how you'd use these.
>>
>>> +        size = (sector_t)rbd_dev->mapping.size / SECTOR_SIZE;
>>> +        dout("setting size to %llu sectors", (unsigned long long)size);
>>> +        set_capacity(rbd_dev->disk, size);
>>> +        revalidate_disk(rbd_dev->disk);
>>> +    }
>>> +}+-
>>> +
>>>   static int rbd_dev_refresh(struct rbd_device *rbd_dev)
>>>   {
>>>       u64 mapping_size;
>>
>> Everything below is good.
>>
>>> @@ -3344,12 +3369,7 @@ static int rbd_dev_refresh(struct rbd_device
>>> *rbd_dev)
>>>       up_write(&rbd_dev->header_rwsem);
>>>
>>>       if (mapping_size != rbd_dev->mapping.size) {
>>> -        sector_t size;
>>> -
>>> -        size = (sector_t)rbd_dev->mapping.size / SECTOR_SIZE;
>>> -        dout("setting size to %llu sectors", (unsigned long long)size);
>>> -        set_capacity(rbd_dev->disk, size);
>>> -        revalidate_disk(rbd_dev->disk);
>>> +        rbd_dev_update_size(rbd_dev);
>>>       }
>>>
>>>       return ret;
>>> @@ -5160,7 +5180,6 @@ static ssize_t rbd_remove(struct bus_type *bus,
>>>       if (ret < 0 || already)
>>>           return ret;
>>>
>>> -    rbd_bus_del_dev(rbd_dev);
>>>       ret = rbd_dev_header_watch_sync(rbd_dev, false);
>>>       if (ret)
>>>           rbd_warn(rbd_dev, "failed to cancel watch event (%d)\n", ret);
>>> @@ -5171,6 +5190,7 @@ static ssize_t rbd_remove(struct bus_type *bus,
>>>        */
>>>       dout("%s: flushing notifies", __func__);
>>>       ceph_osdc_flush_notifies(&rbd_dev->rbd_client->client->osdc);
>>> +    rbd_bus_del_dev(rbd_dev);
>>>       rbd_dev_image_release(rbd_dev);
>>>       module_put(THIS_MODULE);
>>>
>>>
>>
> 

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux