Re: [PATCH v2 1/3] rbd: fix use-after free of rbd_dev->disk

Josh Durgin <josh.durgin@xxxxxxxxxxx> · Mon, 09 Sep 2013 00:30:15 -0700

On 09/03/2013 05:41 AM, Alex Elder wrote:
On 08/29/2013 07:57 PM, Josh Durgin wrote:
Removing a device deallocates the disk, unschedules the watch, and
finally cleans up the rbd_dev structure. rbd_dev_refresh(), called
from the watch callback, updates the disk size and rbd_dev
structure. With no locking between them, rbd_dev_refresh() may use the
device or rbd_dev after they've been freed.

To fix this, add a shutting_down flag and a mutex protecting it to rbd_dev.
Take the mutex and check whether the flag is set before using rbd_dev->disk.
Move this disk-updating to a separate function as well.

Fixes: http://tracker.ceph.com/issues/5636
Signed-off-by: Josh Durgin <josh.durgin@xxxxxxxxxxx>

A few comments below.  I don't think you need the shutting_down
flag after all.  If you disagree, say so.  Either way though,
this looks good to me.

You're right, the extra lock turned out to be unnecessary.
Holding during revalidate_disk() also created a lock inversion
(since rbd_open() is called with bdev->lock held), so I've reworked
this in v3.

Thanks for the reviews!
Josh

Reviewed-by: Alex Elder <elder@xxxxxxxxxx>

---
  drivers/block/rbd.c |   39 +++++++++++++++++++++++++++++++++------
  1 files changed, 33 insertions(+), 6 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index fef3687..8ab3362b 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -343,6 +343,9 @@ struct rbd_device {
  	struct ceph_osd_event   *watch_event;
  	struct rbd_obj_request	*watch_request;

+	bool			shutting_down;	/* rbd_remove() in progress */

You could use a new rbd_dev_flags value for this.

In fact, now that I'm looking at this, I think the REMOVING flag
is already sufficient to indicate that bit of state.  (Sorry I
didn't see this before.)

You would still want the mutex so the shutdown won't happen until
an underway size update completed.  (Or you could add another
UPDATING_SIZE flag, but I think the mutex is better in this case.)

+	struct mutex		shutdown_lock;	/* protects shutting_down */
+
  	struct rbd_spec		*parent_spec;
  	u64			parent_overlap;
  	atomic_t		parent_ref;
@@ -3324,6 +3327,24 @@ static void rbd_exists_validate(struct rbd_device *rbd_dev)
  		clear_bit(RBD_DEV_FLAG_EXISTS, &rbd_dev->flags);
  }

+static void rbd_dev_update_size(struct rbd_device *rbd_dev)
+{
+	sector_t size;
+
+	mutex_lock(&rbd_dev->shutdown_lock);
+	/*
+	 * If the device is being removed, rbd_dev->disk has
+	 * been destroyed, so don't try to update its size
+	 */
+	if (!rbd_dev->shutting_down) {
+		size = (sector_t)rbd_dev->mapping.size / SECTOR_SIZE;
+		dout("setting size to %llu sectors", (unsigned long long)size);
+		set_capacity(rbd_dev->disk, size);

Is it true you don't hit that locking problem because of the new mutex?

+		revalidate_disk(rbd_dev->disk);
+	}
+	mutex_unlock(&rbd_dev->shutdown_lock);
+}
+
  static int rbd_dev_refresh(struct rbd_device *rbd_dev)
  {
  	u64 mapping_size;
@@ -3343,12 +3364,7 @@ static int rbd_dev_refresh(struct rbd_device *rbd_dev)
  	up_write(&rbd_dev->header_rwsem);

  	if (mapping_size != rbd_dev->mapping.size) {
-		sector_t size;
-
-		size = (sector_t)rbd_dev->mapping.size / SECTOR_SIZE;
-		dout("setting size to %llu sectors", (unsigned long long)size);
-		set_capacity(rbd_dev->disk, size);
-		revalidate_disk(rbd_dev->disk);
+		rbd_dev_update_size(rbd_dev);
  	}

  	return ret;
@@ -3656,6 +3672,8 @@ static struct rbd_device *rbd_dev_create(struct rbd_client *rbdc,
  	atomic_set(&rbd_dev->parent_ref, 0);
  	INIT_LIST_HEAD(&rbd_dev->node);
  	init_rwsem(&rbd_dev->header_rwsem);
+	mutex_init(&rbd_dev->shutdown_lock);
+	rbd_dev->shutting_down = false;

  	rbd_dev->spec = spec;
  	rbd_dev->rbd_client = rbdc;
@@ -5159,7 +5177,16 @@ static ssize_t rbd_remove(struct bus_type *bus,
  	if (ret < 0 || already)
  		return ret;

+	/*
+	 * hold shutdown_lock while destroying the device so that
+	 * device destruction will not race with device updates from
+	 * the watch callback
+	 */
+	mutex_lock(&rbd_dev->shutdown_lock);
+	rbd_dev->shutting_down = true;
  	rbd_bus_del_dev(rbd_dev);
+	mutex_unlock(&rbd_dev->shutdown_lock);
+
  	ret = rbd_dev_header_watch_sync(rbd_dev, false);
  	if (ret)
  		rbd_warn(rbd_dev, "failed to cancel watch event (%d)\n", ret);



--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html