Hi Paul, Paul E. McKenney <paulmck@xxxxxxxxxx> 于2020年1月1日周三 上午7:13写道: > > On Tue, Dec 31, 2019 at 10:55:47PM +0800, Hou Tao wrote: > > Hi, > > > > On 2019/12/31 19:09, Yufen Yu wrote: > > > When delete partition executes concurrently with IOs issue, > > > it may cause use-after-free on part in disk_map_sector_rcu() > > > as following: > > snip > > > > > > > > diff --git a/block/genhd.c b/block/genhd.c > > > index ff6268970ddc..39fa8999905f 100644 > > > --- a/block/genhd.c > > > +++ b/block/genhd.c > > > @@ -293,7 +293,23 @@ struct hd_struct *disk_map_sector_rcu(struct gendisk *disk, sector_t sector) > > > part = rcu_dereference(ptbl->part[i]); > > > > > > if (part && sector_in_part(part, sector)) { > > snip > > > > > rcu_assign_pointer(ptbl->last_lookup, part); > > > + part = rcu_dereference(ptbl->part[i]); > > > + if (part == NULL) { > > > + rcu_assign_pointer(ptbl->last_lookup, NULL); > > > + break; > > > + } > > > return part; > > > } > > > } > > > > Not ensure whether the re-read can handle the following case or not: > > > > process A process B process C > > > > disk_map_sector_rcu(): delete_partition(): disk_map_sector_rcu(): > > > > rcu_read_lock > > > > // need to iterate partition table > > part[i] != NULL (1) part[i] = NULL (2) > > smp_mb() > > last_lookup = NULL (3) > > call_rcu() (4) > > last_lookup = part[i] (5) > > > > > > rcu_read_lock() > > read last_lookup return part[i] (6) > > sector_in_part() is OK (7) > > return part[i] (8) > > Just for the record... > > Use of RCU needs to ensure that readers cannot access the to-be-freed > structure -before- invoking call_rcu(). Which does look to happen here > with the "last_lookup = NULL". But the last_lookup pointer may be reassigned a to-be-freed pointer (namely part[i]) as show in process A, and I think the RCU read section starts after the call of call_rcu() may read the reassigned pointer and use it, right ? > But in addition, the callback needs to > get access to the to-be-freed structure via some sideband (usually the > structure passed to call_rcu()), not from the reader-accessible structure. > Er, could you clarify the meaning of "reader-accessible structure" ? Do you mean "reader-writable structure" like last_lookup ? In the case, the callback does access the to-be-freed struct by the embedded "rcu_work". > Or am I misinterpreting this sequence of events? > Ah, I over simplify the procedures happened in process B. Does the following diagram make it clearer ? process A process B process C disk_map_sector_rcu(): delete_partition(): disk_map_sector_rcu(): rcu_read_lock // iterate part table part[i] != NULL (1) deleted = part[i] part[i] = NULL (2) smp_mb() last_lookup = NULL (3) percpu_ref_kill(deleted) // last ref __delete_partition (4) queue_rcu_work // re-assign last_lookup last_lookup = part[i] (5) rcu_read_lock() // hit last_lookup read last_lookup returns part[i] (6) sector_in_part() is OK (7) will return part[i] (8) rcu_read_unlock() // found part[i] is NULL // reassign last_lookup last_lookup = NULL will return part0 rcu_read_unlock // RCU callback delete_partition_work_fn free deleted // use-after-free ? hd_struct_try_get Regards, Tao > Thanx, Paul > > > part[i] == NULL (9) > > last_lookup = NULL (10) > > rcu_read_unlock() (11) > > one RCU grace period completes > > __delete_partition() (12) > > free hd_partition (13) > > // use-after-free > > hd_struct_try_get(part[i]) (14) > > > > * the number in the parenthesis is the sequence of events. > > > > Maybe RCU experts can shed some light on this problem, so cc +paulmck@xxxxxxxxxx, +joel@xxxxxxxxxxxxxxxxx and +RCU maillist. > > > > If the above case is possible, maybe we can fix the problem by pinning last_lookup through increasing its ref-count > > (the following patch is only compile tested): > > > > diff --git a/block/genhd.c b/block/genhd.c > > index 6e8543ca6912..179e0056fae1 100644 > > --- a/block/genhd.c > > +++ b/block/genhd.c > > @@ -279,7 +279,14 @@ struct hd_struct *disk_map_sector_rcu(struct gendisk *disk, sector_t sector) > > part = rcu_dereference(ptbl->part[i]); > > > > if (part && sector_in_part(part, sector)) { > > - rcu_assign_pointer(ptbl->last_lookup, part); > > + struct hd_struct *old; > > + > > + if (!hd_struct_try_get(part)) > > + break; > > + > > + old = xchg(&ptbl->last_lookup, part); > > + if (old) > > + hd_struct_put(old); > > return part; > > } > > } > > @@ -1231,7 +1238,11 @@ static void disk_replace_part_tbl(struct gendisk *disk, > > rcu_assign_pointer(disk->part_tbl, new_ptbl); > > > > if (old_ptbl) { > > - rcu_assign_pointer(old_ptbl->last_lookup, NULL); > > + struct hd_struct *part; > > + > > + part = xchg(&old_ptbl->last_lookup, NULL); > > + if (part) > > + hd_struct_put(part); > > kfree_rcu(old_ptbl, rcu_head); > > } > > } > > diff --git a/block/partition-generic.c b/block/partition-generic.c > > index 98d60a59b843..441c1c591c04 100644 > > --- a/block/partition-generic.c > > +++ b/block/partition-generic.c > > @@ -285,7 +285,8 @@ void delete_partition(struct gendisk *disk, int partno) > > return; > > > > rcu_assign_pointer(ptbl->part[partno], NULL); > > - rcu_assign_pointer(ptbl->last_lookup, NULL); > > + if (cmpxchg(&ptbl->last_lookup, part, NULL) == part) > > + hd_struct_put(part); > > kobject_put(part->holder_dir); > > device_del(part_to_dev(part)); > > > > -- > > 2.22.0 > > > > Regards, > > Tao > > > > > > > diff --git a/block/partition-generic.c b/block/partition-generic.c > > > index 1d20c9cf213f..1e0065ed6f02 100644 > > > --- a/block/partition-generic.c > > > +++ b/block/partition-generic.c > > > @@ -284,6 +284,13 @@ void delete_partition(struct gendisk *disk, int partno) > > > return; > > > > > > rcu_assign_pointer(ptbl->part[partno], NULL); > > > + /* > > > + * Without the memory barrier, disk_map_sector_rcu() > > > + * may read the old value after overwriting the > > > + * last_lookup. Then it can not clear last_lookup, > > > + * which may cause use-after-free. > > > + */ > > > + smp_mb(); > > > rcu_assign_pointer(ptbl->last_lookup, NULL); > > > kobject_put(part->holder_dir); > > > device_del(part_to_dev(part)); > > > > >