Re: [PATCH] block: make sure last_lookup set as NULL after part deleted

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Sorry, it seems that i miss-formatted the diagram, so I resend it here:

process A                    process B            process C

disk_map_sector_rcu():       delete_partition():  disk_map_sector_rcu():

rcu_read_lock

  // iterate part table
  part[i] != NULL  (1)

                             deleted = part[i]
                             part[i] = NULL (2)
                             smp_mb()
                             last_lookup = NULL (3)

                             percpu_ref_kill(deleted)
                               // last ref
                               __delete_partition (4)
                                 queue_rcu_work

  // re-assign last_lookup
  last_lookup = part[i] (5)


                                                  rcu_read_lock()
                                                  // is it possible ?
                                                  last_lookup is part[i] (6)
                                                  sector_in_part() is OK (7)
                                                  will return part[i] (8)
                                                  rcu_read_unlock()

  // found part[i] is NULL
  // reassign last_lookup
  last_lookup = NULL
  will return part0
  rcu_read_unlock

                               // RCU callback
                               delete_partition_work_fn
                                 free deleted
                                                       // use-after-free ?
                                                       hd_struct_try_get

On Wed, Jan 1, 2020 at 10:33 AM htbegin <hotforest@xxxxxxxxx> wrote:
>
> Hi Paul,
>
> Paul E. McKenney <paulmck@xxxxxxxxxx> 于2020年1月1日周三 上午7:13写道:
> >
> > On Tue, Dec 31, 2019 at 10:55:47PM +0800, Hou Tao wrote:
> > > Hi,
> > >
> > > On 2019/12/31 19:09, Yufen Yu wrote:
> > > > When delete partition executes concurrently with IOs issue,
> > > > it may cause use-after-free on part in disk_map_sector_rcu()
> > > > as following:
> > > snip
> > >
> > > >
> > > > diff --git a/block/genhd.c b/block/genhd.c
> > > > index ff6268970ddc..39fa8999905f 100644
> > > > --- a/block/genhd.c
> > > > +++ b/block/genhd.c
> > > > @@ -293,7 +293,23 @@ struct hd_struct *disk_map_sector_rcu(struct gendisk *disk, sector_t sector)
> > > >             part = rcu_dereference(ptbl->part[i]);
> > > >
> > > >             if (part && sector_in_part(part, sector)) {
> > > snip
> > >
> > > >                     rcu_assign_pointer(ptbl->last_lookup, part);
> > > > +                   part = rcu_dereference(ptbl->part[i]);
> > > > +                   if (part == NULL) {
> > > > +                           rcu_assign_pointer(ptbl->last_lookup, NULL);
> > > > +                           break;
> > > > +                   }
> > > >                     return part;
> > > >             }
> > > >     }
> > >
> > > Not ensure whether the re-read can handle the following case or not:
> > >
> > > process A                                 process B                          process C
> > >
> > > disk_map_sector_rcu():                    delete_partition():               disk_map_sector_rcu():
> > >
> > > rcu_read_lock
> > >
> > >   // need to iterate partition table
> > >   part[i] != NULL   (1)                   part[i] = NULL (2)
> > >                                           smp_mb()
> > >                                           last_lookup = NULL (3)
> > >                                           call_rcu()  (4)
> > >     last_lookup = part[i] (5)
> > >
> > >
> > >                                                                              rcu_read_lock()
> > >                                                                              read last_lookup return part[i] (6)
> > >                                                                              sector_in_part() is OK (7)
> > >                                                                              return part[i] (8)
> >
> > Just for the record...
> >
> > Use of RCU needs to ensure that readers cannot access the to-be-freed
> > structure -before- invoking call_rcu().  Which does look to happen here
> > with the "last_lookup = NULL".
>
> But the last_lookup pointer may be reassigned a to-be-freed pointer (namely
> part[i]) as show in process A, and I think the RCU read section starts after
> the call of call_rcu() may read the reassigned pointer and use it, right ?
>
> > But in addition, the callback needs to
> > get access to the to-be-freed structure via some sideband (usually the
> > structure passed to call_rcu()), not from the reader-accessible structure.
> >
> Er, could you clarify the meaning of "reader-accessible structure" ? Do you
> mean "reader-writable structure" like last_lookup ? In the case, the callback
> does access the to-be-freed struct by the embedded "rcu_work".
>
> > Or am I misinterpreting this sequence of events?
> >
> Ah, I over simplify the procedures happened in process B. Does the following
> diagram make it clearer ?
>
> process A                           process B                     process C
>
> disk_map_sector_rcu():              delete_partition():
> disk_map_sector_rcu():
>
> rcu_read_lock
>
>   // iterate part table
>   part[i] != NULL   (1)
>                                     deleted = part[i]
>                                     part[i] = NULL (2)
>                                     smp_mb()
>                                     last_lookup = NULL (3)
>
>                                     percpu_ref_kill(deleted)
>                                       // last ref
>                                       __delete_partition (4)
>                                         queue_rcu_work
>     // re-assign last_lookup
>     last_lookup = part[i] (5)
>
>
>
> rcu_read_lock()
>                                                                   //
> hit last_lookup
>                                                                   read
> last_lookup returns part[i] (6)
>
> sector_in_part() is OK (7)
>                                                                   will
> return part[i] (8)
>
> rcu_read_unlock()
>
>   // found part[i] is NULL
>   // reassign last_lookup
>   last_lookup = NULL
>   will return part0
>
>   rcu_read_unlock
>                                     // RCU callback
>                                     delete_partition_work_fn
>                                       free deleted
>                                                                    //
> use-after-free ?
>
> hd_struct_try_get
> Regards,
> Tao
>
>
> >                                                         Thanx, Paul
> >
> > >   part[i] == NULL (9)
> > >       last_lookup = NULL (10)
> > >   rcu_read_unlock() (11)
> > >                                            one RCU grace period completes
> > >                                            __delete_partition() (12)
> > >                                            free hd_partition (13)
> > >                                                                              // use-after-free
> > >                                                                              hd_struct_try_get(part[i])  (14)
> > >
> > > * the number in the parenthesis is the sequence of events.
> > >
> > > Maybe RCU experts can shed some light on this problem, so cc +paulmck@xxxxxxxxxx, +joel@xxxxxxxxxxxxxxxxx and +RCU maillist.
> > >
> > > If the above case is possible, maybe we can fix the problem by pinning last_lookup through increasing its ref-count
> > > (the following patch is only compile tested):
> > >
> > > diff --git a/block/genhd.c b/block/genhd.c
> > > index 6e8543ca6912..179e0056fae1 100644
> > > --- a/block/genhd.c
> > > +++ b/block/genhd.c
> > > @@ -279,7 +279,14 @@ struct hd_struct *disk_map_sector_rcu(struct gendisk *disk, sector_t sector)
> > >               part = rcu_dereference(ptbl->part[i]);
> > >
> > >               if (part && sector_in_part(part, sector)) {
> > > -                     rcu_assign_pointer(ptbl->last_lookup, part);
> > > +                     struct hd_struct *old;
> > > +
> > > +                     if (!hd_struct_try_get(part))
> > > +                             break;
> > > +
> > > +                     old = xchg(&ptbl->last_lookup, part);
> > > +                     if (old)
> > > +                             hd_struct_put(old);
> > >                       return part;
> > >               }
> > >       }
> > > @@ -1231,7 +1238,11 @@ static void disk_replace_part_tbl(struct gendisk *disk,
> > >       rcu_assign_pointer(disk->part_tbl, new_ptbl);
> > >
> > >       if (old_ptbl) {
> > > -             rcu_assign_pointer(old_ptbl->last_lookup, NULL);
> > > +             struct hd_struct *part;
> > > +
> > > +             part = xchg(&old_ptbl->last_lookup, NULL);
> > > +             if (part)
> > > +                     hd_struct_put(part);
> > >               kfree_rcu(old_ptbl, rcu_head);
> > >       }
> > >  }
> > > diff --git a/block/partition-generic.c b/block/partition-generic.c
> > > index 98d60a59b843..441c1c591c04 100644
> > > --- a/block/partition-generic.c
> > > +++ b/block/partition-generic.c
> > > @@ -285,7 +285,8 @@ void delete_partition(struct gendisk *disk, int partno)
> > >               return;
> > >
> > >       rcu_assign_pointer(ptbl->part[partno], NULL);
> > > -     rcu_assign_pointer(ptbl->last_lookup, NULL);
> > > +     if (cmpxchg(&ptbl->last_lookup, part, NULL) == part)
> > > +             hd_struct_put(part);
> > >       kobject_put(part->holder_dir);
> > >       device_del(part_to_dev(part));
> > >
> > > --
> > > 2.22.0
> > >
> > > Regards,
> > > Tao
> > >
> > >
> > > > diff --git a/block/partition-generic.c b/block/partition-generic.c
> > > > index 1d20c9cf213f..1e0065ed6f02 100644
> > > > --- a/block/partition-generic.c
> > > > +++ b/block/partition-generic.c
> > > > @@ -284,6 +284,13 @@ void delete_partition(struct gendisk *disk, int partno)
> > > >             return;
> > > >
> > > >     rcu_assign_pointer(ptbl->part[partno], NULL);
> > > > +   /*
> > > > +    * Without the memory barrier, disk_map_sector_rcu()
> > > > +    * may read the old value after overwriting the
> > > > +    * last_lookup. Then it can not clear last_lookup,
> > > > +    * which may cause use-after-free.
> > > > +    */
> > > > +   smp_mb();
> > > >     rcu_assign_pointer(ptbl->last_lookup, NULL);
> > > >     kobject_put(part->holder_dir);
> > > >     device_del(part_to_dev(part));
> > > >
> > >




[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux