On 2021/5/13 17:54, Muchun Song wrote: > On Thu, May 13, 2021 at 5:11 PM Miaohe Lin <linmiaohe@xxxxxxxxxx> wrote: >> >> On 2021/5/13 14:48, Huang Ying wrote: >>> Before commit c10d38cc8d3e ("mm, swap: bounds check swap_info array >>> accesses to avoid NULL derefs"), the typical code to reference the >>> swap_info[] is as follows, >>> >>> type = swp_type(swp_entry); >>> if (type >= nr_swapfiles) >>> /* handle invalid swp_entry */; >>> p = swap_info[type]; >>> /* access fields of *p. OOPS! p may be NULL! */ >>> >>> Because the ordering isn't guaranteed, it's possible that "p" is read >>> before checking "type". And that may result in NULL pointer >>> dereference. >>> >>> So in commit c10d38cc8d3e, the code becomes, >>> >>> struct swap_info_struct *swap_type_to_swap_info(int type) >>> { >>> if (type >= READ_ONCE(nr_swapfiles)) >>> return NULL; >>> smp_rmb(); >>> return READ_ONCE(swap_info[type]); >>> } >>> >>> /* users */ >>> type = swp_type(swp_entry); >>> p = swap_type_to_swap_info(type); >>> if (!p) >>> /* handle invalid swp_entry */; >>> /* access fields of *p */ >>> >>> Because "p" is checked to be non-zero before dereference, smp_rmb() >>> isn't needed anymore. >>> >>> We still need to guarantee swap_info[type] is read before dereference. >>> That can be satisfied via the data dependency ordering of >>> READ_ONCE(swap_info[type]). The corresponding smp_wmb() is adjusted >>> in alloc_swap_info() too. >>> >>> And, we don't need to read "nr_swapfiles" too. Because if >>> "type >= nr_swapfiles", swap_info[type] will be NULL. We just need >>> to make sure we will not access out of the boundary of the array. >>> With that change, nr_swapfiles will only be accessed with swap_lock >>> held, except in swapcache_free_entries(). Where the absolute >>> correctness of the value isn't needed, as described in the comments. >>> >>> Signed-off-by: "Huang, Ying" <ying.huang@xxxxxxxxx> >>> Cc: Daniel Jordan <daniel.m.jordan@xxxxxxxxxx> >>> Cc: Dan Carpenter <dan.carpenter@xxxxxxxxxx> >>> Cc: Andrea Parri <andrea.parri@xxxxxxxxxxxxxxxxxxxx> >>> Cc: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx> >>> Cc: Andi Kleen <ak@xxxxxxxxxxxxxxx> >>> Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx> >>> Cc: Omar Sandoval <osandov@xxxxxx> >>> Cc: Paul McKenney <paulmck@xxxxxxxxxx> >>> Cc: Tejun Heo <tj@xxxxxxxxxx> >>> Cc: Will Deacon <will.deacon@xxxxxxx> >>> Cc: Miaohe Lin <linmiaohe@xxxxxxxxxx> >>> --- >>> mm/swapfile.c | 18 +++++++++--------- >>> 1 file changed, 9 insertions(+), 9 deletions(-) >>> >>> diff --git a/mm/swapfile.c b/mm/swapfile.c >>> index 2aad85751991..4c1fb28bbe0e 100644 >>> --- a/mm/swapfile.c >>> +++ b/mm/swapfile.c >>> @@ -100,10 +100,14 @@ atomic_t nr_rotate_swap = ATOMIC_INIT(0); >>> >>> static struct swap_info_struct *swap_type_to_swap_info(int type) >>> { >>> - if (type >= READ_ONCE(nr_swapfiles)) >>> + if (type >= MAX_SWAPFILES) >>> return NULL; >>> >>> - smp_rmb(); /* Pairs with smp_wmb in alloc_swap_info. */ >>> + /* >>> + * The data dependency ordering from the READ_ONCE() pairs >>> + * with smp_wmb() in alloc_swap_info() to guarantee the >>> + * swap_info_struct fields are read after swap_info[type]. >>> + */ >>> return READ_ONCE(swap_info[type]); >>> } >>> >>> @@ -2884,14 +2888,10 @@ static struct swap_info_struct *alloc_swap_info(void) >>> } >>> if (type >= nr_swapfiles) { >>> p->type = type; >>> - WRITE_ONCE(swap_info[type], p); >>> - /* >>> - * Write swap_info[type] before nr_swapfiles, in case a >>> - * racing procfs swap_start() or swap_next() is reading them. >>> - * (We never shrink nr_swapfiles, we never free this entry.) >>> - */ >>> + /* Paired with READ_ONCE() in swap_type_to_swap_info() */ >>> smp_wmb(); >> >> Many thank for your patch. The patch looks fine to me. There is one question: >> >> There is no smp_rmb() paired with above smp_wmb(). What is this smp_wmb() used for ? >> Could you please have a explanation ? > > The comment is very clear, it matches READ_ONCE() which implies a > data dependence barrier on some archs. > > Thanks. Got it. I misunderstood it... Thanks! > >> >> Thanks again! >> >>> - WRITE_ONCE(nr_swapfiles, nr_swapfiles + 1); >>> + WRITE_ONCE(swap_info[type], p); >>> + nr_swapfiles++; >>> } else { >>> defer = p; >>> p = swap_info[type]; >>> >> > . >