On Wed, 13 Apr 2022 14:27:54 +0800 "liupeng (DM)" <liupeng256@xxxxxxxxxx> wrote:

> On 2022/4/13 12:42, Andrew Morton wrote:
> > On Wed, 13 Apr 2022 03:29:12 +0000 Peng Liu<liupeng256@xxxxxxxxxx> wrote:
> >
> >> Certain systems are designed to have sparse/discontiguous nodes. In
> >> this case, nr_online_nodes can not be used to walk through numa node.
> >> Also, a valid node may be greater than nr_online_nodes.
> >>
> >> However, in hugetlb, it is assumed that nodes are contiguous. Recheck
> >> all the places that use nr_online_nodes, and repair them one by one.
> >>
> > What are the runtime effects of this shortcoming?
> > .
>
> For sparse/discontiguous nodes, the current code may treat a valid node
> as invalid, and will fail to allocate all hugepages on a valid node that
> "nid >= nr_online_nodes".
>
> As David suggested:
> 	if (tmp >= nr_online_nodes)
> 		goto invalid;
>
> Just imagine node 0 and node 2 are online, and node 1 is offline. Assuming
> that "node < 2" is valid is wrong.

So do you think we should backport this fix into earlier kernel releases?
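
For anyone following along, the failure mode described in the quoted reply can be modelled in a few lines of user-space C. This is only a sketch, not the hugetlb code: MAX_NODES, the online[] array and the three helper functions are hypothetical names made up for illustration. The array stands in for the kernel's online-node mask (the thing node_online() consults), and the example topology is the one from the quoted mail, with nodes 0 and 2 online and node 1 offline.

```c
#include <stdbool.h>

#define MAX_NODES 8	/* hypothetical limit, chosen only for this sketch */

/* Hypothetical stand-in for the online-node mask: nodes 0 and 2 are online. */
static const bool online[MAX_NODES] = { [0] = true, [2] = true };

/* Number of online nodes; plays the role of nr_online_nodes here. */
static int count_online(void)
{
	int n = 0;

	for (int i = 0; i < MAX_NODES; i++)
		if (online[i])
			n++;
	return n;
}

/* Check that assumes node IDs are contiguous: compares the ID to a count. */
static bool valid_by_count(int nid)
{
	return nid >= 0 && nid < count_online();
}

/* Check that looks at the node itself: bounds test, then mask lookup. */
static bool valid_by_mask(int nid)
{
	return nid >= 0 && nid < MAX_NODES && online[nid];
}
```

With this topology count_online() is 2, so valid_by_count() accepts offline node 1 and rejects online node 2, while valid_by_mask() gets both right. That is the same pair of mistakes the "tmp >= nr_online_nodes" test makes on a sparse system.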