On 2022/4/14 6:04, Andrew Morton wrote:
> On Wed, 13 Apr 2022 14:27:54 +0800 "liupeng (DM)" <liupeng256@xxxxxxxxxx> wrote:
>> On 2022/4/13 12:42, Andrew Morton wrote:
>>> On Wed, 13 Apr 2022 03:29:12 +0000 Peng Liu <liupeng256@xxxxxxxxxx> wrote:
>>>> Certain systems are designed to have sparse/discontiguous nodes. In
>>>> this case, nr_online_nodes can not be used to walk through numa node.
>>>> Also, a valid node may be greater than nr_online_nodes.
>>>>
>>>> However, in hugetlb, it is assumed that nodes are contiguous. Recheck
>>>> all the places that use nr_online_nodes, and repair them one by one.
>>>
>>> What are the runtime effects of this shortcoming?
>>
>> For sparse/discontiguous nodes, the current code may treat a valid node
>> as invalid, and will fail to allocate all hugepages on a valid node that
>> "nid >= nr_online_nodes".
>>
>> As David suggested:
>>
>>     if (tmp >= nr_online_nodes)
>>         goto invalid;
>>
>> Just imagine node 0 and node 2 are online, and node 1 is offline. Assuming
>> that "node < 2" is valid is wrong.
>
> So do you think we should backport this fix into earlier kernel releases?

I think it is not an urgent bug, because:
1) QEMU does not support sparse nodes so far, although there are some open issues about making QEMU support them.
2) So far I have not found an actual machine that reports sparse nodes and needs to use hugepages.
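
To make the node 0 / node 2 case quoted above concrete, here is a minimal userspace sketch, not kernel code. The names online[], MAX_NODES, valid_by_count() and valid_by_mask() are hypothetical and exist only for this illustration. In the kernel, the per-node test would be something like node_online(nid) rather than a comparison against nr_online_nodes.

```c
#include <stdbool.h>
#include <stdio.h>

#define MAX_NODES 8	/* hypothetical limit for this sketch */

/* Hypothetical stand-in for the online-node mask: nodes 0 and 2 are online. */
static const bool online[MAX_NODES] = { [0] = true, [2] = true };

/* Number of online nodes, playing the role of nr_online_nodes. */
static int count_online(void)
{
	int n = 0;

	for (int i = 0; i < MAX_NODES; i++)
		n += online[i];
	return n;
}

/* Count-based check: wrongly assumes online node IDs are 0..count-1. */
bool valid_by_count(int nid)
{
	return nid >= 0 && nid < count_online();
}

/* Mask-based check: asks whether this specific node is online. */
bool valid_by_mask(int nid)
{
	return nid >= 0 && nid < MAX_NODES && online[nid];
}

/* Print both verdicts for nodes 0..2. */
void print_checks(void)
{
	for (int nid = 0; nid < 3; nid++)
		printf("node %d: by_count=%d by_mask=%d\n",
		       nid, valid_by_count(nid), valid_by_mask(nid));
}
```

With this layout the count-based check accepts offline node 1 and rejects online node 2, while the mask-based check gets both right.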