On 07/30/23 20:51, Xueshi Hu wrote:
> In set_nr_huge_pages(), the local variable "count" is used to record
> persistent_huge_pages(), but when it comes to node-specific huge page
> allocation, its semantics change to nr_huge_pages. When there are surplus
> huge pages and the interface under
> /sys/devices/system/node/node*/hugepages is used to change the huge page
> pool size, this difference can result in the allocation of an unexpected
> number of huge pages.
> 
> Steps to reproduce the bug:
> 
> Starting with:
> 
>                     Node 0    Node 1    Total
>   HugePages_Total     0.00      0.00     0.00
>   HugePages_Free      0.00      0.00     0.00
>   HugePages_Surp      0.00      0.00     0.00
> 
> create 100 huge pages on Node 0 and consume them, then set Node 0's
> nr_hugepages to 0.
> 
> yields:
> 
>                     Node 0    Node 1    Total
>   HugePages_Total   200.00      0.00   200.00
>   HugePages_Free      0.00      0.00     0.00
>   HugePages_Surp    200.00      0.00   200.00
> 
> write 100 to Node 1's nr_hugepages:
> 
>   echo 100 > /sys/devices/system/node/node1/\
>   hugepages/hugepages-2048kB/nr_hugepages
> 
> gets:
> 
>                     Node 0    Node 1    Total
>   HugePages_Total   200.00    400.00   600.00
>   HugePages_Free      0.00    400.00   400.00
>   HugePages_Surp    200.00      0.00   200.00
> 
> The kernel is expected to create only 100 huge pages, but it gives 200.
> 
> Signed-off-by: Xueshi Hu <xueshi.hu@xxxxxxxxxx>
> ---
>  mm/hugetlb.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)

Good catch!

I added the code modified in this patch with commit fd875dca7c717.  However,
my commit moved the specific line that was the root cause of the bug.  That
specific line was added by commit 9a30523066cde, which added hugetlb
node-specific support way back in 2009 (the 2.6.32 timeframe).

Fix looks good, but waiting on resolution of the max_huge_pages usage.
-- 
Mike Kravetz

> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 56647235ab21..8ed4fffdebda 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3490,7 +3490,9 @@ static int set_nr_huge_pages(struct hstate *h, unsigned long count, int nid,
>  	if (nid != NUMA_NO_NODE) {
>  		unsigned long old_count = count;
>  
> -		count += h->nr_huge_pages - h->nr_huge_pages_node[nid];
> +		count += persistent_huge_pages(h) -
> +			 (h->nr_huge_pages_node[nid] -
> +			  h->surplus_huge_pages_node[nid]);
>  		/*
>  		 * User may have specified a large count value which caused the
>  		 * above calculation to overflow. In this case, they wanted
> -- 
> 2.40.1
> 
> 
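
As an illustration only (my own standalone sketch, not kernel code), plugging
the reproducer's numbers into the two per-node adjustments shows the overshoot.
The struct below is a stand-in for the relevant hstate fields, and
persistent_huge_pages() is modelled as nr_huge_pages - surplus_huge_pages,
matching its definition in mm/hugetlb.c:

/* Userspace sketch of the count adjustment in set_nr_huge_pages(),
 * before and after the patch, using the state from the reproducer above. */
#include <stdio.h>

struct hstate_sketch {
	unsigned long nr_huge_pages;		/* persistent + surplus */
	unsigned long surplus_huge_pages;
	unsigned long nr_huge_pages_node[2];
	unsigned long surplus_huge_pages_node[2];
};

static unsigned long persistent_huge_pages(struct hstate_sketch *h)
{
	return h->nr_huge_pages - h->surplus_huge_pages;
}

int main(void)
{
	/* Reproducer state: Node 0 holds 200 surplus huge pages, Node 1 none. */
	struct hstate_sketch h = {
		.nr_huge_pages = 200,
		.surplus_huge_pages = 200,
		.nr_huge_pages_node = { 200, 0 },
		.surplus_huge_pages_node = { 200, 0 },
	};
	unsigned long req = 100;	/* value written to Node 1's nr_hugepages */
	int nid = 1;

	/* Old: folds the surplus pages of other nodes into a count that is
	 * otherwise treated as a persistent-page target. */
	unsigned long old_target = req + h.nr_huge_pages - h.nr_huge_pages_node[nid];

	/* New: stays in "persistent pages" terms on both sides. */
	unsigned long new_target = req + persistent_huge_pages(&h) -
				   (h.nr_huge_pages_node[nid] -
				    h.surplus_huge_pages_node[nid]);

	/* Prints: old target = 300, new target = 100 */
	printf("old target = %lu, new target = %lu\n", old_target, new_target);
	return 0;
}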