On Mon 22-03-21 09:57:14, Mike Kravetz wrote:
> On 3/22/21 6:59 AM, Michal Hocko wrote:
> > On Fri 19-03-21 15:42:02, Mike Kravetz wrote:
> >> The number of hugetlb pages can be adjusted by writing to the
> >> sysfs/proc files nr_hugepages, nr_hugepages_mempolicy or
> >> nr_overcommit_hugepages. There is nothing to prevent two
> >> concurrent modifications via these files. The underlying routine
> >> set_max_huge_pages() makes assumptions that only one occurrence is
> >> running at a time. Specifically, alloc_pool_huge_page uses a
> >> hstate specific variable without any synchronization.
> >
> > From the above it is not really clear whether the unsynchronized nature
> > of set_max_huge_pages is really a problem or a mere annoyance. I suspect
> > the latter, because the counters are properly synchronized with the
> > hugetlb_lock. It would be great to clarify that.
>
> It is a problem and an annoyance.
>
> The problem is that alloc_pool_huge_page -> for_each_node_mask_to_alloc is
> called after dropping the hugetlb lock. for_each_node_mask_to_alloc
> uses the helper hstate_next_node_to_alloc, which uses and modifies
> h->next_nid_to_alloc. Worst case would be two instances of set_max_huge_pages
> trying to allocate pages on different sets of nodes. Pages could get
> allocated on the wrong nodes.

Yes, that is what I meant by the annoyance. On the other hand, parallel
access to a global knob maintaining a global resource should be expected
to have side effects in the absence of external synchronization, unless it
is explicitly documented that such access is synchronized internally.

> I really doubt this problem has ever been experienced in practice.
> However, when looking at the code it was a real annoyance. :)

IMHO it would be a bit of a stretch to consider it a real life problem.

> I'll update the commit message to be more clear. Thanks!

Clarification will definitely help.
--
Michal Hocko
SUSE Labs