On Tue, Apr 5, 2022 at 11:34 AM Anshuman Khandual
<anshuman.khandual@xxxxxxx> wrote:
>
>
>
> On 4/4/22 17:31, Muchun Song wrote:
> > On Mon, Apr 4, 2022 at 5:25 PM Anshuman Khandual
> > <anshuman.khandual@xxxxxxx> wrote:
> >>
> >> Hello Muchun,
> >>
> >> On 3/31/22 12:26, Muchun Song wrote:
> >>> The feature of minimizing overhead of struct page associated with each
> >>> HugeTLB page aims to free its vmemmap pages (used as struct page) to
> >>> save memory, where is ~14GB/16GB per 1TB HugeTLB pages (2MB/1GB type).
> >>
> >> Enabling this feature saves us around 1.4/1.6 % memory but looking from
> >> other way around, unavailability of vmemmap backing pages (~1.4GB) when
> >> freeing up a corresponding HugeTLB page, could prevent ~1TB memory from
> >> being used as normal page form (requiring their own struct pages), thus
> >> forcing the HugeTLB page to remain as such ? Is not this problematic ?
> >>
> >> These additional 1TB memory in normal pages, from a HugeTLB dissolution
> >> could have eased the system's memory pressure without this feature being
> >> enabled.
> >
> > You are right. If the system is already under heavy memory pressure, it could
> > prevent the user from freeing HugeTLB pages to the buddy allocator. If the
> > HugeTLB page are allocated from non-movable zone, this scenario may be
> > not problematic since once a HugeTLB page is freed, then the system will
>
> But how can even the first HugeTLB page be freed without vmemmap which is
> throttled due to lack of sufficient memory ?

It's unfortunate, we're deadlocked and will have to try again later :-(

>
> > have memory to be allocated to be used as vmemmap pages, subsequent
> > freeing of HugeTLB pages may be getting easier. However, if the HugeTLB
> > pages are allocated from the movable zone, then the thing becomes terrible,
> > which is documented in Documentation/admin-guide/mm/memory-hotplug.rst.
> >
> > So there is a cmdline "hugetlb_free_vmemmap" to control if enabling this
> > feature. The user should enable/disable this depending on their workload.
>
> Should there also be a sysfs interface for this knob as well ? Perhaps the
> system usage might change on the way, without requiring a reboot.

Yep. I'm working on this [1] and will cc you in the next version.

[1] https://lore.kernel.org/all/20220330153745.20465-1-songmuchun@xxxxxxxxxxxxx/

Thanks.
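
For reference, the savings figures quoted in this thread can be reproduced with a short calculation. This is only a rough sketch under assumptions that the thread does not state: 4KB base pages, a 64-byte struct page, and one vmemmap page kept per HugeTLB page while the rest are freed. The function and constant names are made up for illustration and are not kernel identifiers.

```python
# Assumed constants (typical x86-64 values; NOT stated in the thread).
BASE_PAGE = 4 * 1024          # bytes per base page (assumption)
STRUCT_PAGE = 64              # bytes per struct page (assumption)
RESERVED_VMEMMAP_PAGES = 1    # vmemmap pages kept per HugeTLB page (assumption)

TIB = 1 << 40
GIB = 1 << 30


def vmemmap_saved_per_tib(hugepage_size):
    """Bytes of vmemmap freed per 1 TiB of HugeTLB pages of the given size."""
    # Number of struct pages describing one HugeTLB page.
    struct_pages = hugepage_size // BASE_PAGE
    # Number of base pages needed to hold those struct pages.
    vmemmap_pages = struct_pages * STRUCT_PAGE // BASE_PAGE
    # Everything except the reserved vmemmap page(s) is freed.
    freed_bytes = (vmemmap_pages - RESERVED_VMEMMAP_PAGES) * BASE_PAGE
    hugepages_per_tib = TIB // hugepage_size
    return freed_bytes * hugepages_per_tib


if __name__ == "__main__":
    for name, size in (("2MB", 2 << 20), ("1GB", 1 << 30)):
        saved = vmemmap_saved_per_tib(size)
        print(f"{name}: {saved / GIB:.2f} GiB per TiB "
              f"({100 * saved / TIB:.2f}% of the HugeTLB memory)")
```

Under these assumptions the result is 14 GiB per TiB for 2MB pages and just under 16 GiB per TiB for 1GB pages, in line with the ~14GB/16GB quoted from the patch description and the roughly 1.4/1.6 % mentioned in the reply.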