On 11/22/20 11:38 PM, Michal Hocko wrote:
> On Fri 20-11-20 09:45:12, Mike Kravetz wrote:
>> On 11/20/20 1:43 AM, David Hildenbrand wrote:
> [...]
>>>>> To keep things easy, maybe simply never allow to free these hugetlb pages
>>>>> again for now? If they were reserved during boot and the vmemmap condensed,
>>>>> then just let them stick around for all eternity.
>>>>
>>>> Not sure I understand. Do you propose to only free those vmemmap pages
>>>> when the pool is initialized during boot time and never allow to free
>>>> them up? That would certainly make it safer and maybe even simpler wrt
>>>> implementation.
>>>
>>> Exactly, let's keep it simple for now. I guess most use cases of this (virtualization, databases, ...) will allocate hugepages during boot and never free them.
>>
>> Not sure if I agree with that last statement. Database and virtualization
>> use cases from my employer allocate hugetlb pages after boot. It
>> is shortly after boot, but still not from boot/kernel command line.
>
> Is there any strong reason for that?
>

The reason I have been given is that it is preferable to have SW compute
the number of needed huge pages after boot based on total memory, rather
than have a sysadmin calculate the number and add a boot parameter.

>> Somewhat related, but not exactly addressing this issue ...
>>
>> One idea discussed in a previous patch set was to disable PMD/huge page
>> mapping of vmemmap if this feature was enabled. This would eliminate a bunch
>> of the complex code doing page table manipulation. It does not address
>> the issue of struct page pages going away which is being discussed here,
>> but it could be a way to simplify the first version of this code. If this
>> is going to be an 'opt in' feature as previously suggested, then eliminating
>> the PMD/huge page vmemmap mapping may be acceptable. My guess is that
>> sysadmins would only 'opt in' if they expect most of system memory to be used
>> by hugetlb pages. We certainly have database and virtualization use cases
>> where this is true.
>
> Would this simplify the code considerably? I mean, the vmemmap page
> tables will need to be updated anyway. So that code has to stay. PMD
> entry split shouldn't be the most complex part of that operation. On
> the other hand dropping large pages for all vmemmaps will likely have a
> performance impact.

I agree with your points. This was just one way in which the patch set
could be simplified.
--
Mike Kravetz
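The following is a minimal sketch of the post-boot sizing mentioned in the reply, where software computes the number of huge pages from total memory instead of relying on a boot parameter. It assumes a userspace helper written in Python that reads the `MemTotal` and `Hugepagesize` fields (both reported in kB) from /proc/meminfo-formatted text. It then writes the result to /proc/sys/vm/nr_hugepages. The function names and the `fraction` policy knob are hypothetical illustrations. They are not the actual tooling used by the database and virtualization deployments discussed in this thread.

```python
"""Hypothetical helper: size the hugetlb pool from total memory after boot."""


def parse_meminfo_kb(text: str, field: str) -> int:
    """Return the value (in kB) of `field` from /proc/meminfo-formatted text."""
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name.strip() == field:
            value, *unit = rest.split()
            if unit and unit[0] != "kB":
                raise ValueError(f"unexpected unit for {field}: {unit[0]}")
            return int(value)
    raise KeyError(field)


def compute_nr_hugepages(meminfo: str, fraction: float) -> int:
    """Number of default-size huge pages covering `fraction` of MemTotal.

    `fraction` is a hypothetical policy knob, not a kernel parameter.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    total_kb = parse_meminfo_kb(meminfo, "MemTotal")
    hpage_kb = parse_meminfo_kb(meminfo, "Hugepagesize")  # default huge page size
    return int(total_kb * fraction) // hpage_kb


def apply_nr_hugepages(count: int, path: str = "/proc/sys/vm/nr_hugepages") -> None:
    """Request `count` default-size huge pages (needs root).

    The kernel may allocate fewer than requested if memory is fragmented,
    so callers should read the file back to see what was actually reserved.
    """
    with open(path, "w") as f:
        f.write(f"{count}\n")
```

For example, a startup service could pass the contents of /proc/meminfo to `compute_nr_hugepages(..., fraction=0.75)` and hand the result to `apply_nr_hugepages`. That runs shortly after boot, which matches the timing described above, and avoids having a sysadmin compute the count by hand.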