On 3/5/20 10:36 PM, Longpeng (Mike) wrote:
> On 2020/3/6 8:09, Mike Kravetz wrote:
>> On 3/4/20 7:30 PM, Longpeng(Mike) wrote:
>>> From: Longpeng <longpeng2@xxxxxxxxxx>
>
>> I am thinking we may want to have a more generic solution by allowing
>> the default_hugepagesz= processing code to verify the passed size and
>> set up the corresponding hstate. This would require more cooperation
>> between architecture specific and independent code. This could be
>> accomplished with a simple arch_hugetlb_valid_size() routine provided
>> by the architectures. Below is an untested patch to add such support
>> to the architecture independent code and x86. Other architectures would
>> be similar.
>>
>> In addition, with architectures providing arch_hugetlb_valid_size() it
>> should be possible to have a common routine in architecture independent
>> code to read/process hugepagesz= command line arguments.
>>
> I just want to use the minimize changes to address this issue, so I choosed a
> way which my patch did.
>
> To be honest, the approach you suggested above is much better though it need
> more changes.
>
>> Of course, another approach would be to simply require ALL architectures
>> to set up hstates for ALL supported huge page sizes.
>>
> I think this is also needed, then we can request all supported size of hugepages
> by sysfs(e.g. /sys/kernel/mm/hugepages/*) dynamically. Currently, (x86) we can
> only request 1G-hugepage through sysfs if we boot with 'default_hugepagesz=1G',
> even with the first approach.

I 'think' you can use sysfs for 1G huge pages on x86 today. I just booted a
system without any hugepage options on the command line.

# cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
0
# cat /sys/kernel/mm/hugepages/hugepages-1048576kB/^Cugepages
# echo 1 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
# cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
1
# cat /sys/kernel/mm/hugepages/hugepages-1048576kB/free_hugepages
1

x86 and riscv set up an hstate for PUD_SIZE huge pages by default if
CONFIG_CONTIG_ALLOC is enabled. This is because of a somewhat recent feature
that allowed dynamic allocation of gigantic (page order >= MAX_ORDER) pages.
Before that feature, it made no sense to set up an hstate for gigantic pages
if they were not allocated at boot time and could not be dynamically added
later.

I'll code up a proposal that does the following:
- Have arch specific code provide a list of supported huge page sizes
- Arch independent code uses list to create all hstates
- Move processing of "hugepagesz=" to arch independent code
- Validate "default_hugepagesz=" when value is read from command line

It may take a few days. When ready, I will pull in the architecture specific
people.

> BTW, because it's not easy to discuss with you due to the time difference, I
> have another question about the default hugepages to consult you here. Why the
> /proc/meminfo only show the info about the default hugepages, but not others?
> meminfo is more well know than sysfs, some ordinary users know meminfo but don't
> know use the sysfs to get the hugepages status(e.g. total, free).

I believe that is simply history. In the beginning there was only the
default huge page size, and that was added to meminfo. People then wrote
scripts to parse huge page information in meminfo. When support for other
huge page sizes was added, it was not added to meminfo as it could break user
scripts parsing the file. Adding information for all potential huge page
sizes may also create lots of entries that are unused. I was not around when
these decisions were made, but that is my understanding.
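
For users who want totals for every size today, sysfs already has the data.
Something like the rough, untested sketch below could print one line per huge
page size. It assumes the /sys/kernel/mm/hugepages layout used in the session
above; the function name hugepage_summary and its optional directory argument
are made up for illustration.

```shell
#!/bin/sh
# Print "<size_kB> <total> <free>" for each huge page size found in sysfs.
# The optional first argument overrides the sysfs directory; this is an
# illustrative convenience for testing, not part of any kernel interface.
hugepage_summary() {
    base="${1:-/sys/kernel/mm/hugepages}"
    for dir in "$base"/hugepages-*kB; do
        # Skip the unexpanded glob when no hstate directories exist.
        [ -d "$dir" ] || continue
        # Reduce ".../hugepages-1048576kB" to "1048576".
        size="${dir##*/hugepages-}"
        size="${size%kB}"
        total=$(cat "$dir/nr_hugepages")
        free=$(cat "$dir/free_hugepages")
        echo "$size $total $free"
    done
}
```

Calling hugepage_summary with no argument reads the live sysfs tree, so it
only lists sizes for which the kernel has actually set up an hstate.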

BTW - A recently added meminfo field 'Hugetlb' displays the amount of memory
consumed by huge pages of ALL sizes.
-- 
Mike Kravetz