On 17/07/2024 11:25, David Hildenbrand wrote:
> On 17.07.24 12:18, Ryan Roberts wrote:
>> On 17/07/2024 11:03, David Hildenbrand wrote:
>>>>>>>>
>>>>>>>> But today, controls and stats are exposed for:
>>>>>>>>
>>>>>>>> anon:
>>>>>>>>   min order: 2
>>>>>>>>   max order: PMD_ORDER
>>>>>>>> anon-shmem:
>>>>>>>>   min order: 2
>>>>>>>>   max order: PMD_ORDER
>>>>>>>> tmpfs-shmem:
>>>>>>>>   min order: PMD_ORDER
>>>>>>>>   max order: PMD_ORDER
>>>>>>>> file:
>>>>>>>>   min order: Nothing yet (this patch proposes 1)
>>>>>>>>   max order: Nothing yet (this patch proposes MAX_PAGECACHE_ORDER)
>>>>>>>>
>>>>>>>> So I think there is definitely a bug for shmem where the minimum order
>>>>>>>> control should be order-1 but it's currently order-2.
>>>>>>>
>>>>>>> Maybe, did not play with that yet. Likely order-1 will work. (although
>>>>>>> probably of questionable use :) )
>>>>>>
>>>>>> You might have to expand on why it's of "questionable use". I'd assume it
>>>>>> has the same amount of value as using order-1 for regular page cache
>>>>>> pages? i.e. half the number of objects to manage for the same amount of
>>>>>> memory.
>>>>>
>>>>> order-1 was recently added for the pagecache to get some device setups
>>>>> running (IIRC, where we cannot use order-0, because device blocksize >
>>>>> PAGE_SIZE).
>>>>>
>>>>> You might be right about "half the number of objects", but likely just
>>>>> going for order-2, order-3, order-4 ... for shmem might be even better.
>>>>> And simply falling back to order-0 when you cannot get the larger orders.
>>>>
>>>> Sure, but then you're into the territory of baking in policy. Remember that
>>>> originally I was only interested in 64K but the consensus was to expose all
>>>> the sizes. Same argument applies to 8K; expose it and let others decide
>>>> policy.
>>>
>>> I don't disagree. The point I'm trying to make is that so far there
>>> was no strong evidence that it is really required.
>>> Support for the pagecache had a different motivation for these special
>>> devices.
>>
>> Sure, but there was no clear need for anon mTHP orders other than order-2 and
>> order-4 (for arm64's HPA and contpte, respectively), but we still chose to
>> expose all the others.
>
> order-2 and order-3 are valuable for AMD EPYC (depending on the generation 16
> vs. 32 KiB coalescing).
>
> But in general, at least for me, it's easier to argue why larger orders make
> more sense than very tiny ones.
>
> For example, order-5 can be mapped using cont-pte as well and you get roughly
> half the memory allocation+page fault overhead compared to order-4.
>
> order-1 ? No TLB optimization at least on any current HW I know.

I believe there are some variants of HPA that coalesce "up to" 4 pages, meaning
2 pages (or 3 or 4) could be coalesced into a single TLB entry. But I'm not 100%
sure on that.

>
> But I believe we're in violent agreement here :)
>
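
For reference, the order-to-size arithmetic used throughout this thread can be
sketched as below. This is only an illustration, assuming 4 KiB base pages
(implied by order-1 being called 8K and order-4 being called 64K above). The
helper names folio_size() and objects_needed() are made up for this sketch and
are not kernel interfaces.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB base pages, not stated explicitly in the thread


def folio_size(order: int, page_size: int = PAGE_SIZE) -> int:
    """Bytes covered by one order-`order` folio, i.e. page_size * 2**order."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return page_size << order


def objects_needed(total_bytes: int, order: int, page_size: int = PAGE_SIZE) -> int:
    """Number of order-`order` folios needed to cover total_bytes (rounded up)."""
    size = folio_size(order, page_size)
    return -(-total_bytes // size)  # ceiling division


if __name__ == "__main__":
    # Print the size of each order discussed in the thread.
    for order in range(0, 6):
        print(f"order-{order}: {folio_size(order) // 1024} KiB")
```

Under that assumption, order-1 halves the number of objects relative to order-0
for the same amount of memory, and each further order halves it again, which is
the "half the number of objects" point made above.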