>> Let's assume a 4 TiB device and 2 MiB hugepage size. That's 2097152 huge
>> pages. Each such PMD entry consumes 8 bytes. That's 16 MiB.
>>
>> Sure, with thousands of processes sharing that memory, the size of page
>> tables required would increase with each and every process. But TBH,
>> that's in no way different to other file systems where we're even
>> dealing with PTE tables.
>
> The numbers for a real use case I am frequently quoted are something like:
> 1TB shared mapping, 10,000 processes sharing the mapping
> 4K PMD Page per 1GB of shared mapping
> 4M saving for each shared process
> 9,999 * 4M ~= 39GB savings

3.7 % of all memory. Noticeable if the feature is removed? Yes. Do we
care about supporting such corner cases that result in a maintenance
burden? My take is a clear no.

> However, if you look at commit 39dde65c9940c which introduced huge pmd sharing
> it states that performance rather than memory savings was the primary
> objective.
>
> "For hugetlb, the saving on page table memory is not the primary
> objective (as hugetlb itself already cuts down page table overhead
> significantly), instead, the purpose of using shared page table on hugetlb is
> to allow faster TLB refill and smaller cache pollution upon TLB miss.
>
> With PT sharing, pte entries are shared among hundreds of processes, the
> cache consumption used by all the page table is smaller and in return,
> application gets much higher cache hit ratio. One other effect is that
> cache hit ratio with hardware page walker hitting on pte in cache will be
> higher and this helps to reduce tlb miss latency. These two effects
> contribute to higher application performance."
>
> That 'makes sense', but I have never tried to measure any such performance
> benefit. It is easier to calculate the memory savings.

It does make sense; but then again, what's specific here about hugetlb?
Most probably it was just easy to add to hugetlb in contrast to other
types of shared memory.
>> Which results in me wondering if
>>
>> a) We should simply use gigantic pages for such extreme use case. Allows
>>    for freeing up more memory via vmemmap either way.
>
> The only problem with this is that many processors in use today have
> limited TLB entries for gigantic pages.
>
>> b) We should instead look into reclaiming reconstruct-able page table.
>>    It's hard to imagine that each and every process accesses each and
>>    every part of the gigantic file all of the time.
>> c) We should instead establish a more generic page table sharing
>>    mechanism.
>
> Yes. I think that is the direction taken by mshare() proposal. If we have
> a more generic approach we can certainly start deprecating hugetlb pmd
> sharing.

My strong opinion is to remove it ASAP and get something proper into place.

-- 
Thanks,

David / dhildenb