At 2024-05-31 18:13:03, "Baolin Wang" <baolin.wang@xxxxxxxxxxxxxxxxx> wrote:
>
>
>On 2024/5/31 17:35, David Hildenbrand wrote:
>> On 30.05.24 04:04, Baolin Wang wrote:
>>> Anonymous pages have already been supported for multi-size (mTHP) allocation
>>> through commit 19eaf44954df, that can allow THP to be configured through the
>>> sysfs interface located at
>>> '/sys/kernel/mm/transparent_hugepage/hugepage-XXkb/enabled'.
>>>
>>> However, the anonymous shmem will ignore the anonymous mTHP rule configured
>>> through the sysfs interface, and can only use the PMD-mapped THP, that is not
>>> reasonable. Many implement anonymous page sharing through mmap(MAP_SHARED |
>>> MAP_ANONYMOUS), especially in database usage scenarios, therefore, users expect
>>> to apply an unified mTHP strategy for anonymous pages, also including the
>>> anonymous shared pages, in order to enjoy the benefits of mTHP. For example,
>>> lower latency than PMD-mapped THP, smaller memory bloat than PMD-mapped THP,
>>> contiguous PTEs on ARM architecture to reduce TLB miss etc.
>>>
>>> The primary strategy is similar to supporting anonymous mTHP. Introduce
>>> a new interface '/mm/transparent_hugepage/hugepage-XXkb/shmem_enabled',
>>> which can have all the same values as the top-level
>>> '/sys/kernel/mm/transparent_hugepage/shmem_enabled', with adding a new
>>> additional "inherit" option. By default all sizes will be set to "never"
>>> except PMD size, which is set to "inherit". This ensures backward compatibility
>>> with the anonymous shmem enabled of the top level, meanwhile also allows
>>> independent control of anonymous shmem enabled for each mTHP.
>>>
>>> Use the page fault latency tool to measure the performance of 1G anonymous shmem
>>> with 32 threads on my machine environment with: ARM64 Architecture, 32 cores,
>>> 125G memory:
>>> base: mm-unstable
>>> user-time    sys_time    faults_per_sec_per_cpu     faults_per_sec
>>> 0.04s        3.10s         83516.416                  2669684.890
>>>
>>> mm-unstable + patchset, anon shmem mTHP disabled
>>> user-time    sys_time    faults_per_sec_per_cpu     faults_per_sec
>>> 0.02s        3.14s         82936.359                  2630746.027
>>>
>>> mm-unstable + patchset, anon shmem 64K mTHP enabled
>>> user-time    sys_time    faults_per_sec_per_cpu     faults_per_sec
>>> 0.08s        0.31s         678630.231                 17082522.495
>>>
>>> From the data above, it is observed that the patchset has a minimal impact when
>>> mTHP is not enabled (some fluctuations observed during testing). When enabling 64K
>>> mTHP, there is a significant improvement of the page fault latency.
>>
>> Let me summarize the takeaway from the bi-weekly MM meeting as I
>> understood it, that includes Hugh's feedback on per-block tracking vs.
>
>Thanks David for the summarization.
>
>> mTHP:
>>
>> (1) Per-block tracking
>>
>> Per-block tracking is currently considered unwarranted complexity in
>> shmem.c. We should try to get it done without that. For any test cases
>> that fail, we should consider if they are actually valid for shmem.
>>
>> To optimize FALLOC_FL_PUNCH_HOLE for the cases where splitting+freeing
>> is not possible at fallocate() time, detecting zeropages later and
>> retrying to split+free might be an option, without per-block tracking.
>>
>> (2) mTHP controls
>>
>> As a default, we should not be using large folios / mTHP for any shmem,
>> just like we did with THP via shmem_enabled. This is what this series
>> currently does, and is part of the whole mTHP user-space interface design.
>>
>> Further, the mTHP controls should control all of shmem, not only
>> "anonymous shmem".
>
>Yes, that's what I thought and in my TODO list.
>
>>
>> Also, we should properly fallback within the configured sizes, and not
>> jump "over" configured sizes. Unless there is a good reason.
>>
>> (3) khugepaged
>>
>> khugepaged needs to handle larger folios properly as well. Until fixed,
>> using smaller THP sizes as fallback might prohibit collapsing a
>> PMD-sized THP later. But really, khugepaged needs to be fixed to handle
>> that.
>
>> (4) force/disable
>>
>> These settings are rather testing artifacts from the old ages. We should
>> not add them to the per-size toggles. We might "inherit" it from the
>> global one, though.
>
>Sorry, I missed this. So I should remove the 'force' and 'deny' option
>for each mTHP, right?
>
I prefer this. Perhaps the functionality of "force/deny" is different from
that of "always/never" when tmpfs is supported. The user would need to
understand the usage of "force" and "deny" again.
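
For reference, the "anonymous shmem" case in the quoted cover letter is a
mapping created with mmap(MAP_SHARED | MAP_ANONYMOUS). A minimal userspace
sketch of that pattern is below. It is only an illustration and not code from
the series: the helper name touch_anon_shmem() is hypothetical, and the sketch
does not check whether the kernel actually backed the range with mTHP, which
depends on the shmem_enabled settings discussed above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map `len` bytes of anonymous shared memory and write
 * one byte into every base page so that each page is faulted in.
 * Returns the number of base pages touched, or -1 if mmap() fails.
 */
static long touch_anon_shmem(size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    long touched = 0;
    unsigned char *p;

    /* The base page size is always available on Linux. */
    assert(page > 0);

    /* MAP_SHARED | MAP_ANONYMOUS is backed by shmem in the kernel. */
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    for (size_t off = 0; off < len; off += (size_t)page) {
        p[off] = 1;     /* the first write to a page triggers the fault */
        touched++;
    }

    munmap(p, len);
    return touched;
}
```

Calling touch_anon_shmem(1 << 20) from a small main() touches 1 MiB of
anonymous shmem one base page at a time; timing such a loop with different
shmem_enabled settings is one rough way to see the fault-path difference the
cover letter measures.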