On Wed, 11 Jul 2018, Andrew Morton wrote:

> > > Did you consider LRU-sorting the array instead?
> >
> > It adds 40 bytes to struct task_struct,
>
> What does?  LRU sort?  It's a 4-entry array, just do it in place, like
> bh_lru_install().  Confused.
>

I was imagining an optimized sort rather than adding an iteration to
vmacache_update() of the same form that causes vmacache_find() to show up
on my perf reports in the first place.

> > but I'm not sure the least
> > recently used is the first preferred check.  If I do
> > madvise(MADV_DONTNEED) from a malloc implementation where I don't control
> > what is free()'d and I'm constantly freeing back to the same hugepages,
> > for example, I may always get first slot cache hits with this patch as
> > opposed to the 25% chance that the current implementation has (and perhaps
> > an lru would as well).
> >
> > I'm sure that I could construct a workload where LRU would be better and
> > could show that the added footprint were worthwhile, but I could also
> > construct a workload where the current implementation based on pfn would
> > outperform all of these.  It simply turns out that on the user-controlled
> > workloads that I was profiling, hashing based on pmd was the win.
>
> That leaves us nowhere to go.  Zapping the WARN_ON seems a no-brainer
> though?
>

I would suggest it goes under CONFIG_DEBUG_VM_VMACACHE.

My implementation for the optimized vmacache_find() is based on the
premise that spatial locality matters, and in practice on random
user-controlled workloads this yields a faster lookup than the current
implementation.  Of course, any caching technique can be defeated by
workloads, artificial or otherwise, but I suggest that as a general
principle caching based on PMD_SHIFT rather than pfn has a greater
likelihood of avoiding the iteration in vmacache_find() because of
spatial locality for anything that iterates over a range of memory.