On Tue, Mar 01, 2016 at 02:09:11AM -0500, Matthew Wilcox wrote:
>
> There are a few issues around 1GB THP support that I've come up against
> while working on DAX support that I think may be interesting to discuss
> in person.
>
>  - Do we want to add support for 1GB THP for anonymous pages?  DAX support
>    is driving the initial 1GB THP support, but would anonymous VMAs also
>    benefit from 1GB support?  I'm not volunteering to do this work, but
>    it might make an interesting conversation if we can identify some users
>    who think performance would be better if they had 1GB THP support.

At this point I don't think it would have many users. Too much hassle with
non-obvious benefits.

>  - Latency of a major page fault.  According to various public reviews,
>    main memory bandwidth is about 30GB/s on a Core i7-5960X with 4
>    DDR4 channels.  I think people are probably fairly unhappy about
>    doing only 30 page faults per second.  So maybe we need a more complex
>    scheme to handle major faults where we insert a temporary 2MB mapping,
>    prepare the other 2MB pages in the background, then merge them into
>    a 1GB mapping when they're completed.
>
>  - Cache pressure from 1GB page support.  If we're using NT stores, they
>    bypass the cache, and all should be good.  But if there are
>    architectures that support THP and not NT stores, zeroing a page is
>    just going to obliterate their caches.

At some point I tested NT stores for clearing 2M THP and it didn't show
much benefit. I guess that could depend on the microarchitecture, and we
should probably re-test this with new CPU generations.

> Other topics that might interest people from a VM/FS point of view:
>
>  - Uses for (or replacement of) the radix tree.  We're currently
>    looking at using the radix tree with DAX in order to reduce the number
>    of calls into the filesystem.  That's leading to various enhancements
>    to the radix tree, such as support for a lock bit for exceptional
>    entries (Neil Brown), and support for multi-order entries (me).
>    Is the (enhanced) radix tree the right data structure to be using
>    for this brave new world of huge pages in the page cache, or should
>    we be looking at some other data structure like an RB-tree?

I'm interested in multi-order entries for THP page cache. It's not
required for hugetmpfs, but would be nice to have.

>
>  - Can we get rid of PAGE_CACHE_SIZE now?  Finally?  Pretty please?

+1 :)

--
 Kirill A. Shutemov
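
For reference, the "30 page faults per second" figure quoted in the thread
can be reproduced with a back-of-envelope calculation. The sketch below is
not from the original message: it assumes that a fault costs exactly one
pass over the page at full memory bandwidth (zeroing or filling it), ignores
storage latency and fault-handling overhead, and treats the quoted 30GB/s as
30 GiB/s. The function and constant names are illustrative only.

```python
# Back-of-envelope estimate only. Assumptions (not from the thread):
#  - a fault costs one full pass over the page at peak memory bandwidth
#  - "30GB/s" is treated as 30 GiB/s so the units line up with page sizes
BANDWIDTH = 30 * 2**30  # bytes per second

PAGE_SIZES = {"4KB": 2**12, "2MB": 2**21, "1GB": 2**30}


def faults_per_second(page_size, bandwidth=BANDWIDTH):
    """Upper bound on faults per second if each fault touches one page."""
    return bandwidth // page_size


def fault_latency_ms(page_size, bandwidth=BANDWIDTH):
    """Lower bound on per-fault latency in milliseconds."""
    return 1000.0 * page_size / bandwidth


if __name__ == "__main__":
    for name, size in PAGE_SIZES.items():
        print(f"{name}: at most {faults_per_second(size)} faults/s, "
              f"at least {fault_latency_ms(size):.4f} ms per fault")
```

Under these assumptions a 1GB fault takes at least about 33 ms, against
roughly 65 microseconds for a 2MB fault, which is the gap the proposed
temporary 2MB mapping scheme is meant to hide.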