There are a few issues around 1GB THP support that I've run into while working on DAX support, and which I think may be interesting to discuss in person.

- Do we want to add support for 1GB THP for anonymous pages? DAX is driving the initial 1GB THP support, but would anonymous VMAs also benefit from 1GB support? I'm not volunteering to do this work, but it might make for an interesting conversation if we can identify some users who think performance would be better if they had 1GB THP support.

- Latency of a major page fault. According to various public reviews, main memory bandwidth is about 30GB/s on a Core i7-5960X with 4 DDR4 channels. Since servicing a major fault on a 1GB page means touching a full gigabyte of memory, that caps us at about 30 page faults per second, which I think people are probably fairly unhappy about. So maybe we need a more complex scheme to handle major faults, where we insert a temporary 2MB mapping, prepare the other 2MB pages in the background, then merge them into a 1GB mapping once they're all completed.

- Cache pressure from 1GB page support. If we're using NT stores, they bypass the cache and all should be good. But if there are architectures that support THP but not NT stores, zeroing a 1GB page is just going to obliterate their caches.

Other topics that might interest people from a VM/FS point of view:

- Uses for (or replacement of) the radix tree. We're currently looking at using the radix tree with DAX in order to reduce the number of calls into the filesystem. That's leading to various enhancements to the radix tree, such as support for a lock bit on exceptional entries (Neil Brown) and support for multi-order entries (me). Is the (enhanced) radix tree the right data structure to be using for this brave new world of huge pages in the page cache, or should we be looking at some other data structure like an RB-tree?

- Can we get rid of PAGE_CACHE_SIZE now? Finally? Pretty please?
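To make the staged-fault idea concrete, here is a rough C-style pseudocode sketch of one way it could work. Every function name in it (prepare_2mb_page, install_pmd_mapping, queue_background_prepare, collapse_to_pud_mapping) is a hypothetical placeholder, not an existing kernel API:

```c
/* Pseudocode only: all helpers below are hypothetical placeholders. */
fault_1gb(vma, addr)
{
    pud_base = round_down(addr, 1GB);

    /* 1. Satisfy the fault quickly with a single 2MB mapping. */
    pmd_page = prepare_2mb_page(vma, addr);
    install_pmd_mapping(vma, round_down(addr, 2MB), pmd_page);

    /* 2. Prepare the remaining 511 2MB pages off the fault path. */
    for each 2MB chunk in [pud_base, pud_base + 1GB) except pmd_page:
        queue_background_prepare(vma, chunk);

    /* 3. Once all 512 chunks are ready, replace the PMD mappings
     *    with a single 1GB PUD mapping under the page table lock. */
    on_all_chunks_ready:
        collapse_to_pud_mapping(vma, pud_base);
}
```

The merge step is essentially a PUD-level analogue of what khugepaged already does when collapsing 4kB pages into a 2MB THP.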