On Tue, Jan 19, 2016 at 09:25:25AM -0500, Matthew Wilcox wrote:
> From: Matthew Wilcox <willy@xxxxxxxxxxxxxxx>
>
> In order to support huge pages in the page cache, Kirill has proposed
> simply creating 512 entries. I think this runs into problems with
> fsync() tracking dirty bits in the radix tree. Ross inserts a special
> entry to represent the PMD at the index for the start of the PMD, but
> this requires probing the tree twice; once for the PTE and once for the PMD.
> When we add PUD entries, that will become three times.
>
> The approach in this patch set is to modify the radix tree to support
> multi-order entries. Pointers to internal radix tree nodes mostly do not
> have the 'indirect' bit set. I change that so they always have that bit
> set; then any pointer without the indirect bit set is a multi-order entry.
>
> If the order of the entry is a multiple of the fanout of the tree,
> then all is well. If not, it is necessary to insert alias nodes into
> the tree that point to the canonical entry. At this point, I have not
> added support for entries which are smaller than the last-level fanout of
> the tree (and I put a BUG_ON in to prevent that usage). Adding support
> would be a simple matter of one last pointer-chase when we get to the
> bottom of the tree, but I am not aware of any reason to add support for
> smaller multi-order entries at this point, so I haven't.
>
> Note that no actual users are modified at this point. I think it'd be
> mostly a matter of deleting code from the DAX fsync support at this point,
> but with that code in flux, I'm a little reluctant to add more churn
> to it. I'm also not entirely sure where Kirill is on the page-cache
> modifications; he seems to have his hands full fixing up the MM right now.
>
> Before diving into the important modifications, I add Andrew Morton's
> radix tree test harness to the tree in patches 1 & 2. It was absolutely
> invaluable in catching some of my bugs. Patches 3 & 4 are minor tweaks.
> Patches 5-7 are the interesting ones. Patch 8 we might want to leave
> out entirely or shift over to the test harness. I found it useful during
> debugging and others might too.
>
> Matthew Wilcox (8):
>   radix-tree: Add an explicit include of bitops.h
>   radix tree test harness
>   radix-tree: Cleanups
>   radix_tree: Convert some variables to unsigned types
>   radix_tree: Tag all internal tree nodes as indirect pointers
>   radix_tree: Loop based on shift count, not height
>   radix_tree: Add support for multi-order entries
>   radix_tree: Add radix_tree_dump

I like the idea of this approach - I'll work on integrating it into DAX
*sync.

One quick note - some of the patches are prefixed with "radix-tree" and
others with "radix_tree".

Also, if we go to the trouble of including the radix tree test harness,
should we include a new test at the end of the series that tests out
multi-order radix tree entries?
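
For anyone following the cover letter quoted above, the tagging scheme and
the alias-slot arithmetic can be sketched in a few lines of C. This is an
illustrative sketch only, not code from the patch set: the helper names
(tag_indirect, is_indirect, untag, slots_for_order, self_check) are
hypothetical, using the lowest pointer bit as the tag is an assumption, and
MAP_SHIFT = 6 (64 slots per node) is an assumed fanout.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; not taken from the patches. */
#define MAP_SHIFT    6u              /* log2(slots per node), assumed */
#define INDIRECT_BIT ((uintptr_t)1)  /* low pointer bit used as the tag */

/* Mark a pointer to an internal node as indirect. */
static inline void *tag_indirect(void *node)
{
	return (void *)((uintptr_t)node | INDIRECT_BIT);
}

/* Nonzero if the slot holds a tagged pointer to an internal node. */
static inline int is_indirect(const void *slot)
{
	return ((uintptr_t)slot & INDIRECT_BIT) != 0;
}

/* Recover the node address from a tagged slot. */
static inline void *untag(void *slot)
{
	return (void *)((uintptr_t)slot & ~INDIRECT_BIT);
}

/*
 * Slots an entry of the given order occupies in the node that holds it:
 * one when order is a multiple of MAP_SHIFT, otherwise
 * 2^(order % MAP_SHIFT), i.e. the canonical slot plus its aliases.
 * Orders below MAP_SHIFT are the case the cover letter says is not
 * supported yet, so callers here are expected to pass order >= MAP_SHIFT.
 */
static inline unsigned int slots_for_order(unsigned int order)
{
	return 1u << (order % MAP_SHIFT);
}

/* Small self-check of the helpers above. */
static inline void self_check(void)
{
	long node = 0;  /* stand-in for an internal node; aligned, low bit clear */
	void *slot = tag_indirect(&node);

	assert(is_indirect(slot));          /* tagged: internal node */
	assert(!is_indirect(&node));        /* untagged: an entry */
	assert(untag(slot) == (void *)&node);
	assert(slots_for_order(9) == 8);    /* PMD-sized entry */
	assert(slots_for_order(12) == 1);   /* multiple of MAP_SHIFT */
}
```

Under these assumptions, an order-9 entry (512 pages, the PMD case) is not a
multiple of the shift, so it takes 8 slots in its node: one canonical slot
plus seven aliases. An order-12 entry fits exactly one slot a level higher.
A multi-order test in the harness could start from checks of this shape.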