On Tue, Mar 01, 2016 at 11:25:41AM +0100, Jan Kara wrote: > On Tue 01-03-16 02:09:11, Matthew Wilcox wrote: > > There are a few issues around 1GB THP support that I've come up against > > while working on DAX support that I think may be interesting to discuss > > in person. > > > > - Do we want to add support for 1GB THP for anonymous pages? DAX support > > is driving the initial 1GB THP support, but would anonymous VMAs also > > benefit from 1GB support? I'm not volunteering to do this work, but > > it might make an interesting conversation if we can identify some users > > who think performance would be better if they had 1GB THP support. > > Some time ago I was thinking about 1GB THP and I was wondering: What is the > motivation for 1GB pages for persistent memory? Is it the savings in memory > used for page tables? Or is it about the cost of fault? I think it's both. I heard from one customer who calculated that with a 6TB server, mapping every page into a process would take ~24MB of page tables. Multiply that by the 50,000 processes they expect to run on a server of that size consumes 1.2TB of DRAM. Using 1GB pages reduces that by a factor of 512, down to 2GB. Another topic to consider then would be generalising the page table sharing code that is currently specific to hugetlbfs. I didn't bring it up as I haven't researched it in any detail, and don't know how hard it would be. > For your multi-order entries I was wondering whether we shouldn't relax the > requirement that all nodes have the same number of slots - e.g. we could > have number of slots variable with node depth so that PMD and eventually PUD > multi-order slots end up being a single entry at appropriate radix tree > level. I'm not a big fan of the sibling entries either :-) One thing I do wonder is whether anyone has done performance analysis recently of whether 2^6 is the right size for radix tree nodes? If it used 2^9, this would be a perfect match to x86 page tables ;-) Variable size is a bit painful because we've got two variable size arrays in the node; the array of node pointers and the tag bitmasks. And then we lose the benefit of the slab allocator if the node size is variable. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html