On 05/07/2015 03:11 PM, Ingo Molnar wrote:
> Stable, global page-struct descriptors are a given for real RAM, where
> we allocate a struct page for every page in nice, large, mostly linear
> arrays.
>
> We'd really need that for pmem too, to get the full power of struct
> page: and that means allocating them in nice, large, predictable
> places - such as on the device itself ...
>
> It might even be 'scattered' across the device, with 64 byte struct
> page size we can pack 64 descriptors into a single page, so every 65
> pages we could have a page-struct page.
>
> Finding a pmem page's struct page would thus involve rounding it
> modulo 65 and reading that page.
>
> The problem with that is fourfold:
>
>  - that we now turn a very kernel internal API and data structure into
>    an ABI. If struct page grows beyond 64 bytes it's a problem.
>
>  - on bootup (or device discovery time) we'd have to initialize all
>    the page structs. We could probably do this in a hierarchical way,
>    by dividing continuous pmem ranges into power-of-two groups of
>    blocks, and organizing them like the buddy allocator does.
>
>  - 1.5% of storage space lost.
>
>  - will wear-leveling properly migrate these 'hot' pages around?

MST and I have been doing some thinking about how to address some of
the issues above.

One way could be to invert the PG_compound logic we have today, by
allocating one struct page for every PMD / THP sized area (2MB on x86),
and dynamically allocating struct pages for the 4kB pages inside only
if the area gets split. They can be freed again when the area is not
being accessed in 4kB chunks.

That way we would always look at the struct page for the 2MB area
first, and if the PG_split bit is set, we look at the array of
dynamically allocated struct pages for this area.

The advantages are obvious: boot time memory overhead and
initialization time are reduced by a factor of 512. CPUs could also
take a whole 2MB area in order to do CPU-local 4kB allocations,
defragmentation policies may become a little clearer, etc...

The disadvantage is pretty obvious too: 4kB pages would no longer be
the fast case, with an indirection. I do not know how much of an issue
that would be, or whether it even makes sense for 4kB pages to continue
being the fast case going forward. Memory trends point in one
direction, file size trends in another.

For persistent memory, we would not need 4kB page struct pages unless
memory from a particular area was in small files AND those files were
being actively accessed. Large files (mapped in 2MB chunks) or inactive
small files would not need the 4kB page structs around.

--
All rights reversed
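
A minimal user-space sketch of the inverted PG_compound idea described
above, for illustration only. The names (area_page, small_page,
PG_SPLIT, area_for_pfn, split_area) are hypothetical, not actual kernel
structures; the point is one always-present descriptor per 2MB area,
with per-4kB descriptors allocated only while the area is split:

    /* Hypothetical sketch, not kernel code. */
    #include <stdint.h>
    #include <stdlib.h>
    #include <stdio.h>

    #define AREA_SHIFT      21                      /* 2MB area (x86 PMD size) */
    #define PAGE_SHIFT      12                      /* 4kB page */
    #define PAGES_PER_AREA  (1UL << (AREA_SHIFT - PAGE_SHIFT))  /* 512 */

    #define PG_SPLIT        0x1                     /* area is accessed in 4kB chunks */

    struct small_page {                             /* per-4kB descriptor, allocated lazily */
            unsigned long flags;
            int refcount;
    };

    struct area_page {                              /* per-2MB descriptor, always present */
            unsigned long flags;
            int refcount;
            struct small_page *subpages;            /* NULL until the area is split */
    };

    /* Stand-in for the per-device array of 2MB area descriptors. */
    #define NR_AREAS 4
    static struct area_page areas[NR_AREAS];

    static struct area_page *area_for_pfn(unsigned long pfn)
    {
            return &areas[pfn / PAGES_PER_AREA];
    }

    /*
     * Look up the descriptor for a 4kB pfn: always start from the 2MB
     * area descriptor, and only follow the indirection to the
     * dynamically allocated 4kB descriptors if PG_SPLIT is set.
     */
    static struct small_page *small_page_for_pfn(unsigned long pfn)
    {
            struct area_page *area = area_for_pfn(pfn);

            if (!(area->flags & PG_SPLIT))
                    return NULL;                    /* whole area handled as one 2MB unit */
            return &area->subpages[pfn % PAGES_PER_AREA];
    }

    /* Split an area: allocate the 512 small descriptors on demand. */
    static int split_area(struct area_page *area)
    {
            if (area->flags & PG_SPLIT)
                    return 0;
            area->subpages = calloc(PAGES_PER_AREA, sizeof(*area->subpages));
            if (!area->subpages)
                    return -1;
            area->flags |= PG_SPLIT;
            return 0;
    }

    /* Collapse an area once it is no longer accessed in 4kB chunks. */
    static void collapse_area(struct area_page *area)
    {
            free(area->subpages);
            area->subpages = NULL;
            area->flags &= ~PG_SPLIT;
    }

    int main(void)
    {
            unsigned long pfn = 700;                /* some pfn inside area 1 */
            struct area_page *area = area_for_pfn(pfn);

            printf("before split: %p\n", (void *)small_page_for_pfn(pfn));
            split_area(area);
            printf("after split:  %p\n", (void *)small_page_for_pfn(pfn));
            collapse_area(area);
            return 0;
    }

The factor-of-512 saving above comes directly from the geometry: a 2MB
area covers 512 4kB pages, so only 1/512th of the descriptors need to
exist up front, at the cost of one extra indirection on the 4kB path.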