On Sat, Dec 14, 2024 at 04:22:58PM +0100, David Hildenbrand wrote: > On 14.12.24 02:39, Dan Williams wrote: > > [ add akpm and sfr for next steps ] > > > > Alistair Popple wrote: > > > Main updates since v2: > > > > > > - Rename the DAX specific dax_insert_XXX functions to vmf_insert_XXX > > > and have them pass the vmf struct. > > > > > > - Seperate out the device DAX changes. > > > > > > - Restore the page share mapping counting and associated warnings. > > > > > > - Rework truncate to require file-systems to have previously called > > > dax_break_layout() to remove the address space mapping for a > > > page. This found several bugs which are fixed by the first half of > > > the series. The motivation for this was initially to allow the FS > > > DAX page-cache mappings to hold a reference on the page. > > > > > > However that turned out to be a dead-end (see the comments on patch > > > 21), but it found several bugs and I think overall it is an > > > improvement so I have left it here. > > > > > > Device and FS DAX pages have always maintained their own page > > > reference counts without following the normal rules for page reference > > > counting. In particular pages are considered free when the refcount > > > hits one rather than zero and refcounts are not added when mapping the > > > page. > > > > > > Tracking this requires special PTE bits (PTE_DEVMAP) and a secondary > > > mechanism for allowing GUP to hold references on the page (see > > > get_dev_pagemap). However there doesn't seem to be any reason why FS > > > DAX pages need their own reference counting scheme. > > > > > > By treating the refcounts on these pages the same way as normal pages > > > we can remove a lot of special checks. In particular pXd_trans_huge() > > > becomes the same as pXd_leaf(), although I haven't made that change > > > here. It also frees up a valuable SW define PTE bit on architectures > > > that have devmap PTE bits defined. > > > > > > It also almost certainly allows further clean-up of the devmap managed > > > functions, but I have left that as a future improvment. It also > > > enables support for compound ZONE_DEVICE pages which is one of my > > > primary motivators for doing this work. > > > > So this is feeling ready for -next exposure, and ideally merged for v6.14. I > > see the comments from John and Bjorn and that you were going to respin for > > that, but if it's just those details things they can probably be handled > > incrementally. > > > > Alistair, are you ready for this to hit -next? Yeah, I'm pretty happy with the series now. It "feels" right. There's a couple of dumb build bot errors, so I was going to respin to fix those as well. I got caught up with a few other things so was just letting this sit awaiting feedback, but I should be able to post a respin early this week. > > As for which tree... > > > > Andrew, we could take this through -mm, but my first instinct would be to try > > to take it through nvdimm.git mainly to offload any conflict wrangling work and > > small fixups which are likely to be an ongoing trickle. > > > > However, I am not going to put up much of a fight if others prefer this go > > through -mm. > > > > Thoughts? > > I'm in the process of preparing v2 of [1] that will result in conflicts with > this series in the rmap code (in particular [PATCH v3 14/25] huge_memory: > Allow mappings of PUD sized pages). > > I'll be away for 2 weeks over Christmas, but I assume I'll manage to post v2 > shortly. > > Which reminds me that I still have to take a closer look at some things in > this series :) Especially also #14 regarding accounting. > > I wonder if we could split out the rmap changes in #14, and have that patch > simply in two trees? No idea. I could split out the first half (patches 1 - 8) into a series to go via nvdimm.git, because they are actually standalone clean ups that I think are worthwhile anyway. The remainder are more -mm focussed. However they do depend on the fs/dax cleanups in the first half so the trick would be making sure Andrew only takes them if the nvdimm.git changes have made it into -next. I'm happy with either approach, so let me know if I should split the series or not. - Alistair > [1] > https://lore.kernel.org/all/20240829165627.2256514-1-david@xxxxxxxxxx/T/#u > > -- > Cheers, > > David / dhildenb >