On Thu, Sep 16, 2021 at 12:12 AM Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
>
> On Wed, Sep 15, 2021 at 01:27:47PM -0700, Dan Williams wrote:
> > > Yeah, Christoph suggested that we make the clearing operation explicit
> > > in a related thread a few weeks ago:
> > > https://lore.kernel.org/linux-fsdevel/YRtnlPERHfMZ23Tr@xxxxxxxxxxxxx/
> >
> > That seemed to be tied to a proposal to plumb it all the way out to an
> > explicit fallocate() mode, not make it a silent side effect of
> > pwrite().
>
> Yes.
>
> > >
> > > Each of the dm drivers has to add their own ->clear_poison operation
> > > that remaps the incoming (sector, len) parameters as appropriate for
> > > that device and then calls the lower device's ->clear_poison with the
> > > translated parameters.
> > >
> > > This (AFAICT) has already been done for dax_zero_page_range, so I sense
> > > that Dan is trying to save you a bunch of code plumbing work by nudging
> > > you towards doing s/dax_clear_poison/dax_zero_page_range/ to this series
> > > and then you only need patches 2-3.
> >
> > Yes, but it sounds like Christoph was saying don't overload
> > dax_zero_page_range(). I'd be ok splitting the difference and having a
> > new fallocate clear poison mode map to dax_zero_page_range()
> > internally.
>
> That was my gut feeling. If everyone feels 100% comfortable with
> zeroing as the mechanism to clear poisoning I'll cave in. The most
> important bit is that we do that through a dedicated DAX path instead
> of abusing the block layer even more.

...or just rename dax_zero_page_range() to dax_reset_page_range()?
Where reset == "zero + clear-poison"?

> > > > BTW, our customer doesn't care about creating dax volume thru DM, so.
> > >
> > > They might not care, but anything going upstream should work in the
> > > general case.
> >
> > Agree.
>
> I'm really worried about both partitions on DAX and DM passing through
> DAX because they deeply bind DAX to the block layer, which is just a bad
> idea. I think we also need to sort that whole story out before removing
> the EXPERIMENTAL tags.

I do think it was a mistake to allow for DAX on partitions of a pmemX
block-device.

DAX-reflink support may be the opportunity to start deprecating that
support. Only enable DAX-reflink for direct mounting on /dev/pmemX
without partitions (later add dax-device direct mounting), change
DAX-experimental warning to a deprecation notification for DAX on
DM/partitions, continue to fail / never fix DAX-reflink for
DM/partitions, direct people to use namespace provisioning for
sub-divisions of PMEM capacity, and finally look into adding
concatenation and additional software striping support to the new CXL
region creation facility.
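
For reference, the per-target ->clear_poison remapping described in the
quoted text earlier in this mail could look roughly like the sketch
below. This is a userspace toy, not kernel code: struct toy_dev,
leaf_clear_poison() and linear_clear_poison() are hypothetical names
made up for illustration, and the real dax_operations / dm_target
plumbing is different. It only shows the idea of translating
(sector, len) for a linear mapping and forwarding to the lower device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration only; these are NOT the kernel's
 * dax_operations or dm_target definitions. */
struct toy_dev;

typedef int (*clear_poison_fn)(struct toy_dev *dev,
			       uint64_t sector, uint64_t len);

struct toy_dev {
	const char *name;
	clear_poison_fn clear_poison;
	struct toy_dev *lower;	/* NULL for a leaf (pmem-like) device */
	uint64_t start;		/* where this mapping begins on ->lower, in sectors */
	uint64_t nr_sectors;	/* size of this device, in sectors */
	uint64_t last_sector;	/* leaf only: last range it was asked to clear */
	uint64_t last_len;
};

/* Leaf device: just record the range it was asked to clear. */
static int leaf_clear_poison(struct toy_dev *dev, uint64_t sector, uint64_t len)
{
	assert(dev != NULL);
	if (sector > dev->nr_sectors || len > dev->nr_sectors - sector)
		return -1;	/* range falls outside the device */
	dev->last_sector = sector;
	dev->last_len = len;
	return 0;
}

/* Linear-style stacked device: translate the range into the lower
 * device's sector space, then call the lower ->clear_poison. */
static int linear_clear_poison(struct toy_dev *dev, uint64_t sector, uint64_t len)
{
	assert(dev != NULL && dev->lower != NULL);
	if (sector > dev->nr_sectors || len > dev->nr_sectors - sector)
		return -1;	/* range falls outside this mapping */
	return dev->lower->clear_poison(dev->lower, dev->start + sector, len);
}
```

A striped or concatenated target would additionally have to split the
range across its lower devices, which is the per-driver plumbing work
the quoted text refers to.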