On Fri, Sep 17, 2021 at 01:53:33PM +0100, Christoph Hellwig wrote:
> On Thu, Sep 16, 2021 at 11:40:28AM -0700, Dan Williams wrote:
> > > That was my gut feeling.  If everyone feels 100% comfortable with
> > > zeroing as the mechanism to clear poisoning I'll cave in.  The most
> > > important bit is that we do that through a dedicated DAX path instead
> > > of abusing the block layer even more.
> >
> > ...or just rename dax_zero_page_range() to dax_reset_page_range()?
> > Where reset == "zero + clear-poison"?
>
> I'd say that naming is more confusing than overloading zero.

How about dax_zeroinit_range() ?

To go with its fallocate flag (yeah I've been too busy sorting out -rc1
regressions to repost this) FALLOC_FL_ZEROINIT_RANGE that will reset the
hardware (whatever that means) and set the contents to the known value
zero.

Userspace usage model:

void handle_media_error(int fd, loff_t pos, size_t len)
{
	int ret;

	/* yell about this for posterior's sake */

	ret = fallocate(fd, FALLOC_FL_ZEROINIT_RANGE, pos, len);

	/* yay our disk drive / pmem / stone table engraver is online */
}

> > > I'm really worried about both partitions on DAX and DM passing through
> > > DAX because they deeply bind DAX to the block layer, which is just a bad
> > > idea.  I think we also need to sort that whole story out before removing
> > > the EXPERIMENTAL tags.
> >
> > I do think it was a mistake to allow for DAX on partitions of a pmemX
> > block-device.
> >
> > DAX-reflink support may be the opportunity to start deprecating that
> > support. Only enable DAX-reflink for direct mounting on /dev/pmemX
> > without partitions (later add dax-device direct mounting),
>
> I think we need to fully or almost fully sort this out.
>
> Here are my bold suggestions:
>
>  1) drop no drop the EXPERIMENTAL on the current block layer overload
>     at all

I don't understand this.

>  2) add direct mounting of the nvdimm namespaces ASAP.  Because all
>     the filesystems currently also need the /dev/pmem0 device add a way
>     to open the block device by the dax_device instead of our current
>     way of doing the reverse
>  3) deprecate DAX support through block layer mounts with a say 2 year
>     deprecation period
>  4) add DAX remapping devices as needed

What devices are needed?  linear for lvm, and maybe error so we can
actually test all this stuff?

> I'll volunteer to write the initial code for 2).  And I think we should
> not allow DAX+reflink on the block device shim at all.

/me has other questions about daxreflink, but I'll ask them on shiyang's
thread.

--D
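
A slightly fuller sketch of the usage model above, for anyone who wants
something that compiles.  Big assumptions: FALLOC_FL_ZEROINIT_RANGE is only
a proposal in this thread, so the value defined here is a made-up
placeholder (a high bit that current kernels should not recognize), and the
error reporting is illustrative only.

```c
#define _GNU_SOURCE		/* for fallocate() */
#include <assert.h>		/* only needed for the checks in the note below */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * ASSUMPTION: this flag does not exist in any released kernel header.
 * The value is a hypothetical placeholder so the sketch builds; it is
 * not a real UAPI constant.
 */
#ifndef FALLOC_FL_ZEROINIT_RANGE
#define FALLOC_FL_ZEROINIT_RANGE	0x40000000
#endif

/*
 * Ask the kernel to reset the media backing [pos, pos + len) and leave
 * the range reading back as zeroes.
 *
 * Returns 0 on success or a negative errno value on failure.
 */
int handle_media_error(int fd, off_t pos, off_t len)
{
	/* yell about this so there is a record of the bad range */
	fprintf(stderr, "media error: offset %lld, length %lld\n",
		(long long)pos, (long long)len);

	if (fallocate(fd, FALLOC_FL_ZEROINIT_RANGE, pos, len) < 0) {
		int err = errno;	/* save before stdio can clobber it */

		fprintf(stderr, "zeroinit failed: %s\n", strerror(err));
		return -err;
	}

	/* the range is usable again and reads back as zeroes */
	return 0;
}
```

With an invalid descriptor, e.g. handle_media_error(-1, 0, 4096), this
returns -EBADF.  With the placeholder value a kernel that does not know the
flag would presumably reject a valid descriptor with -EOPNOTSUPP, so the
caller still needs its own plan for that case.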