On Thu, Jan 23, 2020 at 11:07 AM Darrick J. Wong <darrick.wong@xxxxxxxxxx> wrote:
>
> On Thu, Jan 23, 2020 at 11:52:49AM -0500, Vivek Goyal wrote:
> > Hi,
> >
> > This is an RFC patch to provide a dax operation to zero a range of memory.
> > It will also clear poison in the process. This is primarily compile tested
> > patch. I don't have real hardware to test the poison logic. I am posting
> > this to figure out if this is the right direction or not.
> >
> > Motivation from this patch comes from Christoph's feedback that he will
> > rather prefer a dax way to zero a range instead of relying on having to
> > call blkdev_issue_zeroout() in __dax_zero_page_range().
> >
> > https://lkml.org/lkml/2019/8/26/361
> >
> > My motivation for this change is virtiofs DAX support. There we use DAX
> > but we don't have a block device. So any dax code which has the assumption
> > that there is always a block device associated is a problem. So this
> > is more of a cleanup of one of the places where dax has this dependency
> > on block device and if we add a dax operation for zeroing a range, it
> > can help with not having to call blkdev_issue_zeroout() in dax path.
> >
> > I have yet to take care of stacked block drivers (dm/md).
> >
> > Current poison clearing logic is primarily written with assumption that
> > I/O is sector aligned. With this new method, this assumption is broken
> > and one can pass any range of memory to zero. I have fixed few places
> > in existing logic to be able to handle an arbitrary start/end. I am
> > not sure are there other dependencies which might need fixing or
> > prohibit us from providing this method.
> >
> > Any feedback or comment is welcome.
>
> So who gets to use this? :)
>
> Should we (XFS) make fallocate(ZERO_RANGE) detect when it's operating on
> a written extent in a DAX file and call this instead of what it does now
> (punch range and reallocate unwritten)?

If it eliminates more block assumptions, then yes.
In general I think there are opportunities to use "native" direct_access
instead of block-i/o for other areas too, like metadata i/o.

> Is this the kind of thing XFS should just do on its own when DAX tells us that
> some range of pmem has gone bad and now we need to (a) race with the
> userland programs to write /something/ to the range to prevent a machine
> check (b) whack all the programs that think they have a mapping to
> their data, (c) see if we have a DRAM copy and just write that back, (d)
> set wb_err so fsyncs fail, and/or (e) regenerate metadata as necessary?

(a), (b) duplicate what memory error handling already does. So yes, it
could be done, but it only helps if machine check handling is broken or
missing.

(c) What DRAM copy in the DAX case?

(d) dax fsync is just a cache flush, so it can't fail, or are you
talking about errors in metadata?

(e) I thought our solution for dax metadata redundancy is to use a
realtime data device and raid mirror for the metadata device.

> <cough> Will XFS ever get that "your storage went bad" hook that was
> promised ages ago?

pmem developers don't scale?

> Though I guess it only does this a single page at a time, which won't be
> awesome if we're trying to zero (say) 100GB of pmem. I was expecting to
> see one big memset() call to zero the entire range followed by
> pmem_clear_poison() on the entire range, but I guess you did tag this
> RFC. :)

Until movdir64b is available the only way to clear poison is by making
a call to the BIOS. The BIOS may not be efficient at bulk clearing.
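
To make the two range questions in this thread concrete (an arbitrary
start/end versus sector-aligned poison clearing, and page-at-a-time
zeroing), here is a minimal userspace sketch in C. It is not the kernel
implementation and not the code from the RFC: split_range(),
zero_range_chunked(), SECTOR_SIZE and PAGE_SIZE_BYTES are hypothetical
names made up for the example, and the 512-byte and 4096-byte values are
assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE     512u   /* assumed poison-clearing granularity */
#define PAGE_SIZE_BYTES 4096u  /* assumed page size */

/* Hypothetical: a byte range split at sector boundaries. */
struct range_split {
    uint64_t head_len;  /* unaligned bytes before the first full sector */
    uint64_t mid_start; /* offset of the first fully covered sector */
    uint64_t mid_len;   /* bytes in fully covered sectors */
    uint64_t tail_len;  /* unaligned bytes after the last full sector */
};

/* Split [start, start + len) into head / aligned middle / tail. */
static struct range_split split_range(uint64_t start, uint64_t len)
{
    struct range_split s = { 0, 0, 0, 0 };
    uint64_t end, first, last;

    assert(len <= UINT64_MAX - start); /* guard against wrap-around */
    end = start + len;
    first = (start + SECTOR_SIZE - 1) / SECTOR_SIZE * SECTOR_SIZE; /* round up */
    last = end / SECTOR_SIZE * SECTOR_SIZE;                        /* round down */

    if (first > last) {
        /* The range sits inside one sector and covers none fully. */
        s.head_len = len;
        s.mid_start = end;
        return s;
    }
    s.head_len = first - start;
    s.mid_start = first;
    s.mid_len = last - first;
    s.tail_len = end - last;
    return s;
}

/*
 * Zero [start, start + len) in base[] one page at a time, never crossing
 * a page boundary in a single memset(). Returns the number of per-page
 * calls, which is what grows with the size of the range.
 */
static size_t zero_range_chunked(unsigned char *base, uint64_t start,
                                 uint64_t len)
{
    size_t calls = 0;

    while (len > 0) {
        uint64_t n = PAGE_SIZE_BYTES - (start % PAGE_SIZE_BYTES);

        if (n > len)
            n = len;
        memset(base + start, 0, (size_t)n);
        start += n;
        len -= n;
        calls++;
    }
    return calls;
}
```

For example, zeroing 8100 bytes starting at offset 100 takes three
per-page calls here, and split_range(100, 8100) reports 412 head bytes,
7680 bytes of fully covered sectors starting at offset 512, and 8 tail
bytes. Only the middle part lines up with a sector-aligned clearing
primitive; how the head and tail should be handled is exactly the open
question in the RFC.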