On Wed, Feb 5, 2020 at 12:27 PM <jane.chu@xxxxxxxxxx> wrote:
>
> Hello,
>
> I haven't seen response to this proposal, unsure if there is a different
> but related discussion ongoing...
>
> I'd like to express my wish: please make it easier for the pmem
> applications when possible.
>
> If kernel does not clear poison when it could legitimately do so,

The only path where the kernel clears poison today is write() syscalls in dax mode; otherwise fallocate(PUNCH_HOLE) is currently the only guaranteed way to trigger error clearing from userspace (outside of sending raw commands to the device).

> applications have to go through lengths to clear poisons.
> For Cloud pmem applications that have upper bound on error recovery
> time, not clearing poison while zeroing-out is quite undesirable.

The complicating factor in all of this is the alignment requirement for clearing and the inability of native cpu instructions to clear errors. On current platforms talking to firmware is required, and that interface may require 256-byte block clearing. This is why the implementation glommed on to clearing errors on block-I/O path writes: we at least knew that all of those I/Os were 512-byte aligned.

This gets better with cpus that support the movdir64b instruction. In that case there is still a 64-byte alignment requirement, but there's no need to talk to the BIOS and therefore no need to talk to a driver.

So we have this awkward dependency on block-device I/O semantics only because it happened to organize I/O in a way that supports error clearing. Right now the kernel does not install a pte on faults that land on a page with known poison, but only because the error clearing path is so convoluted and could only claim that fallocate(PUNCH_HOLE) cleared errors because that was guaranteed to send 512-byte aligned zeros down the block-I/O path when the fs-blocks got reallocated.
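As a rough illustration of why the alignment requirement matters, the following C sketch computes the range that has to be cleared for a given poisoned byte span, assuming a power-of-two clearing granule (256 bytes for the firmware interface, 512 for the block-I/O path, or 64 for movdir64b, per the figures above). The helper `clear_span` is hypothetical and not a kernel interface.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a kernel API): compute the smallest range
 * [*start, *end), aligned to `granule` bytes, that fully covers the
 * poisoned span [off, off + len). */
static void clear_span(uint64_t off, uint64_t len, uint64_t granule,
                       uint64_t *start, uint64_t *end)
{
    /* Rounding by masking only works for power-of-two granules. */
    assert(granule != 0 && (granule & (granule - 1)) == 0);
    *start = off & ~(granule - 1);                     /* round start down */
    *end = (off + len + granule - 1) & ~(granule - 1); /* round end up */
}
```

For example, with a 256-byte granule a 100-byte poisoned span at offset 1000 expands to [768, 1280), so 512 bytes have to be cleared to repair 100.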
In a world where native cpu instructions can clear errors, the dax write() syscall case could be covered (modulo 64-byte alignment), and the kernel could just let the page be mapped so that the application could attempt its own fine-grained clearing without calling back into the kernel.
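The "modulo 64-byte alignment" caveat can be sketched the same way, assuming that a write can only clear poison in 64-byte chunks it completely overwrites: only the whole granules inside the written range are candidates for clearing. `clearable_part` is a hypothetical helper, not an existing API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not an existing API): find the part of a write to
 * [off, off + len) that consists of whole `granule`-byte chunks. Under the
 * assumption that only fully overwritten chunks can have their poison
 * cleared, this is the only part of the write that could clear errors.
 * Returns 1 and fills [*start, *end) if at least one whole chunk is
 * covered, otherwise returns 0 and leaves *start/*end untouched. */
static int clearable_part(uint64_t off, uint64_t len, uint64_t granule,
                          uint64_t *start, uint64_t *end)
{
    /* Rounding by masking only works for power-of-two granules. */
    assert(granule != 0 && (granule & (granule - 1)) == 0);
    uint64_t s = (off + granule - 1) & ~(granule - 1); /* round start up */
    uint64_t e = (off + len) & ~(granule - 1);         /* round end down */
    if (s >= e)
        return 0;
    *start = s;
    *end = e;
    return 1;
}
```

With a 64-byte granule, a 200-byte write at offset 100 fully covers [128, 256), while a 20-byte write at offset 100 covers no whole chunk and could not clear anything on its own.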