Re: [RFC][PATCH] dax: Do not try to clear poison for partial pages

On 2/18/20 11:50 AM, Jeff Moyer wrote:
Dan Williams <dan.j.williams@xxxxxxxxx> writes:

Right now the kernel does not install a pte on faults that land on a
page with known poison, but only because the error clearing path is so
convoluted and could only claim that fallocate(PUNCH_HOLE) cleared
errors because that was guaranteed to send 512-byte aligned zeros
down the block-I/O path when the fs-blocks got reallocated. In a world
where native cpu instructions can clear errors the dax write() syscall
case could be covered (modulo 64-byte alignment), and the kernel could
just let the page be mapped so that the application could attempt its
own fine-grained clearing without calling back into the kernel.

I'm not sure we'd want to allow mapping the PTEs even if there were
support for clearing errors via CPU instructions.  Any load from a
poisoned page will result in an MCE, and there exists the possibility
that you will hit an unrecoverable error (Processor Context Corrupt).
It's just safer to catch these cases by not mapping the page, and
forcing recovery through the driver.

-Jeff


I'm still in the process of trying a number of things before making an
attempt to respond to Dan's response. But I'm too slow, so I'd like
to share some concerns I have here.

If poison in a file is consumed, the signal handler can repair and
recover as follows: punch a hole of at least 4K, pwrite the correct
data into the 'hole', then resume the operation. However, the pmem
block newly allocated by the pwrite to the 'hole' is a different, clean
physical pmem block, while the poisoned block remains unfixed. So we
have a provisioning problem, because
 1. DCPMM is expensive, hence users are likely to provision little
spare capacity;
 2. there is no API between the dax filesystem and the pmem driver for
clearing poison at each legitimate point, such as when the filesystem
tries to allocate a pmem block, or when zeroing out a range.
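
For reference, the punch-and-rewrite sequence above might look roughly
like the sketch below. It is illustrative only: repair_range() is a
hypothetical helper name, the caller is assumed to already hold
known-good data for the range, and off/len are assumed to be aligned to
the filesystem block size (at least 4K). It does not clear poison on
the old block; it only makes the file range readable again.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not an existing API): replace the file range
 * [off, off + len) with known-good data after poison was consumed.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int repair_range(int fd, off_t off, const void *good, size_t len)
{
    const char *p = good;

    /* Release the blocks backing the range. PUNCH_HOLE must be
     * combined with KEEP_SIZE, so the file length is unchanged. */
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  off, (off_t)len) != 0)
        return -1;

    /* Write the good data back into the hole; the filesystem
     * allocates blocks for it, handling short writes and EINTR. */
    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        off += n;
        len -= (size_t)n;
    }

    /* Make the repaired range durable before resuming. */
    return fsync(fd);
}
```

Note that nothing in this sequence touches the original poisoned block,
which is exactly the provisioning concern described above.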

DCPMM is used for its performance and capacity in cloud applications,
which means that the performance-critical code paths include the error
handling and recovery code paths...

With respect to the new cpu instruction, my concern is about the API,
including the error blast radius as reported in the signal payload.
Is there a venue where we could discuss this in more detail?
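
To make the "blast radius" point concrete: with today's SIGBUS payload
for memory errors, the extent is conveyed by si_addr plus si_addr_lsb
(the log2 of the affected size). A rough sketch of how a handler could
derive the affected range follows; poison_range_from() and on_sigbus()
are hypothetical names used only for illustration, and what a new
instruction-based interface would report is an open question, not
something this sketch answers.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <stdint.h>

struct poison_range {
    uintptr_t start;   /* first affected address */
    size_t len;        /* number of affected bytes */
};

/* Hypothetical helper: turn (address, lsb) into an aligned range.
 * lsb is log2 of the corrupted extent, e.g. 12 for a 4K page. */
static struct poison_range poison_range_from(uintptr_t addr, unsigned lsb)
{
    struct poison_range r;

    r.len = (size_t)1 << lsb;
    r.start = addr & ~((uintptr_t)r.len - 1);
    return r;
}

static volatile uintptr_t last_start;
static volatile size_t last_len;

/* Illustrative SA_SIGINFO handler: record the range reported for a
 * machine-check SIGBUS so recovery code can act on it later. */
static void on_sigbus(int sig, siginfo_t *si, void *ctx)
{
    (void)sig;
    (void)ctx;

    if (si->si_code == BUS_MCEERR_AR || si->si_code == BUS_MCEERR_AO) {
        struct poison_range r = poison_range_from(
            (uintptr_t)si->si_addr, (unsigned)si->si_addr_lsb);
        last_start = r.start;
        last_len = r.len;
    }
}
```

With a page-granular report (lsb of 12), an error at any address in a
page invalidates the whole 4K range, even if only 64 bytes are bad.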

Regards,
-jane





