Re: [PATCH 0/6] dax poison recovery with RWF_RECOVERY_DATA flag

Jane Chu <jane.chu@xxxxxxxxxx> · Wed, 3 Nov 2021 18:09:57 +0000

On 11/1/2021 11:18 PM, Christoph Hellwig wrote:
> On Wed, Oct 27, 2021 at 05:24:51PM -0700, Darrick J. Wong wrote:
>> ...so would you happen to know if anyone's working on solving this
>> problem for us by putting the memory controller in charge of dealing
>> with media errors?
> 
> The only one who could know is Intel.
> 
>> The trouble is, we really /do/ want to be able to (re)write the failed
>> area, and we probably want to try to read whatever we can.  Those are
>> reads and writes, not {pre,f}allocation activities.  This is where Dave
>> and I arrived at a month ago.
>>
>> Unless you'd be ok with a second IO path for recovery where we're
>> allowed to be slow?  That would probably have the same user interface
>> flag, just a different path into the pmem driver.
> 
> Which is fine with me.  If you look at the API here, we have the
> RWF_ API, which then maps to the IOMAP API, which maps to the DAX_
> API, which then gets special-cased across three methods.
> 
> And while Pavel pointed out that he and Jens are now optimizing at the
> level of single branches like this, I think that is actually silly, and
> it is not my point.
> 
> The point is that the DAX in-kernel API is a mess, and before we make
> it even worse we need to sort it out first.  What is directly relevant
> here is that the copy_from_iter and copy_to_iter APIs do not make
> sense.  Most of the DAX API is based around getting a memory mapping
> using ->direct_access; it is only the read/write path, which is a slow
> path, that actually uses these methods.  I have a very WIP patch series
> to try to sort this out here:
> 
> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dax-devirtualize
> 
> But back to this series.  The basic DAX model is that the caller gets a
> memory mapping and just works on that, maybe calling a sync after a write
> in a few cases.  So any kind of recovery really needs to be able to
> work with that model, as going forward the copy_to/from_iter path will
> be used less and less, i.e. file systems can and should use
> direct_access directly instead of using the block layer implementation
> in the pmem driver.  As an example, the dm-writecache driver, the pending
> bcache nvdimm support and the (horrible and out-of-tree) NOVA file system
> won't even use this path.  We need to find a way to support recovery
> for them.  And overloading it onto the read/write path, which is not
> the main path for DAX but is the absolute fast path for 99% of
> kernel users, is a horrible idea.
> 
> So how can we work around the horrible nvdimm design for data recovery
> in a way that:
> 
>     a) actually works with the intended direct memory map use case
>     b) doesn't really affect the normal kernel too much
> 
> ?
> 

This is clearer. I've looked at your 'dax-devirtualize' patch, which
removes pmem_copy_to/from_iter, and as you mentioned before,
a separate API for poison clearing is needed. So how about I go ahead
and rebase my earlier patch

https://lore.kernel.org/lkml/20210914233132.3680546-2-jane.chu@xxxxxxxxxx/

on top of 'dax-devirtualize' and provide dm support for clearing poison?
That way, the 99% of pwrite use cases that are non-DAX aren't impacted
at all, and we resolve the urgent pmem poison-clearing issue.
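A rough user-space model of the split being proposed (a sketch only; every name here, including hyp_direct_access(), hyp_clear_poison() and write_page(), is hypothetical and is neither the kernel's dax_operations nor the interface in the linked patch). The fast path only hands out a mapping and carries no recovery logic, while poison clearing is a separate, slow operation the caller reaches only after direct access has failed:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: a tiny "pmem" range with per-page poison bits.
 * None of these names are real kernel or nvdimm APIs. */
#define PAGE_SZ 4096
#define NPAGES 4

struct pmem_dev {
	unsigned char data[NPAGES * PAGE_SZ];
	bool poisoned[NPAGES];
};

/* Fast path, loosely analogous to ->direct_access: return a pointer
 * into the range, or NULL if it is out of bounds or touches poison.
 * It contains no recovery logic at all. */
static void *hyp_direct_access(struct pmem_dev *d, size_t pgoff,
			       size_t nr_pages)
{
	assert(d != NULL);
	if (pgoff + nr_pages > NPAGES)
		return NULL;
	for (size_t i = pgoff; i < pgoff + nr_pages; i++)
		if (d->poisoned[i])
			return NULL;
	return d->data + pgoff * PAGE_SZ;
}

/* Separate, slow recovery op: clear the poison bits and zero the
 * pages so they can be rewritten through the normal mapping. */
static int hyp_clear_poison(struct pmem_dev *d, size_t pgoff,
			    size_t nr_pages)
{
	if (pgoff + nr_pages > NPAGES)
		return -1;
	for (size_t i = pgoff; i < pgoff + nr_pages; i++) {
		memset(d->data + i * PAGE_SZ, 0, PAGE_SZ);
		d->poisoned[i] = false;
	}
	return 0;
}

/* Caller: use the mapping directly; the recovery op is reached only
 * after direct access has already failed. */
static int write_page(struct pmem_dev *d, size_t pgoff, const void *buf,
		      size_t len)
{
	void *p;

	if (len > PAGE_SZ)
		return -1;
	p = hyp_direct_access(d, pgoff, 1);
	if (!p) {
		if (hyp_clear_poison(d, pgoff, 1) != 0)
			return -1;
		p = hyp_direct_access(d, pgoff, 1);
		if (!p)
			return -1;
	}
	memcpy(p, buf, len);
	return 0;
}
```

In this model, rewriting a poisoned page succeeds, clears its poison bit and leaves the new data in place, while accesses to clean pages never call hyp_clear_poison().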

Dan, are you okay with this?  I am getting pressure from our customers
who are basically stuck at the moment.

thanks!
-jane

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/dm-devel