On Tue, Jun 23, 2020 at 1:18 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> Hardware actually tells us the blast radius of the error, but we ignore
> it and take out the entire page.  We've had a customer request to know
> exactly how much of the page is damaged so they can avoid reconstructing
> an entire 2MB page if only a single cacheline is damaged.
>
> This is only a strawman that I did in an hour or two; I'd appreciate
> architectural-level feedback.  Should I just convert memory_failure() to
> always take an address & granularity?  Should I create a struct to pass
> around (page, phys, granularity) instead of reconstructing the missing
> pieces in half a dozen functions?  Is this functionality welcome at all,
> or is the risk of upsetting applications which expect at least a page
> of granularity too high?
>
> I can see places where I've specified a plain PAGE_SHIFT instead of
> interrogating a compound page for its size.  I'd probably split this
> patch up into two or three pieces for applying.
>
> I've also blindly taken out the call to unmap_mapping_range().  Again,
> the customer requested that we not do this.  That deserves to be in its
> own patch and properly justified.

I had been thinking that we could not do much with the legacy
memory-failure reporting model, and that applications that want a new
model would need to opt into it.

This topic also dovetails with what Dave and I had been discussing in
terms of coordinating memory error handling with the filesystem, which
may have more information about multiple mappings of a DAX page
(reflink) [1].

[1]: http://lore.kernel.org/r/20200311063942.GE10776@xxxxxxxxxxxxxxxxxxx
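
For concreteness, here is a minimal sketch of the kind of container the
strawman asks about.  This is hypothetical, not anything from the posted
patch; the struct name and field layout are invented for illustration:

#include <linux/mm_types.h>	/* struct page */
#include <linux/types.h>	/* phys_addr_t */

/*
 * Hypothetical sketch only -- one possible shape for passing
 * (page, phys, granularity) through memory_failure() instead of
 * reconstructing the missing pieces in half a dozen functions.
 */
struct memory_failure_extent {
	struct page *page;	/* (head) page containing the error */
	phys_addr_t phys;	/* physical address reported by hardware */
	unsigned int shift;	/* log2 of the blast radius in bytes,
				 * e.g. 6 for a 64-byte cacheline rather
				 * than a blanket PAGE_SHIFT */
};

On the compound-page point, page_shift() (PAGE_SHIFT +
compound_order()) already exists to interrogate a page for its actual
size, so the plain PAGE_SHIFT call sites could presumably switch to
that when the patch is split up.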