On Tue, Nov 13, 2018 at 6:25 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> On Sat, Nov 10, 2018 at 09:08:10AM -0800, Dan Williams wrote:
> > On Sat, Nov 10, 2018 at 12:29 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
[..]
> > > If we get an internal entry in this case, we know we were looking up
> > > a PMD entry and found a PTE entry.
> >
> > Oh, so I may have my understanding of internal entries backwards? I.e.
> > I thought they were returned if you have an order-0 xas and passed
> > xas_load() an unaligned index, but the entry is multi-order. You're
> > saying they are only returned when we have a multi-order xas and
> > xas_load() finds an order-0 entry at the unaligned index. So
> > "internal" isn't Xarray private state it's an order-0 entry when the
> > user wanted multi-order?
>
> This sounds much more like what I just re-described above. When you say
> an unaligned index, I suspect you mean something like having a PMD entry
> and specifying an index which is not PMD-aligned? That always returns
> the PMD entry, just like the radix tree used to.

...ok, so I think I may have evidence to the contrary or something
else is going wrong in the api. At the very least we're inching
towards root cause. If I modify dax_to_pfn() like so:

diff --git a/fs/dax.c b/fs/dax.c
index 3f592dc18d67..7fd3529fe859 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -88,9 +88,17 @@ fs_initcall(init_dax_wait_table);
 #define DAX_ZERO_PAGE	(1UL << 2)
 #define DAX_EMPTY	(1UL << 3)
 
-static unsigned long dax_to_pfn(void *entry)
+static unsigned long dax_is_pmd_entry(void *entry)
 {
-	return xa_to_value(entry) >> DAX_SHIFT;
+	return xa_to_value(entry) & DAX_PMD;
+}
+
+static noinline unsigned long dax_to_pfn(void *entry)
+{
+	unsigned long val = xa_to_value(entry) >> DAX_SHIFT;
+
+	WARN_ON_ONCE(dax_is_pmd_entry(entry) && val & ((1UL << PMD_ORDER) - 1));
+	return val;
 }

...it triggers. The same change on top of 4.19 does not.
So somehow we are able to look up a pmd entry, but the value of the
entry is only pte aligned. This is the precursor to the original
failure because ext4 tries to invalidate the page with the memory
failure, but the resulting dax_disassociate_entry() starts its
for_each_mapped_pfn() at an unaligned pfn, and likely spills over into
unrelated pages. Later dax_insert_entry() sees ->mapping still set for
the first few pfns relative to the original pmd entry when it should
have been set to NULL.

Do you see anything in the __dax_invalidate_entry() path that would
lead to this, or do I need to peel the onion a bit more?
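
To make the arithmetic concrete, here is a minimal user-space sketch (not kernel code) of the condition the WARN_ON_ONCE() tests and of the spill-over it implies. Everything prefixed SKETCH_ or sketch_ is hypothetical: the shift of 4, the PMD flag in bit 1, and a PMD order of 9 (512 base pages) are assumed values for illustration, and the xa_to_value() decoding step is left out entirely. It also assumes the entry really belongs to the aligned block that contains its start pfn, which is the suspicion above and not an established fact.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values for illustration only; they are not taken from the diff. */
#define SKETCH_DAX_SHIFT  4           /* low bits of an entry hold flags */
#define SKETCH_DAX_PMD    (1UL << 1)  /* flag: entry covers a whole PMD */
#define SKETCH_PMD_ORDER  9           /* 512 base pages per PMD */
#define SKETCH_PMD_MASK   ((1UL << SKETCH_PMD_ORDER) - 1)

/* Hypothetical helper: pack a pfn and flag bits into one entry value. */
static unsigned long sketch_mk_entry(unsigned long pfn, unsigned long flags)
{
	assert((flags >> SKETCH_DAX_SHIFT) == 0); /* flags must fit below the pfn */
	return (pfn << SKETCH_DAX_SHIFT) | flags;
}

static bool sketch_is_pmd(unsigned long entry)
{
	return (entry & SKETCH_DAX_PMD) != 0;
}

static unsigned long sketch_to_pfn(unsigned long entry)
{
	return entry >> SKETCH_DAX_SHIFT;
}

/* Same condition as the WARN_ON_ONCE(): a PMD entry whose pfn is not PMD aligned. */
static bool sketch_pmd_misaligned(unsigned long entry)
{
	return sketch_is_pmd(entry) && (sketch_to_pfn(entry) & SKETCH_PMD_MASK) != 0;
}

/*
 * A PMD-sized walk that starts at the entry's pfn covers pfn .. pfn + 511.
 * If the pfn sits 'off' pages into its aligned block, the walk skips the
 * first 'off' pfns of that block and runs 'off' pfns past its end.
 */
static unsigned long sketch_spill(unsigned long entry)
{
	if (!sketch_is_pmd(entry))
		return 0;
	return sketch_to_pfn(entry) & SKETCH_PMD_MASK;
}
```

Under these assumptions, a PMD entry built with pfn 0x203 is flagged as misaligned and sketch_spill() reports 3. A walk from that pfn would leave the first 3 pfns of the block untouched (->mapping still set) and touch 3 pfns that belong to whatever follows the block, which matches the two symptoms described above.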