On Tue, Nov 06, 2018 at 03:44:47AM +0000, Williams, Dan J wrote: > Hi Willy, > > I'm seeing the following warning with v4.20-rc1 and the "dax.sh" test > from the ndctl repository: I'll try to run this myself later today. > I tried to get this test going on -next before the merge window, but > -next was not bootable for me. Bisection points to: > > 9f32d221301c dax: Convert dax_lock_mapping_entry to XArray > > At first glance I think we need the old "always retry if we slept" > behavior. Otherwise this failure seems similar to the issue fixed by > Ross' change to always retry on any potential collision: > > b1f382178d15 ext4: close race between direct IO and ext4_break_layouts() > > I'll take a closer look tomorrow to see if that guess is plausible. If your thought is correct, then this should be all that's needed: diff --git a/fs/dax.c b/fs/dax.c index 616e36ea6aaa..529ac9d7c10a 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -383,11 +383,8 @@ bool dax_lock_mapping_entry(struct page *page) entry = xas_load(&xas); if (dax_is_locked(entry)) { entry = get_unlocked_entry(&xas); - /* Did the page move while we slept? */ - if (dax_to_pfn(entry) != page_to_pfn(page)) { - xas_unlock_irq(&xas); - continue; - } + xas_unlock_irq(&xas); + continue; } dax_lock_entry(&xas, entry); xas_unlock_irq(&xas); I don't quite understand how we'd find a PFN for this page in the tree after the page has had page->mapping removed. However, the more I look at this path, the more I don't like it -- it doesn't handle returning NULL explicitly, nor does it handle the situation where a PMD is split to form multiple PTEs explicitly, it just kind of relies on those bit patterns not matching. So I kind of like the "just retry without doing anything clever" situation that the above patch takes us to.