On Tue, 2018-11-06 at 06:48 -0800, Matthew Wilcox wrote: +AD4- On Tue, Nov 06, 2018 at 03:44:47AM +-0000, Williams, Dan J wrote: +AD4- +AD4- Hi Willy, +AD4- +AD4- +AD4- +AD4- I'm seeing the following warning with v4.20-rc1 and the +ACI-dax.sh+ACI- +AD4- +AD4- test +AD4- +AD4- from the ndctl repository: +AD4- +AD4- I'll try to run this myself later today. +AD4- +AD4- +AD4- I tried to get this test going on -next before the merge window, +AD4- +AD4- but +AD4- +AD4- -next was not bootable for me. Bisection points to: +AD4- +AD4- +AD4- +AD4- 9f32d221301c dax: Convert dax+AF8-lock+AF8-mapping+AF8-entry to XArray +AD4- +AD4- +AD4- +AD4- At first glance I think we need the old +ACI-always retry if we slept+ACI- +AD4- +AD4- behavior. Otherwise this failure seems similar to the issue fixed +AD4- +AD4- by +AD4- +AD4- Ross' change to always retry on any potential collision: +AD4- +AD4- +AD4- +AD4- b1f382178d15 ext4: close race between direct IO and +AD4- +AD4- ext4+AF8-break+AF8-layouts() +AD4- +AD4- +AD4- +AD4- I'll take a closer look tomorrow to see if that guess is plausible. +AD4- +AD4- If your thought is correct, then this should be all that's needed: +AD4- +AD4- diff --git a/fs/dax.c b/fs/dax.c +AD4- index 616e36ea6aaa..529ac9d7c10a 100644 +AD4- --- a/fs/dax.c +AD4- +-+-+- b/fs/dax.c +AD4- +AEAAQA- -383,11 +-383,8 +AEAAQA- bool dax+AF8-lock+AF8-mapping+AF8-entry(struct page +ACo-page) +AD4- entry +AD0- xas+AF8-load(+ACY-xas)+ADs- +AD4- if (dax+AF8-is+AF8-locked(entry)) +AHs- +AD4- entry +AD0- get+AF8-unlocked+AF8-entry(+ACY-xas)+ADs- +AD4- - /+ACo- Did the page move while we slept? +ACo-/ +AD4- - if (dax+AF8-to+AF8-pfn(entry) +ACEAPQ- page+AF8-to+AF8-pfn(page)) +AHs- +AD4- - xas+AF8-unlock+AF8-irq(+ACY-xas)+ADs- +AD4- - continue+ADs- +AD4- - +AH0- +AD4- +- xas+AF8-unlock+AF8-irq(+ACY-xas)+ADs- +AD4- +- continue+ADs- +AD4- +AH0- +AD4- dax+AF8-lock+AF8-entry(+ACY-xas, entry)+ADs- +AD4- xas+AF8-unlock+AF8-irq(+ACY-xas)+ADs- No, that doesn't work. +AD4- +AD4- I don't quite understand how we'd find a PFN for this page in the +AD4- tree +AD4- after the page has had page-+AD4-mapping removed. However, the more I +AD4- look +AD4- at this path, the more I don't like it -- it doesn't handle returning +AD4- NULL explicitly, nor does it handle the situation where a PMD is +AD4- split +AD4- to form multiple PTEs explicitly, it just kind of relies on those bit +AD4- patterns not matching. +AD4- +AD4- So I kind of like the +ACI-just retry without doing anything clever+ACI- +AD4- situation +AD4- that the above patch takes us to. I've been hacking at this today and am starting to lean towards +ACI-revert+ACI- over +ACI-fix+ACI- for the amount of changes needed to get this back on its feet. I've been able to get the test passing again with the below changes directly on top of commit 9f32d221301c +ACI-dax: Convert dax+AF8-lock+AF8-mapping+AF8-entry to XArray+ACI-. That said, I have thus far been unable to rebase this patch on top of v4.20-rc1 and yield a functional result. My concerns are: - I can't determine if dax+AF8-unlock+AF8-entry() wants an unlocked entry parameter, or locked. The dax+AF8-insert+AF8-pfn+AF8-mkwrite() and dax+AF8-unlock+AF8-mapping+AF8-entry() usages seem to disagree. - The multi-order use case of Xarray is a mystery to me. It seems to want to know the order of entries a-priori with a choice to use XA+AF8-STATE+AF8-ORDER() vs XA+AF8-STATE(). This falls over in dax+AF8-unlock+AF8-mapping+AF8-entry() and other places where the only source of the order of the entry is determined from dax+AF8-is+AF8-pmd+AF8-entry() i.e. the Xarray itself. PageHead() does not work for DAX pages because PageHead() is only established by the page allocator and DAX pages never participate in the page allocator. - The usage of rcu+AF8-read+AF8-lock() in dax+AF8-lock+AF8-mapping+AF8-entry() is needed for inode lifetime synchronization, not just for walking the radix. That lock needs to be dropped before sleeping, and if we slept the inode may no longer exist. - I could not see how the pattern: entry +AD0- xas+AF8-load(+ACY-xas)+ADs- if (dax+AF8-is+AF8-locked(entry)) +AHs- entry +AD0- get+AF8-unlocked+AF8-entry(+ACY-xas)+ADs- ...was safe given that get+AF8-unlock+AF8-entry() turns around and does validation that the entry is +ACE-xa+AF8-internal+AF8-entry() and +ACE-NULL. - The usage of internal entries in grab+AF8-mapping+AF8-entry() seems to need auditing. Previously we would compare the entry size against +AEA-size+AF8-flag, but it now if index hits a multi-order entry in get+AF8-unlocked+AF8-entry() afaics it could be internal and we need to convert it to the actual entry before aborting... at least to match the v4.19 behavior. This all seems to merit a rethink of the dax integration of Xarray. diff --git a/fs/dax.c b/fs/dax.c index fc2745ca3308..2b3bd4a4cc48 100644 --- a/fs/dax.c +-+-+- b/fs/dax.c +AEAAQA- -99,17 +-99,6 +AEAAQA- static void +ACo-dax+AF8-make+AF8-locked(unsigned long pfn, unsigned long flags) DAX+AF8-LOCKED)+ADs- +AH0- -static void +ACo-dax+AF8-make+AF8-entry(pfn+AF8-t pfn, unsigned long flags) -+AHs- - return xa+AF8-mk+AF8-value(flags +AHw- (pfn+AF8-t+AF8-to+AF8-pfn(pfn) +ADwAPA- DAX+AF8-SHIFT))+ADs- -+AH0- - -static void +ACo-dax+AF8-make+AF8-page+AF8-entry(struct page +ACo-page) -+AHs- - pfn+AF8-t pfn +AD0- page+AF8-to+AF8-pfn+AF8-t(page)+ADs- - return dax+AF8-make+AF8-entry(pfn, PageHead(page) ? DAX+AF8-PMD : 0)+ADs- -+AH0- - static bool dax+AF8-is+AF8-locked(void +ACo-entry) +AHs- return xa+AF8-to+AF8-value(entry) +ACY- DAX+AF8-LOCKED+ADs- +AEAAQA- -225,7 +-214,7 +AEAAQA- static void dax+AF8-wake+AF8-entry(struct xa+AF8-state +ACo-xas, void +ACo-entry, bool wake+AF8-all) +ACo- +ACo- Must be called with the i+AF8-pages lock held. +ACo-/ -static void +ACo-get+AF8-unlocked+AF8-entry(struct xa+AF8-state +ACo-xas) +-static void +ACoAXwBf-get+AF8-unlocked+AF8-entry(struct xa+AF8-state +ACo-xas, bool (+ACo-wait+AF8-fn)(void)) +AHs- void +ACo-entry+ADs- struct wait+AF8-exceptional+AF8-entry+AF8-queue ewait+ADs- +AEAAQA- -235,6 +-224,8 +AEAAQA- static void +ACo-get+AF8-unlocked+AF8-entry(struct xa+AF8-state +ACo-xas) ewait.wait.func +AD0- wake+AF8-exceptional+AF8-entry+AF8-func+ADs- for (+ADsAOw-) +AHs- +- bool revalidate+ADs- +- entry +AD0- xas+AF8-load(xas)+ADs- if (+ACE-entry +AHwAfA- xa+AF8-is+AF8-internal(entry) +AHwAfA- WARN+AF8-ON+AF8-ONCE(+ACE-xa+AF8-is+AF8-value(entry)) +AHwAfA- +AEAAQA- -247,12 +-238,29 +AEAAQA- static void +ACo-get+AF8-unlocked+AF8-entry(struct xa+AF8-state +ACo-xas) TASK+AF8-UNINTERRUPTIBLE)+ADs- xas+AF8-unlock+AF8-irq(xas)+ADs- xas+AF8-reset(xas)+ADs- - schedule()+ADs- +- revalidate +AD0- wait+AF8-fn()+ADs- finish+AF8-wait(wq, +ACY-ewait.wait)+ADs- xas+AF8-lock+AF8-irq(xas)+ADs- +- if (revalidate) +- return ERR+AF8-PTR(-EAGAIN)+ADs- +AH0- +AH0- +-static bool entry+AF8-wait(void) +-+AHs- +- schedule()+ADs- +- /+ACo- +- +ACo- Never return an ERR+AF8-PTR() from +AF8AXw-get+AF8-unlocked+AF8-entry(), just +- +ACo- keep looping. +- +ACo-/ +- return false+ADs- +-+AH0- +- +-static void +ACo-get+AF8-unlocked+AF8-entry(struct xa+AF8-state +ACo-xas) +-+AHs- +- return +AF8AXw-get+AF8-unlocked+AF8-entry(xas, entry+AF8-wait)+ADs- +-+AH0- +- static void put+AF8-unlocked+AF8-entry(struct xa+AF8-state +ACo-xas, void +ACo-entry) +AHs- /+ACo- If we were the only waiter woken, wake the next one +ACo-/ +AEAAQA- -366,16 +-374,6 +AEAAQA- static void +ACoAXwBf-get+AF8-unlocked+AF8-mapping+AF8-entry(struct address+AF8-space +ACo-mapping, +AH0- +AH0- -static bool entry+AF8-wait(void) -+AHs- - schedule()+ADs- - /+ACo- - +ACo- Never return an ERR+AF8-PTR() from - +ACo- +AF8AXw-get+AF8-unlocked+AF8-mapping+AF8-entry(), just keep looping. - +ACo-/ - return false+ADs- -+AH0- - static void +ACo-get+AF8-unlocked+AF8-mapping+AF8-entry(struct address+AF8-space +ACo-mapping, pgoff+AF8-t index, void +ACoAKgAq-slotp) +AHs- +AEAAQA- -498,16 +-496,33 +AEAAQA- static struct page +ACo-dax+AF8-busy+AF8-page(void +ACo-entry) return NULL+ADs- +AH0- +- +-static bool entry+AF8-wait+AF8-revalidate(void) +-+AHs- +- rcu+AF8-read+AF8-unlock()+ADs- +- schedule()+ADs- +- rcu+AF8-read+AF8-lock()+ADs- +- +- /+ACo- +- +ACo- Tell +AF8AXw-get+AF8-unlocked+AF8-entry() to take a break, we need to +- +ACo- revalidate page-+AD4-mapping after dropping locks +- +ACo-/ +- return true+ADs- +-+AH0- +- bool dax+AF8-lock+AF8-mapping+AF8-entry(struct page +ACo-page) +AHs- XA+AF8-STATE(xas, NULL, 0)+ADs- +- bool did+AF8-lock +AD0- false+ADs- void +ACo-entry+ADs- +- /+ACo- hold rcu lock to coordinate with inode end-of-life +ACo-/ +- rcu+AF8-read+AF8-lock()+ADs- for (+ADsAOw-) +AHs- struct address+AF8-space +ACo-mapping +AD0- READ+AF8-ONCE(page-+AD4-mapping)+ADs- if (+ACE-dax+AF8-mapping(mapping)) - return false+ADs- +- break+ADs- /+ACo- +ACo- In the device-dax case there's no need to lock, a +AEAAQA- -516,9 +-531,12 +AEAAQA- bool dax+AF8-lock+AF8-mapping+AF8-entry(struct page +ACo-page) +ACo- otherwise we would not have a valid pfn+AF8-to+AF8-page() +ACo- translation. +ACo-/ - if (S+AF8-ISCHR(mapping-+AD4-host-+AD4-i+AF8-mode)) - return true+ADs- +- if (S+AF8-ISCHR(mapping-+AD4-host-+AD4-i+AF8-mode)) +AHs- +- did+AF8-lock +AD0- true+ADs- +- break+ADs- +- +AH0- +- xas+AF8-reset(+ACY-xas)+ADs- xas.xa +AD0- +ACY-mapping-+AD4-i+AF8-pages+ADs- xas+AF8-lock+AF8-irq(+ACY-xas)+ADs- if (mapping +ACEAPQ- page-+AD4-mapping) +AHs- +AEAAQA- -526,30 +-544,33 +AEAAQA- bool dax+AF8-lock+AF8-mapping+AF8-entry(struct page +ACo-page) continue+ADs- +AH0- xas+AF8-set(+ACY-xas, page-+AD4-index)+ADs- - entry +AD0- xas+AF8-load(+ACY-xas)+ADs- - if (dax+AF8-is+AF8-locked(entry)) +AHs- - entry +AD0- get+AF8-unlocked+AF8-entry(+ACY-xas)+ADs- - /+ACo- Did the page move while we slept? +ACo-/ - if (dax+AF8-to+AF8-pfn(entry) +ACEAPQ- page+AF8-to+AF8-pfn(page)) +AHs- - xas+AF8-unlock+AF8-irq(+ACY-xas)+ADs- - continue+ADs- - +AH0- +- entry +AD0- +AF8AXw-get+AF8-unlocked+AF8-entry(+ACY-xas, entry+AF8-wait+AF8-revalidate)+ADs- +- if (+ACE-entry) +AHs- +- xas+AF8-unlock+AF8-irq(+ACY-xas)+ADs- +- break+ADs- +- +AH0- else if (IS+AF8-ERR(entry)) +AHs- +- xas+AF8-unlock+AF8-irq(+ACY-xas)+ADs- +- WARN+AF8-ON+AF8-ONCE(PTR+AF8-ERR(entry) +ACEAPQ- -EAGAIN)+ADs- +- continue+ADs- +AH0- dax+AF8-lock+AF8-entry(+ACY-xas, entry)+ADs- +- did+AF8-lock +AD0- true+ADs- xas+AF8-unlock+AF8-irq(+ACY-xas)+ADs- - return true+ADs- +- break+ADs- +AH0- +- rcu+AF8-read+AF8-unlock()+ADs- +- +- return did+AF8-lock+ADs- +AH0- void dax+AF8-unlock+AF8-mapping+AF8-entry(struct page +ACo-page) +AHs- struct address+AF8-space +ACo-mapping +AD0- page-+AD4-mapping+ADs- - XA+AF8-STATE(xas, +ACY-mapping-+AD4-i+AF8-pages, page-+AD4-index)+ADs- if (S+AF8-ISCHR(mapping-+AD4-host-+AD4-i+AF8-mode)) return+ADs- - dax+AF8-unlock+AF8-entry(+ACY-xas, dax+AF8-make+AF8-page+AF8-entry(page))+ADs- +- unlock+AF8-mapping+AF8-entry(mapping, page-+AD4-index)+ADs- +AH0- /+ACo-