On Sat, May 6, 2023 at 9:35 AM Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > > And yes, the simplest fix for the "wrong test" would be to just add a > new "out_nofolio" error case after "out_retry", and use that. > > However, even that seems wrong, because the return value for that path > is the wrong one. Actually, my suggested patch is _also_ wrong. The problem is that we do need to return VM_FAULT_RETRY to let the caller know that we released the mmap_lock. And once we return VM_FAULT_RETRY, the other error bits don't even matter. So while I think the *right* thing to do is to return VM_FAULT_OOM | VM_FAULT_RETRY, that doesn't actually end up working, because if VM_FAULT_RETRY is set, the caller will know that "yes, mmap_lock was dropped", but the callers will also just ignore the other bits and unconditionally retry. How very very annoying. This was introduced several years ago by commit 6b4c9f446981 ("filemap: drop the mmap_sem for all blocking operations"). Looking at that, we have at least one other similar error case wrong too: the "page_not_uptodate" case carefully checks for IO errors and retries only if there was no error (or for the AOP_TRUNCATED_PAGE) case. For an actual IO error on page reading, it returns VM_FAULT_SIGBUS. Except - again - for that "if (fpin) goto out_retry" case, which will just return VM_FAULT_RETRY and retry the fault. I do not believe that retrying the fault is the right thing to do when we ran out of memory, or when we had an IO error, and I do not think it was intentional that the error handling was changed. But I think this is all just a mistake from how that VM_FAULT_RETRY works in the callers. How very very annoying. So scratch that patch suggestion of mine, but let's bring in some people involved with the original fpin code, and see if we can find some solution that honors that error case too. Linus