Re: nfs_page_async_flush returning 0 for fatal errors on writeback

Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> · Wed, 7 Jul 2021 23:50:23 +0000

On Thu, 2021-07-08 at 00:13 +0100, Calum Mackay wrote:
> On 07/07/2021 11:01 pm, Trond Myklebust wrote:
> > On Wed, 2021-07-07 at 19:51 +0100, Calum Mackay wrote:
> > > hi Trond,
> > > 
> > > I had a question about these two old commits of yours, from v5.0
> > > &
> > > v5.2:
> > > 
> > > 14bebe3c90b3 NFS: Don't interrupt file writeout due to fatal
> > > errors
> > > (2
> > > years, 2 months ago)
> > > 
> > > 8fc75bed96bb NFS: Fix up return value on fatal errors in
> > > nfs_page_async_flush() (2 years, 5 months ago)
> > > 
> > > 
> > > I am looking at a crash dump, with a kernel based on an older-
> > > still
> > > v4.14 stable which did not have either of the above commits.
> > > 
> > >          PANIC: "BUG: unable to handle kernel NULL pointer
> > > dereference
> > > at
> > > 0000000000000080"
> > > 
> > >       [exception RIP: _raw_spin_lock+20]
> > > 
> > > #10 [ffffb1493d78fcb8] nfs_updatepage at ffffffffc08f1791 [nfs]
> > > #11 [ffffb1493d78fd10] nfs_write_end at ffffffffc08e094e [nfs]
> > > #12 [ffffb1493d78fd58] generic_perform_write at ffffffffa71d458b
> > > #13 [ffffb1493d78fde0] nfs_file_write at ffffffffc08dfdb4 [nfs]
> > > #14 [ffffb1493d78fe18] __vfs_write at ffffffffa72848bc
> > > #15 [ffffb1493d78fea0] vfs_write at ffffffffa7284ad2
> > > #16 [ffffb1493d78fee0] sys_write at ffffffffa7284d35
> > > #17 [ffffb1493d78ff28] do_syscall_64 at ffffffffa7003949
> > > 
> > > the real sequence, obscured by compiler inlining, is:
> > > 
> > >      nfs_updatepage
> > >         nfs_writepage_setup
> > >            nfs_setup_write_request
> > >               nfs_inode_add_request
> > >                  spin_lock(&mapping->private_lock);
> > > 
> > > and we crash since the as mapping pointer is NULL.
> > > 
> > > 
> > > I thought I was able to construct a possible sequence that would
> > > explain
> > > the above, if we are in (from above):
> > > 
> > >      nfs_setup_write_request
> > >       nfs_try_to_update_request
> > >        nfs_wb_page
> > >         nfs_writepage_locked
> > >          nfs_do_writepage
> > > 
> > > and nfs_page_async_flush detects a fatal server error, and calls
> > > nfs_write_error_remove_page, which results in the page->mapping
> > > set
> > > to NULL.
> > > 
> > > In that version of the code, without your commits above,
> > > nfs_page_async_flush returns 0 in this case, which I thought
> > > might
> > > result in nfs_setup_write_request going ahead and calling
> > > nfs_inode_add_request with that page, resulting in the crash
> > > seen.
> > > 
> > > 
> > > I then discovered your v5.0 commit:
> > > 
> > > 8fc75bed96bb NFS: Fix up return value on fatal errors in
> > > nfs_page_async_flush() (2 years, 5 months ago)
> > > 
> > > which appeared to correct that, having nfs_page_async_flush
> > > return
> > > the
> > > error in this case, so we would not end up in
> > > nfs_inode_add_request.
> > > 
> > > 
> > > But I then spotted your later v5.2 commit:
> > > 
> > > 14bebe3c90b3 NFS: Don't interrupt file writeout due to fatal
> > > errors
> > > (2
> > > years, 2 months ago)
> > > 
> > > which changes things back, so that nfs_page_async_flush now again
> > > returns 0, in the "launder" case, and that's how that code
> > > remains
> > > today.
> > > 
> > > 
> > > If so, is there anything to stop the possible crash path that I
> > > describe
> > > above?
> > > 
> > > 
> > > path I suggest above? Or perhaps I'm missing another commit that
> > > stops
> > > it happening, even after your second commit above?
> > > 
> > 
> > In order for page->mapping to get set to NULL, we'd have to be
> > removing
> > the page from the page cache altogether. I'm not seeing where we'd
> > be
> > doing that here. It certainly isn't possible for some third party
> > to do
> > so, since our thread is holding the page lock and I'm not seeing
> > where
> > the call to nfs_write_error() might be doing so.
> > 
> > We do call nfs_inode_remove_request(), which removes the struct
> > nfs_page that is tracking the page dirtiness, however it shouldn't
> > ever
> > result in the removal of the pagecache page itself.
> > 
> > Am I misreading your email?
> 
> thanks very much Trond; much more likely I am misreading the code :)
> 
> 
> My theory was that we have nfs_page_async_flush detecting 
> nfs_error_is_fatal_on_server, so calling nfs_write_error_remove_page 
> (this is an older v4.14.72-ish kernel).
> 
> That would then generic_error_remove_page -> truncate_inode_page -> 
> truncate_complete_page -> delete_from_page_cache
> 
> thus, as you say, removing the page from the page cache, with 
> __delete_from_page_cache clearing page->mapping.
> 
> 
> Without your v5.0 commit, nfs_page_async_flush will then return 0,
> via 
> nfs_do_writepage, nfs_writepage_locked, nfs_wb_page to 
> nfs_try_to_update_request, which then returns NULL to 
> nfs_setup_write_request.
> 
> nfs_inode_add_request, which itself then dereferences the mapping:
> 
>         spin_lock(&mapping->private_lock);
> 
> which is where we crash.
> 
> 
> Obviously, there are a number of assumptions in the above, so I
> thought 
> it must just be a possible path the code could take?
> 
> Does that sound plausible (given that v4.14.72-ish code)?
> 
> 
> 
> However, I note that in a subsequent v5.2 commit:
> 
> 22876f540bdf NFS: Don't call generic_error_remove_page() while
> holding locks
> 
> you remove the call to generic_error_remove_page from 
> nfs_write_error_remove_page(), and that is itself then renamed 
> nfs_write_error(), as part of a later v5.2 commit:
> 
> 6fbda89b257f NFS: Replace custom error reporting mechanism with
> generic one
> 
> 
> Without those commits, and also without your adjustments to 
> nfs_page_async_flush I mentioned earlier, is it possible that the
> code 
> path I present above, where the page /is/ removed from the page
> cache, 
> could result in the crash we saw?
> 
> 

OK, yes that is plausible. The call to generic_error_remove_page()
would, as you said, remove the page from the page cache, and thus could
result in the crash you describe.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx