On Thu, 2021-07-08 at 00:13 +0100, Calum Mackay wrote: > On 07/07/2021 11:01 pm, Trond Myklebust wrote: > > On Wed, 2021-07-07 at 19:51 +0100, Calum Mackay wrote: > > > hi Trond, > > > > > > I had a question about these two old commits of yours, from v5.0 > > > & > > > v5.2: > > > > > > 14bebe3c90b3 NFS: Don't interrupt file writeout due to fatal > > > errors > > > (2 > > > years, 2 months ago) > > > > > > 8fc75bed96bb NFS: Fix up return value on fatal errors in > > > nfs_page_async_flush() (2 years, 5 months ago) > > > > > > > > > I am looking at a crash dump, with a kernel based on an older- > > > still > > > v4.14 stable which did not have either of the above commits. > > > > > > PANIC: "BUG: unable to handle kernel NULL pointer > > > dereference > > > at > > > 0000000000000080" > > > > > > [exception RIP: _raw_spin_lock+20] > > > > > > #10 [ffffb1493d78fcb8] nfs_updatepage at ffffffffc08f1791 [nfs] > > > #11 [ffffb1493d78fd10] nfs_write_end at ffffffffc08e094e [nfs] > > > #12 [ffffb1493d78fd58] generic_perform_write at ffffffffa71d458b > > > #13 [ffffb1493d78fde0] nfs_file_write at ffffffffc08dfdb4 [nfs] > > > #14 [ffffb1493d78fe18] __vfs_write at ffffffffa72848bc > > > #15 [ffffb1493d78fea0] vfs_write at ffffffffa7284ad2 > > > #16 [ffffb1493d78fee0] sys_write at ffffffffa7284d35 > > > #17 [ffffb1493d78ff28] do_syscall_64 at ffffffffa7003949 > > > > > > the real sequence, obscured by compiler inlining, is: > > > > > > nfs_updatepage > > > nfs_writepage_setup > > > nfs_setup_write_request > > > nfs_inode_add_request > > > spin_lock(&mapping->private_lock); > > > > > > and we crash since the as mapping pointer is NULL. > > > > > > > > > I thought I was able to construct a possible sequence that would > > > explain > > > the above, if we are in (from above): > > > > > > nfs_setup_write_request > > > nfs_try_to_update_request > > > nfs_wb_page > > > nfs_writepage_locked > > > nfs_do_writepage > > > > > > and nfs_page_async_flush detects a fatal server error, and calls > > > nfs_write_error_remove_page, which results in the page->mapping > > > set > > > to NULL. > > > > > > In that version of the code, without your commits above, > > > nfs_page_async_flush returns 0 in this case, which I thought > > > might > > > result in nfs_setup_write_request going ahead and calling > > > nfs_inode_add_request with that page, resulting in the crash > > > seen. > > > > > > > > > I then discovered your v5.0 commit: > > > > > > 8fc75bed96bb NFS: Fix up return value on fatal errors in > > > nfs_page_async_flush() (2 years, 5 months ago) > > > > > > which appeared to correct that, having nfs_page_async_flush > > > return > > > the > > > error in this case, so we would not end up in > > > nfs_inode_add_request. > > > > > > > > > But I then spotted your later v5.2 commit: > > > > > > 14bebe3c90b3 NFS: Don't interrupt file writeout due to fatal > > > errors > > > (2 > > > years, 2 months ago) > > > > > > which changes things back, so that nfs_page_async_flush now again > > > returns 0, in the "launder" case, and that's how that code > > > remains > > > today. > > > > > > > > > If so, is there anything to stop the possible crash path that I > > > describe > > > above? > > > > > > > > > path I suggest above? Or perhaps I'm missing another commit that > > > stops > > > it happening, even after your second commit above? > > > > > > > In order for page->mapping to get set to NULL, we'd have to be > > removing > > the page from the page cache altogether. I'm not seeing where we'd > > be > > doing that here. It certainly isn't possible for some third party > > to do > > so, since our thread is holding the page lock and I'm not seeing > > where > > the call to nfs_write_error() might be doing so. > > > > We do call nfs_inode_remove_request(), which removes the struct > > nfs_page that is tracking the page dirtiness, however it shouldn't > > ever > > result in the removal of the pagecache page itself. > > > > Am I misreading your email? > > thanks very much Trond; much more likely I am misreading the code :) > > > My theory was that we have nfs_page_async_flush detecting > nfs_error_is_fatal_on_server, so calling nfs_write_error_remove_page > (this is an older v4.14.72-ish kernel). > > That would then generic_error_remove_page -> truncate_inode_page -> > truncate_complete_page -> delete_from_page_cache > > thus, as you say, removing the page from the page cache, with > __delete_from_page_cache clearing page->mapping. > > > Without your v5.0 commit, nfs_page_async_flush will then return 0, > via > nfs_do_writepage, nfs_writepage_locked, nfs_wb_page to > nfs_try_to_update_request, which then returns NULL to > nfs_setup_write_request. > > nfs_inode_add_request, which itself then dereferences the mapping: > > spin_lock(&mapping->private_lock); > > which is where we crash. > > > Obviously, there are a number of assumptions in the above, so I > thought > it must just be a possible path the code could take? > > Does that sound plausible (given that v4.14.72-ish code)? > > > > However, I note that in a subsequent v5.2 commit: > > 22876f540bdf NFS: Don't call generic_error_remove_page() while > holding locks > > you remove the call to generic_error_remove_page from > nfs_write_error_remove_page(), and that is itself then renamed > nfs_write_error(), as part of a later v5.2 commit: > > 6fbda89b257f NFS: Replace custom error reporting mechanism with > generic one > > > Without those commits, and also without your adjustments to > nfs_page_async_flush I mentioned earlier, is it possible that the > code > path I present above, where the page /is/ removed from the page > cache, > could result in the crash we saw? > > OK, yes that is plausible. The call to generic_error_remove_page() would, as you said, remove the page from the page cache, and thus could result in the crash you describe. -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@xxxxxxxxxxxxxxx