hi Trond, I had a question about these two old commits of yours, from v5.0 & v5.2:14bebe3c90b3 NFS: Don't interrupt file writeout due to fatal errors (2 years, 2 months ago)
8fc75bed96bb NFS: Fix up return value on fatal errors in nfs_page_async_flush() (2 years, 5 months ago)
I am looking at a crash dump, with a kernel based on an older-still v4.14 stable which did not have either of the above commits.
PANIC: "BUG: unable to handle kernel NULL pointer dereference at 0000000000000080" [exception RIP: _raw_spin_lock+20] #10 [ffffb1493d78fcb8] nfs_updatepage at ffffffffc08f1791 [nfs] #11 [ffffb1493d78fd10] nfs_write_end at ffffffffc08e094e [nfs] #12 [ffffb1493d78fd58] generic_perform_write at ffffffffa71d458b #13 [ffffb1493d78fde0] nfs_file_write at ffffffffc08dfdb4 [nfs] #14 [ffffb1493d78fe18] __vfs_write at ffffffffa72848bc #15 [ffffb1493d78fea0] vfs_write at ffffffffa7284ad2 #16 [ffffb1493d78fee0] sys_write at ffffffffa7284d35 #17 [ffffb1493d78ff28] do_syscall_64 at ffffffffa7003949 the real sequence, obscured by compiler inlining, is: nfs_updatepage nfs_writepage_setup nfs_setup_write_request nfs_inode_add_request spin_lock(&mapping->private_lock); and we crash since the as mapping pointer is NULL.I thought I was able to construct a possible sequence that would explain the above, if we are in (from above):
nfs_setup_write_request nfs_try_to_update_request nfs_wb_page nfs_writepage_locked nfs_do_writepageand nfs_page_async_flush detects a fatal server error, and calls nfs_write_error_remove_page, which results in the page->mapping set to NULL.
In that version of the code, without your commits above, nfs_page_async_flush returns 0 in this case, which I thought might result in nfs_setup_write_request going ahead and calling nfs_inode_add_request with that page, resulting in the crash seen.
I then discovered your v5.0 commit:8fc75bed96bb NFS: Fix up return value on fatal errors in nfs_page_async_flush() (2 years, 5 months ago)
which appeared to correct that, having nfs_page_async_flush return the error in this case, so we would not end up in nfs_inode_add_request.
But I then spotted your later v5.2 commit:14bebe3c90b3 NFS: Don't interrupt file writeout due to fatal errors (2 years, 2 months ago)
which changes things back, so that nfs_page_async_flush now again returns 0, in the "launder" case, and that's how that code remains today.
If so, is there anything to stop the possible crash path that I describe above?
Is my theory just wrong? Is there another mechanism that prevents the path I suggest above? Or perhaps I'm missing another commit that stops it happening, even after your second commit above?
thanks very much, cheers, calum.
Attachment:
OpenPGP_signature
Description: OpenPGP digital signature