nfs_page_async_flush returning 0 for fatal errors on writeback

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



hi Trond,

I had a question about these two old commits of yours, from v5.0 & v5.2:

14bebe3c90b3 NFS: Don't interrupt file writeout due to fatal errors (2 years, 2 months ago)

8fc75bed96bb NFS: Fix up return value on fatal errors in nfs_page_async_flush() (2 years, 5 months ago)


I am looking at a crash dump, with a kernel based on an older-still v4.14 stable which did not have either of the above commits.

       PANIC: "BUG: unable to handle kernel NULL pointer dereference at
0000000000000080"

    [exception RIP: _raw_spin_lock+20]

#10 [ffffb1493d78fcb8] nfs_updatepage at ffffffffc08f1791 [nfs]
#11 [ffffb1493d78fd10] nfs_write_end at ffffffffc08e094e [nfs]
#12 [ffffb1493d78fd58] generic_perform_write at ffffffffa71d458b
#13 [ffffb1493d78fde0] nfs_file_write at ffffffffc08dfdb4 [nfs]
#14 [ffffb1493d78fe18] __vfs_write at ffffffffa72848bc
#15 [ffffb1493d78fea0] vfs_write at ffffffffa7284ad2
#16 [ffffb1493d78fee0] sys_write at ffffffffa7284d35
#17 [ffffb1493d78ff28] do_syscall_64 at ffffffffa7003949

the real sequence, obscured by compiler inlining, is:

   nfs_updatepage
      nfs_writepage_setup
         nfs_setup_write_request
            nfs_inode_add_request
               spin_lock(&mapping->private_lock);

and we crash since the as mapping pointer is NULL.


I thought I was able to construct a possible sequence that would explain the above, if we are in (from above):

   nfs_setup_write_request
    nfs_try_to_update_request
     nfs_wb_page
      nfs_writepage_locked
       nfs_do_writepage

and nfs_page_async_flush detects a fatal server error, and calls nfs_write_error_remove_page, which results in the page->mapping set to NULL.

In that version of the code, without your commits above, nfs_page_async_flush returns 0 in this case, which I thought might result in nfs_setup_write_request going ahead and calling nfs_inode_add_request with that page, resulting in the crash seen.


I then discovered your v5.0 commit:

8fc75bed96bb NFS: Fix up return value on fatal errors in nfs_page_async_flush() (2 years, 5 months ago)

which appeared to correct that, having nfs_page_async_flush return the error in this case, so we would not end up in nfs_inode_add_request.


But I then spotted your later v5.2 commit:

14bebe3c90b3 NFS: Don't interrupt file writeout due to fatal errors (2 years, 2 months ago)

which changes things back, so that nfs_page_async_flush now again returns 0, in the "launder" case, and that's how that code remains today.


If so, is there anything to stop the possible crash path that I describe above?


Is my theory just wrong? Is there another mechanism that prevents the path I suggest above? Or perhaps I'm missing another commit that stops it happening, even after your second commit above?


thanks very much,

cheers,
calum.


Attachment: OpenPGP_signature
Description: OpenPGP digital signature


[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux