Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics

On 27 Jul 2016, at 8:31, Trond Myklebust wrote:

On Jul 27, 2016, at 08:15, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:


On Jul 27, 2016, at 07:55, Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:

After adding more debugging, I see that all of that is working correctly, but the first LAYOUTCOMMIT is taking the size back down to 4096 from the last nfs_writeback_done(), and the second LAYOUTCOMMIT never brings it back up again.


Excellent! Thanks for debugging that.

Now I see that we should be marking the block extents as written atomically with setting LAYOUTCOMMIT and nfsi->layout->plh_lwb, otherwise a LAYOUTCOMMIT can collect extents just added from the next bl_write_cleanup(). Then the next LAYOUTCOMMIT fails, and all we're left with is the size from the first LAYOUTCOMMIT. Not sure if fixing that particular problem is the whole fix, but that's something to work on.
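
To make the interleaving above concrete, here is a toy model in Python. It is only a sketch of the ordering problem, not kernel code: the class ToyLayout and its methods are hypothetical stand-ins that borrow the names plh_lwb and "mark written" from the discussion, and the assumption is that marking extents written and recording the end position happen as two separate steps.

```python
class ToyLayout:
    """Hypothetical stand-in for the per-inode layout state discussed above."""

    def __init__(self):
        self.written = []           # (offset, length) extents marked written, not yet committed
        self.plh_lwb = 0            # end position of the last write that needs a commit
        self.commit_needed = False  # stands in for NFS_INO_LAYOUTCOMMIT

    def mark_written(self, offset, length):
        # Step 1 of a write completion: extents become visible to a commit.
        self.written.append((offset, length))

    def set_layoutcommit(self, end_pos):
        # Step 2 of a write completion: record how far the file now extends.
        if not self.commit_needed:
            self.commit_needed = True
            self.plh_lwb = end_pos
        else:
            self.plh_lwb = max(self.plh_lwb, end_pos)

    def start_layoutcommit(self):
        # A commit snapshots the end position and clears the flag.
        end_pos = self.plh_lwb
        self.commit_needed = False
        return end_pos

    def prepare_commit(self):
        # A commit collects every extent currently marked written.
        extents, self.written = self.written, []
        return extents


def racy_sequence():
    lo = ToyLayout()

    # First write completes: both steps run before any commit.
    lo.mark_written(0, 4096)
    lo.set_layoutcommit(4096)

    # First commit snapshots end_pos, then the second write's step 1
    # slips in before the commit collects its extents.
    end_pos_1 = lo.start_layoutcommit()
    lo.mark_written(4096, 4096)
    extents_1 = lo.prepare_commit()
    lo.set_layoutcommit(8192)       # second write's step 2

    # Second commit has the right end position but nothing left to send.
    end_pos_2 = lo.start_layoutcommit()
    extents_2 = lo.prepare_commit()
    return (end_pos_1, extents_1), (end_pos_2, extents_2)
```

In this model the first commit carries extents covering 8192 bytes but an end position of 4096, and the second commit has an end position of 8192 with no extents, which matches the mismatch described above.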

I see ways to fix that:

- make a new pnfs_set_layoutcommit_locked() that can be used to call
  ext_tree_mark_written() inside the i_lock

- make another pnfs_layoutdriver_type operation to be used within
  pnfs_set_layoutcommit (mark_layoutcommit? set_layoutcommit?), and call
  ext_tree_mark_written() within that

- have .prepare_layoutcommit return a new positive plh_lwb that would
  extend the current LAYOUTCOMMIT

- make ext_tree_prepare_commit only encode up to plh_lwb

I see no reason why ext_tree_prepare_commit() shouldn’t be allowed to extend args->lastbytewritten. This is a metadata operation that is owned by the pNFS layout driver. The only thing I’d note is that you should then rewrite the failure case in pnfs_layoutcommit_inode() so that it doesn’t rely on the saved “end_pos”, but uses args->lastbytewritten instead (with a comment explaining why)…

In fact, given the potential for races here, I think the right thing to do is to have ext_tree_prepare_commit() always set the correct value for args->lastbytewritten.
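
One way to picture that suggestion is a prepare step that derives the last-byte value from the extents it is about to encode, instead of trusting a value saved earlier. The Python function below is a hypothetical illustration only: its name, its arguments, byte-granular (offset, length) extents, and the convention that the last byte written is the end offset minus one are all assumptions made for the sketch.

```python
def prepare_commit(written_extents, saved_end_pos):
    """Return (extents_to_encode, lastbytewritten) for one commit.

    written_extents: list of (offset, length) tuples in bytes (assumed shape).
    saved_end_pos:   end position snapshotted before the extents were collected.
    """
    extents = list(written_extents)
    if not extents:
        # Nothing to encode: fall back to the snapshotted value.
        return extents, saved_end_pos - 1
    # Derive the value from what is actually being committed, so a racing
    # write that added extents after the snapshot is still covered.
    end = max(offset + length for offset, length in extents)
    return extents, end - 1
```

With the stale snapshot of 4096 and extents covering 0 to 8192, this sketch reports 8191 rather than 4095, so the value sent always agrees with the extents sent.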

OK, that has cleared up that common failure case that was getting in the
way, but now it can still fail like this:

nfs_writeback_update_inode sets size 4096 w/ NFS_INO_INVALID_ATTR set,
    and sets NFS_INO_LAYOUTCOMMIT
1st nfs_getattr -> pnfs_layoutcommit_inode starts, clears layoutcommit
    flag, sets NFS_INO_LAYOUTCOMMITTING
nfs_writeback_update_inode sets size 8192 w/ NFS_INO_INVALID_ATTR set,
    and sets NFS_INO_LAYOUTCOMMIT
1st nfs_getattr -> nfs4_layoutcommit_release sets size 4096,
    NFS_INO_INVALID_ATTR set, clears NFS_INO_LAYOUTCOMMITTING
1st nfs_getattr -> __revalidate_inode sets size 4096,
    NFS_INO_INVALID_ATTR not set.. cache is valid
2nd nfs_getattr immediately returns 4096 even though
    NFS_INO_LAYOUTCOMMIT is set
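
The sequence above can be replayed as plain state transitions to check where the cached size ends up. This is a toy replay in Python, not the kernel logic: ToyInode and the helper functions are hypothetical, each one applying only the effects listed in the corresponding trace step.

```python
from dataclasses import dataclass


@dataclass
class ToyInode:
    size: int = 0
    invalid_attr: bool = False      # stands in for NFS_INO_INVALID_ATTR
    layoutcommit: bool = False      # stands in for NFS_INO_LAYOUTCOMMIT
    layoutcommitting: bool = False  # stands in for NFS_INO_LAYOUTCOMMITTING


def writeback_update_inode(inode, new_size):
    inode.size = new_size
    inode.invalid_attr = True
    inode.layoutcommit = True


def layoutcommit_start(inode):
    inode.layoutcommit = False
    inode.layoutcommitting = True


def layoutcommit_release(inode, committed_size):
    inode.size = committed_size
    inode.invalid_attr = True
    inode.layoutcommitting = False


def revalidate_inode(inode, server_size):
    inode.size = server_size
    inode.invalid_attr = False      # attribute cache is now considered valid


def replay():
    inode = ToyInode()
    writeback_update_inode(inode, 4096)   # first write completes
    layoutcommit_start(inode)             # 1st getattr starts the commit
    writeback_update_inode(inode, 8192)   # second write completes mid-commit
    layoutcommit_release(inode, 4096)     # commit finishes with the old size
    revalidate_inode(inode, 4096)         # server still reports 4096
    return inode
```

After replay() the model holds size 4096 with a valid attribute cache while the layoutcommit flag from the second write is still set, which is the state the second nfs_getattr sees.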

Ben