Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 28 Jul 2016, at 8:31, Trond Myklebust wrote:

On Jul 28, 2016, at 05:47, Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:


On 27 Jul 2016, at 14:05, Trond Myklebust wrote:

On Jul 27, 2016, at 12:14, Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:

On 27 Jul 2016, at 8:31, Trond Myklebust wrote:

On Jul 27, 2016, at 08:15, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:


On Jul 27, 2016, at 07:55, Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:

After adding more debugging, I see that all of that is working correctly, but the first LAYOUTCOMMIT is taking the size back down to 4096 from the last nfs_writeback_done(), and the second LAYOUTCOMMIT never brings it back
up again.


Excellent! Thanks for debugging that.

Now I see that we should be marking the block extents as written atomically with setting LAYOUTCOMMIT and nfsi->layout->plh_lwb, otherwise a LAYOUTCOMMIT can collect extents just added from the next bl_write_cleanup(). Then, the next LAYOUTCOMMIT fails, and all we're left with is the size from the first LAYOUTCOMMIT. Not sure if that particular problem is the whole fix, but
that's something to work on.

I see ways to fix that:

- make a new pnfs_set_layoutcommit_locked() that can be used to call
  ext_tree_mark_written() inside the i_lock

- make another pnfs_layoutdriver_type operation to be used within pnfs_set_layoutcommit (mark_layoutcommit? set_layoutcommit?), and call
  ext_tree_mark_written() within that..

- have .prepare_layoutcommit return a new positive plh_lwb that would
  extend the current LAYOUTCOMMIT

- make ext_tree_prepare_commit only encode up to plh_lwb

I see no reason why ext_tree_prepare_commit() shouldn’t be allowed to extend the args->lastbytewritten. This is a metadata operation that is owned by the pNFS layout driver. The only thing I’d note is you should then rewrite the failure case in pnfs_layoutcommit_inode() so that it doesn’t rely on the saved “end_pos”, but uses args->lastbytewritten instead (with a comment to the effect why)…

In fact, given the potential for races here, I think the right thing to do is to have ext_tree_prepare_commit() always set the correct value for args->lastbytewritten.

OK, that has cleared up that common failure case that was getting in the
way, but now it can still fail like this:


Good progress! :-)

nfs_writeback_update_inode sets size 4096 w/ NFS_INO_INVALID_ATTR set, and sets NFS_INO_LAYOUTCOMMIT 1st nfs_getattr -> pnfs_layoutcommit_inode starts, clears layoutcommit flag sets NFS_INO_LAYOUTCOMMITING nfs_writeback_update_inode sets size 8192 w/ NFS_INO_INVALID_ATTR set, and sets NFS_INO_LAYOUTCOMMIT 1st nfs_getattr -> nfs4_layoutcommit_release sets size 4096, NFS_INO_INVALID_ATTR set, clears NFS_INO_LAYOUTCOMMITTING 1st nfs_getattr -> __revalidate_inode sets size 4096, NFS_INO_INVALID_ATTR not set.. cache is valid 2nd nfs_getattr immediately returns 4096 even though NFS_INO_LAYOUTCOMMIT

Is this being tested on top of the current linux-next/testing? Normally, I’d expect http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=10b7e9ad44881fcd46ac24eb7374377c6e8962ed to cause 1st nfs_getattr() to not declare the cache valid.

Yes, this is on your linux-next branch.

When the 1st nfs_getattr() goes through nfs_update_inode() the second time (during __revalidate_inode), NFS_INO_INVALID_ATTR is never set by anything,
since all the attributes returned match the cache.  So even though
NFS_INO_LAYOUTCOMMIT is set, and the cache_validity variable is "false", the NFS_INO_INVALID_ATTR is never set in the "invalid" local variable.

Should pnfs_layoutcommit_outstanding() always set NFS_INO_INVALID_ATTR?

Ben

nfs_post_op_update_inode_locked() should be doing that as part of the callchain in nfs_writeback_update_inode().

And it is, and the bit persists through the next layoutcommit, it is the next GETATTR response that finds that all the attributes are the same and the bit is cleared.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux