On 28 Jul 2016, at 8:31, Trond Myklebust wrote:
On Jul 28, 2016, at 05:47, Benjamin Coddington <bcodding@xxxxxxxxxx>
wrote:
On 27 Jul 2016, at 14:05, Trond Myklebust wrote:
On Jul 27, 2016, at 12:14, Benjamin Coddington
<bcodding@xxxxxxxxxx> wrote:
On 27 Jul 2016, at 8:31, Trond Myklebust wrote:
On Jul 27, 2016, at 08:15, Trond Myklebust
<trondmy@xxxxxxxxxxxxxxx> wrote:
On Jul 27, 2016, at 07:55, Benjamin Coddington
<bcodding@xxxxxxxxxx> wrote:
After adding more debugging, I see that all of that is working
correctly,
but the first LAYOUTCOMMIT is taking the size back down to 4096
from the
last nfs_writeback_done(), and the second LAYOUTCOMMIT never
brings it back
up again.
Excellent! Thanks for debugging that.
Now I see that we should be marking the block extents as written
atomically with
setting LAYOUTCOMMIT and nfsi->layout->plh_lwb, otherwise a
LAYOUTCOMMIT can
collect extents just added from the next bl_write_cleanup().
Then, the next
LAYOUTCOMMIT fails, and all we're left with is the size from the
first
LAYOUTCOMMIT. Not sure if that particular problem is the whole
fix, but
that's something to work on.
I see ways to fix that:
- make a new pnfs_set_layoutcommit_locked() that can be used to
call
ext_tree_mark_written() inside the i_lock
- make another pnfs_layoutdriver_type operation to be used
within
pnfs_set_layoutcommit (mark_layoutcommit? set_layoutcommit?),
and call
ext_tree_mark_written() within that..
- have .prepare_layoutcommit return a new positive plh_lwb that
would
extend the current LAYOUTCOMMIT
- make ext_tree_prepare_commit only encode up to plh_lwb
I see no reason why ext_tree_prepare_commit() shouldn’t be
allowed to extend the args->lastbytewritten. This is a metadata
operation that is owned by the pNFS layout driver.
The only thing I’d note is you should then rewrite the failure
case in pnfs_layoutcommit_inode() so that it doesn’t rely on
the saved “end_pos”, but uses args->lastbytewritten instead
(with a comment to the effect why)…
In fact, given the potential for races here, I think the right
thing to do is to have ext_tree_prepare_commit() always set the
correct value for args->lastbytewritten.
OK, that has cleared up that common failure case that was getting
in the
way, but now it can still fail like this:
Good progress! :-)
nfs_writeback_update_inode sets size 4096 w/ NFS_INO_INVALID_ATTR
set, and sets NFS_INO_LAYOUTCOMMIT
1st nfs_getattr -> pnfs_layoutcommit_inode starts, clears
layoutcommit flag sets NFS_INO_LAYOUTCOMMITING
nfs_writeback_update_inode sets size 8192 w/ NFS_INO_INVALID_ATTR
set, and sets NFS_INO_LAYOUTCOMMIT
1st nfs_getattr -> nfs4_layoutcommit_release sets size 4096,
NFS_INO_INVALID_ATTR set, clears NFS_INO_LAYOUTCOMMITTING
1st nfs_getattr -> __revalidate_inode sets size 4096,
NFS_INO_INVALID_ATTR not set.. cache is valid
2nd nfs_getattr immediately returns 4096 even though
NFS_INO_LAYOUTCOMMIT
Is this being tested on top of the current linux-next/testing?
Normally, I’d expect
http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=10b7e9ad44881fcd46ac24eb7374377c6e8962ed
to cause 1st nfs_getattr() to not declare the cache valid.
Yes, this is on your linux-next branch.
When the 1st nfs_getattr() goes through nfs_update_inode() the second
time
(during __revalidate_inode), NFS_INO_INVALID_ATTR is never set by
anything,
since all the attributes returned match the cache. So even though
NFS_INO_LAYOUTCOMMIT is set, and the cache_validity variable is
"false",
the NFS_INO_INVALID_ATTR is never set in the "invalid" local
variable.
Should pnfs_layoutcommit_outstanding() always set
NFS_INO_INVALID_ATTR?
Ben
nfs_post_op_update_inode_locked() should be doing that as part of the
callchain in nfs_writeback_update_inode().
And it is, and the bit persists through the next layoutcommit, it is the
next GETATTR response that finds that all the attributes are the same
and the bit is cleared.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html