On 27 Jul 2016, at 8:31, Trond Myklebust wrote:
On Jul 27, 2016, at 08:15, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
On Jul 27, 2016, at 07:55, Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:
After adding more debugging, I see that all of that is working correctly,
but the first LAYOUTCOMMIT is taking the size back down to 4096 from the
last nfs_writeback_done(), and the second LAYOUTCOMMIT never brings it back
up again.
Excellent! Thanks for debugging that.
Now I see that we should be marking the block extents as written atomically
with setting LAYOUTCOMMIT and nfsi->layout->plh_lwb, otherwise a LAYOUTCOMMIT
can collect extents just added from the next bl_write_cleanup(). Then, the
next LAYOUTCOMMIT fails, and all we're left with is the size from the first
LAYOUTCOMMIT. Not sure if that particular problem is the whole fix, but
that's something to work on.
I see ways to fix that:
- make a new pnfs_set_layoutcommit_locked() that can be used to call
  ext_tree_mark_written() inside the i_lock
- make another pnfs_layoutdriver_type operation to be used within
  pnfs_set_layoutcommit (mark_layoutcommit? set_layoutcommit?), and call
  ext_tree_mark_written() within that
- have .prepare_layoutcommit return a new positive plh_lwb that would
  extend the current LAYOUTCOMMIT
- make ext_tree_prepare_commit only encode up to plh_lwb
I see no reason why ext_tree_prepare_commit() shouldn’t be allowed
to extend the args->lastbytewritten. This is a metadata operation
that is owned by the pNFS layout driver.
The only thing I’d note is you should then rewrite the failure case
in pnfs_layoutcommit_inode() so that it doesn’t rely on the saved
“end_pos”, but uses args->lastbytewritten instead (with a comment
to the effect why)…
In fact, given the potential for races here, I think the right thing
to do is to have ext_tree_prepare_commit() always set the correct
value for args->lastbytewritten.
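
To make that suggestion concrete, here is a rough userspace sketch, not kernel code. The struct and function names (model_extent, model_commit_args, model_prepare_commit) are made up for illustration, and treating lastbytewritten as an inclusive byte offset is an assumption of the model. The idea is only that the prepare step derives lastbytewritten from the extents it actually encodes, so a value sampled earlier cannot go stale.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a block layout extent. */
struct model_extent {
	uint64_t offset;	/* file offset of the extent, in bytes */
	uint64_t length;	/* extent length, in bytes */
	int written;		/* nonzero once the data has been written */
	int committed;		/* nonzero once encoded into a LAYOUTCOMMIT */
};

/* Hypothetical stand-in for the LAYOUTCOMMIT arguments. */
struct model_commit_args {
	uint64_t lastbytewritten;	/* inclusive offset (model assumption) */
	size_t nr_encoded;		/* number of extents encoded */
};

/*
 * Encode every written, not yet committed extent and set
 * args->lastbytewritten from exactly those extents. If nothing is
 * encoded, lastbytewritten is left untouched.
 */
static size_t model_prepare_commit(struct model_extent *ext, size_t n,
				   struct model_commit_args *args)
{
	uint64_t lwb = 0;
	size_t count = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t last;

		if (!ext[i].written || ext[i].committed || ext[i].length == 0)
			continue;

		ext[i].committed = 1;
		count++;

		last = ext[i].offset + ext[i].length - 1;
		if (last > lwb)
			lwb = last;
	}

	args->nr_encoded = count;
	if (count)
		args->lastbytewritten = lwb;
	return count;
}
```

With two written 4096-byte extents at offsets 0 and 4096, and args->lastbytewritten pre-set to a stale 4095, the call encodes both extents and raises lastbytewritten to 8191.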
OK, that has cleared up that common failure case that was getting in the
way, but now it can still fail like this:
nfs_writeback_update_inode sets size 4096 w/ NFS_INO_INVALID_ATTR set,
  and sets NFS_INO_LAYOUTCOMMIT
1st nfs_getattr -> pnfs_layoutcommit_inode starts, clears layoutcommit
  flag, sets NFS_INO_LAYOUTCOMMITTING
nfs_writeback_update_inode sets size 8192 w/ NFS_INO_INVALID_ATTR set,
  and sets NFS_INO_LAYOUTCOMMIT
1st nfs_getattr -> nfs4_layoutcommit_release sets size 4096,
  NFS_INO_INVALID_ATTR set, clears NFS_INO_LAYOUTCOMMITTING
1st nfs_getattr -> __revalidate_inode sets size 4096,
  NFS_INO_INVALID_ATTR not set.. cache is valid
2nd nfs_getattr immediately returns 4096 even though
  NFS_INO_LAYOUTCOMMIT is set
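
The sequence above can be replayed with a small userspace model. This is only a sketch: the M_* flags and model_* helpers are hypothetical stand-ins for the NFS_INO_* bits and kernel functions named in the trace, and each helper does nothing beyond what the trace says that step does.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the NFS_INO_* bits named in the trace. */
#define M_INVALID_ATTR		(1u << 0)
#define M_LAYOUTCOMMIT		(1u << 1)
#define M_LAYOUTCOMMITTING	(1u << 2)

struct model_inode {
	uint64_t size;
	unsigned int flags;
};

/* Writeback completion: grow the size, invalidate attrs, request a commit. */
static void model_writeback_update(struct model_inode *ino, uint64_t new_size)
{
	if (new_size > ino->size)
		ino->size = new_size;
	ino->flags |= M_INVALID_ATTR | M_LAYOUTCOMMIT;
}

/* LAYOUTCOMMIT start: take the request flag and snapshot the size. */
static uint64_t model_layoutcommit_start(struct model_inode *ino)
{
	ino->flags &= ~M_LAYOUTCOMMIT;
	ino->flags |= M_LAYOUTCOMMITTING;
	return ino->size;
}

/* LAYOUTCOMMIT release: apply the committed size, as in the trace. */
static void model_layoutcommit_release(struct model_inode *ino,
				       uint64_t committed_size)
{
	ino->size = committed_size;
	ino->flags |= M_INVALID_ATTR;
	ino->flags &= ~M_LAYOUTCOMMITTING;
}

/* Revalidation: take the server's size and mark the cache valid. */
static void model_revalidate(struct model_inode *ino, uint64_t server_size)
{
	ino->size = server_size;
	ino->flags &= ~M_INVALID_ATTR;
}

/* Replay the trace; the return value is what the 2nd getattr would report. */
static uint64_t model_replay(struct model_inode *ino)
{
	uint64_t committed;

	model_writeback_update(ino, 4096);
	committed = model_layoutcommit_start(ino);
	model_writeback_update(ino, 8192);	/* races with the commit */
	model_layoutcommit_release(ino, committed);
	model_revalidate(ino, committed);

	/* Attr cache is valid now, so the cached size is returned as-is. */
	return ino->size;
}
```

Starting from an empty inode, the replay ends with the cached size at 4096, the attribute cache marked valid, and the LAYOUTCOMMIT request flag from the second writeback still pending.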
Ben