> On Jul 28, 2016, at 05:47, Benjamin Coddington <bcodding@xxxxxxxxxx> wrote: > > > On 27 Jul 2016, at 14:05, Trond Myklebust wrote: > >>> On Jul 27, 2016, at 12:14, Benjamin Coddington <bcodding@xxxxxxxxxx> wrote: >>> >>> On 27 Jul 2016, at 8:31, Trond Myklebust wrote: >>> >>>>> On Jul 27, 2016, at 08:15, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote: >>>>> >>>>> >>>>>> On Jul 27, 2016, at 07:55, Benjamin Coddington <bcodding@xxxxxxxxxx> wrote: >>>>>> >>>>>> After adding more debugging, I see that all of that is working correctly, >>>>>> but the first LAYOUTCOMMIT is taking the size back down to 4096 from the >>>>>> last nfs_writeback_done(), and the second LAYOUTCOMMIT never brings it back >>>>>> up again. >>>>>> >>>>> >>>>> Excellent! Thanks for debugging that. >>>>> >>>>>> Now I see that we should be marking the block extents as written atomically with >>>>>> setting LAYOUTCOMMIT and nfsi->layout->plh_lwb, otherwise a LAYOUTCOMMIT can >>>>>> collect extents just added from the next bl_write_cleanup(). Then, the next >>>>>> LAYOUTCOMMIT fails, and all we're left with is the size from the first >>>>>> LAYOUTCOMMIT. Not sure if that particular problem is the whole fix, but >>>>>> that's something to work on. >>>>>> >>>>>> I see ways to fix that: >>>>>> >>>>>> - make a new pnfs_set_layoutcommit_locked() that can be used to call >>>>>> ext_tree_mark_written() inside the i_lock >>>>>> >>>>>> - make another pnfs_layoutdriver_type operation to be used within >>>>>> pnfs_set_layoutcommit (mark_layoutcommit? set_layoutcommit?), and call >>>>>> ext_tree_mark_written() within that.. >>>>>> >>>>>> - have .prepare_layoutcommit return a new positive plh_lwb that would >>>>>> extend the current LAYOUTCOMMIT >>>>>> >>>>>> - make ext_tree_prepare_commit only encode up to plh_lwb >>>>> >>>>> I see no reason why ext_tree_prepare_commit() shouldn’t be allowed to extend the args->lastbytewritten. This is a metadata operation that is owned by the pNFS layout driver. >>>>> The only thing I’d note is you should then rewrite the failure case in pnfs_layoutcommit_inode() so that it doesn’t rely on the saved “end_pos”, but uses args->lastbytewritten instead (with a comment to the effect why)… >>>> >>>> In fact, given the potential for races here, I think the right thing to do is to have ext_tree_prepare_commit() always set the correct value for args->lastbytewritten. >>> >>> OK, that has cleared up that common failure case that was getting in the >>> way, but now it can still fail like this: >>> >> >> Good progress! :-) >> >>> nfs_writeback_update_inode sets size 4096 w/ NFS_INO_INVALID_ATTR set, and sets NFS_INO_LAYOUTCOMMIT >>> 1st nfs_getattr -> pnfs_layoutcommit_inode starts, clears layoutcommit flag sets NFS_INO_LAYOUTCOMMITING >>> nfs_writeback_update_inode sets size 8192 w/ NFS_INO_INVALID_ATTR set, and sets NFS_INO_LAYOUTCOMMIT >>> 1st nfs_getattr -> nfs4_layoutcommit_release sets size 4096, NFS_INO_INVALID_ATTR set, clears NFS_INO_LAYOUTCOMMITTING >>> 1st nfs_getattr -> __revalidate_inode sets size 4096, NFS_INO_INVALID_ATTR not set.. cache is valid >>> 2nd nfs_getattr immediately returns 4096 even though NFS_INO_LAYOUTCOMMIT >> >> Is this being tested on top of the current linux-next/testing? Normally, I’d expect http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=10b7e9ad44881fcd46ac24eb7374377c6e8962ed to cause 1st nfs_getattr() to not declare the cache valid. > > Yes, this is on your linux-next branch. > > When the 1st nfs_getattr() goes through nfs_update_inode() the second time > (during __revalidate_inode), NFS_INO_INVALID_ATTR is never set by anything, > since all the attributes returned match the cache. So even though > NFS_INO_LAYOUTCOMMIT is set, and the cache_validity variable is "false", > the NFS_INO_INVALID_ATTR is never set in the "invalid" local variable. > > Should pnfs_layoutcommit_outstanding() always set NFS_INO_INVALID_ATTR? > > Ben nfs_post_op_update_inode_locked() should be doing that as part of the callchain in nfs_writeback_update_inode(). ��.n��������+%������w��{.n�����{��w���jg��������ݢj����G�������j:+v���w�m������w�������h�����٥