> On Jul 25, 2016, at 12:26, Benjamin Coddington <bcodding@xxxxxxxxxx> wrote: > > On 21 Jul 2016, at 9:20, Trond Myklebust wrote: > >>> On Jul 21, 2016, at 09:05, Benjamin Coddington <bcodding@xxxxxxxxxx> wrote: >>> >>> So back to Christoph's point earlier: >>> >>> On 17 Jul 2016, at 23:48, Christoph Hellwig wrote: >>>> This one breaks xfstests generic/207 on block/scsi layout for me. The >>>> reason for that is that we need a layoutcommit after writing out all >>>> data for the file for the file size to be updated on the server. >>> >>> You responded: >>> >>> On 18 Jul 2016, at 0:32, Trond Myklebust wrote: >>>> I’m not understanding this argument. Why do we care if the file size is up >>>> to date on the server if we’re not sending an actual GETATTR on the wire >>>> to retrieve the file size? >>> >>> I guess the answer might be because we can get it back from the last >>> LAYOUTCOMMIT. >>> >> >> The patch that I followed up with should now ensure that we do not mark the attribute cache as up to date if there is a LAYOUTCOMMIT pending. >> IOW: when the pNFS write is done, it is expected to do 2 things: >> >> 1) mark the inode for LAYOUTCOMMIT >> 2) mark the attribute cache as invalid (because we know the change attribute, mtime, ctime need to be updates) >> >> In the case of blocks pNFS write: >> The call to pnfs_set_layoutcommit() in pnfs_ld_write_done() should take care of (1) >> The call to nfs_writeback_update_inode() in nfs4_write_done_cb() should take care of (2). >> >> Provided that these 2 calls are performed in the above order, then any call to nfs_getattr() which has not been preceded by a call to nfs4_proc_layoutcommit() should trigger the call to __nfs_revalidate_inode(). > > I think the problem is that a following nfs_getattr() will fail to notice > the size change in the case of a write_completion and layoutcommit occuring > after nfs_getattr() has done pnfs_sync_inode() but before it has done > nfs_update_inode(). > > In the failing case there are two threads one is doing writes, the other > doing lstat on aio_complete via io_getevents(2). > > For each write completion the lstat thread tries to verify the file size. > > GETATTR Thread LAYOUTCOMMIT Thread > -------------- -------------------- > write_completion sets LAYOUTCOMMIT (4096@0) > --> nfs_getattr filemap_write_and_wait() > __nfs_revalidate_inode > pnfs_sync_inode NFS_PROTO(inode)->getattr() > getattr sees 4096 > write_completion sets LAYOUTCOMMIT (4096@4096) > sets LAYOUTCOMMITING > clears LAYOUTCOMMIT > clears LAYOUTCOMMITTING > nfs_refresh_inode > nfs_update_inode size is 4096 > <-- nfs_getattr > > At this point the cached attributes are seen as up to date, but > aio-dio-extend-stat program expects that second write_completion to reflect > in the file size. > Why isn’t the filemap_write_and_wait() above resolving the race? I’d expect that would move your “write completion sets LAYOUTCOMMIT” up to before the pnfs_sync_inode(). In fact, in the patch that Christoph sent, all he was doing was moving the pnfs_sync_inode() to immediately after that filemap_write_and_wait() instead of relying on it in _nfs_revalidate_inode.��.n��������+%������w��{.n�����{��w���jg��������ݢj����G�������j:+v���w�m������w�������h�����٥