Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> On Jul 25, 2016, at 12:26, Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:
> 
> On 21 Jul 2016, at 9:20, Trond Myklebust wrote:
> 
>>> On Jul 21, 2016, at 09:05, Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:
>>> 
>>> So back to Christoph's point earlier:
>>> 
>>> On 17 Jul 2016, at 23:48, Christoph Hellwig wrote:
>>>> This one breaks xfstests generic/207 on block/scsi layout for me.  The
>>>> reason for that is that we need a layoutcommit after writing out all
>>>> data for the file for the file size to be updated on the server.
>>> 
>>> You responded:
>>> 
>>> On 18 Jul 2016, at 0:32, Trond Myklebust wrote:
>>>> I’m not understanding this argument. Why do we care if the file size is up
>>>> to date on the server if we’re not sending an actual GETATTR on the wire
>>>> to retrieve the file size?
>>> 
>>> I guess the answer might be because we can get it back from the last
>>> LAYOUTCOMMIT.
>>> 
>> 
>> The patch that I followed up with should now ensure that we do not mark the attribute cache as up to date if there is a LAYOUTCOMMIT pending.
>> IOW: when the pNFS write is done, it is expected to do 2 things:
>> 
>> 1) mark the inode for LAYOUTCOMMIT
>> 2) mark the attribute cache as invalid (because we know the change attribute, mtime, ctime need to be updates)
>> 
>> In the case of blocks pNFS write:
>> The call to pnfs_set_layoutcommit() in pnfs_ld_write_done() should take care of (1)
>> The call to nfs_writeback_update_inode() in nfs4_write_done_cb() should take care of (2).
>> 
>> Provided that these 2 calls are performed in the above order, then any call to nfs_getattr() which has not been preceded by a call to nfs4_proc_layoutcommit() should trigger the call to __nfs_revalidate_inode().
> 
> I think the problem is that a following nfs_getattr() will fail to notice
> the size change in the case of a write_completion and layoutcommit occuring
> after nfs_getattr() has done pnfs_sync_inode() but before it has done
> nfs_update_inode().
> 
> In the failing case there are two threads one is doing writes, the other
> doing lstat on aio_complete via io_getevents(2).
> 
> For each write completion the lstat thread tries to verify the file size.
> 
> GETATTR Thread                  LAYOUTCOMMIT Thread
> --------------                  --------------------
>                                write_completion sets LAYOUTCOMMIT (4096@0)
> --> nfs_getattr

filemap_write_and_wait()

> __nfs_revalidate_inode
>  pnfs_sync_inode

NFS_PROTO(inode)->getattr()

>  getattr sees 4096
>                                write_completion sets LAYOUTCOMMIT (4096@4096)
>                                sets LAYOUTCOMMITING
>                                clears LAYOUTCOMMIT
>                                clears LAYOUTCOMMITTING
>  nfs_refresh_inode
>   nfs_update_inode size is 4096
> <-- nfs_getattr
> 
> At this point the cached attributes are seen as up to date, but
> aio-dio-extend-stat program expects that second write_completion to reflect
> in the file size.
> 

Why isn’t the filemap_write_and_wait() above resolving the race? I’d expect that would move your “write completion sets LAYOUTCOMMIT” up to before the pnfs_sync_inode().
In fact, in the patch that Christoph sent, all he was doing was moving the pnfs_sync_inode() to immediately after that filemap_write_and_wait() instead of relying on it in _nfs_revalidate_inode.��.n��������+%������w��{.n�����{��w���jg��������ݢj����G�������j:+v���w�m������w�������h�����٥




[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux