Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> On Jul 25, 2016, at 14:26, Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:
> 
> 
> 
> On 25 Jul 2016, at 12:39, Trond Myklebust wrote:
> 
>>> On Jul 25, 2016, at 12:26, Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:
>>> 
>>> On 21 Jul 2016, at 9:20, Trond Myklebust wrote:
>>> 
>>>>> On Jul 21, 2016, at 09:05, Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:
>>>>> 
>>>>> So back to Christoph's point earlier:
>>>>> 
>>>>> On 17 Jul 2016, at 23:48, Christoph Hellwig wrote:
>>>>>> This one breaks xfstests generic/207 on block/scsi layout for me.  The
>>>>>> reason for that is that we need a layoutcommit after writing out all
>>>>>> data for the file for the file size to be updated on the server.
>>>>> 
>>>>> You responded:
>>>>> 
>>>>> On 18 Jul 2016, at 0:32, Trond Myklebust wrote:
>>>>>> I’m not understanding this argument. Why do we care if the file size is up
>>>>>> to date on the server if we’re not sending an actual GETATTR on the wire
>>>>>> to retrieve the file size?
>>>>> 
>>>>> I guess the answer might be because we can get it back from the last
>>>>> LAYOUTCOMMIT.
>>>>> 
>>>> 
>>>> The patch that I followed up with should now ensure that we do not mark the attribute cache as up to date if there is a LAYOUTCOMMIT pending.
>>>> IOW: when the pNFS write is done, it is expected to do 2 things:
>>>> 
>>>> 1) mark the inode for LAYOUTCOMMIT
>>>> 2) mark the attribute cache as invalid (because we know the change attribute, mtime, ctime need to be updates)
>>>> 
>>>> In the case of blocks pNFS write:
>>>> The call to pnfs_set_layoutcommit() in pnfs_ld_write_done() should take care of (1)
>>>> The call to nfs_writeback_update_inode() in nfs4_write_done_cb() should take care of (2).
>>>> 
>>>> Provided that these 2 calls are performed in the above order, then any call to nfs_getattr() which has not been preceded by a call to nfs4_proc_layoutcommit() should trigger the call to __nfs_revalidate_inode().
>>> 
>>> I think the problem is that a following nfs_getattr() will fail to notice
>>> the size change in the case of a write_completion and layoutcommit occuring
>>> after nfs_getattr() has done pnfs_sync_inode() but before it has done
>>> nfs_update_inode().
>>> 
>>> In the failing case there are two threads one is doing writes, the other
>>> doing lstat on aio_complete via io_getevents(2).
>>> 
>>> For each write completion the lstat thread tries to verify the file size.
>>> 
>>> GETATTR Thread                  LAYOUTCOMMIT Thread
>>> --------------                  --------------------
>>>                               write_completion sets LAYOUTCOMMIT (4096@0)
>>> --> nfs_getattr
>> 
>> filemap_write_and_wait()
>> 
>>> __nfs_revalidate_inode
>>> pnfs_sync_inode
>> 
>> NFS_PROTO(inode)->getattr()
>> 
>>> getattr sees 4096
>>>                               write_completion sets LAYOUTCOMMIT (4096@4096)
>>>                               sets LAYOUTCOMMITING
>>>                               clears LAYOUTCOMMIT
>>>                               clears LAYOUTCOMMITTING
>>> nfs_refresh_inode
>>>  nfs_update_inode size is 4096
>>> <-- nfs_getattr
>>> 
>>> At this point the cached attributes are seen as up to date, but
>>> aio-dio-extend-stat program expects that second write_completion to reflect
>>> in the file size.
>>> 
>> 
>> Why isn’t the filemap_write_and_wait() above resolving the race? I’d
>> expect that would move your “write completion sets LAYOUTCOMMIT” up to
>> before the pnfs_sync_inode().  In fact, in the patch that Christoph sent,
>> all he was doing was moving the pnfs_sync_inode() to immediately after
>> that filemap_write_and_wait() instead of relying on it in
>> _nfs_revalidate_inode.
> 
> This is O_DIRECT, I've failed to mention yet.  The second write hasn't made
> it out of __nfs_pageio_add_request() at the time filemap_write_and_wait() is
> called.  It is sleeping in pnfs_update_layout() waiting on a LAYOUTGET and it
> doesn't resumes until after filemap_write_and_wait().

Wait, so you have 1 thread doing an O_DIRECT write() and another doing a stat() in parallel? Why would there be an expectation that the filesystem should serialise those system calls?

��.n��������+%������w��{.n�����{��w���jg��������ݢj����G�������j:+v���w�m������w�������h�����٥




[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux