Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> On Jul 28, 2016, at 11:38, Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:
> 
> 
> 
> On 28 Jul 2016, at 10:04, Trond Myklebust wrote:
> 
>>> On Jul 28, 2016, at 08:31, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
>>> 
>>> 
>>>> On Jul 28, 2016, at 05:47, Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:
>>>> 
>>>> 
>>>> On 27 Jul 2016, at 14:05, Trond Myklebust wrote:
>>>> 
>>>>>> On Jul 27, 2016, at 12:14, Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:
>>>>>> 
>>>>>> On 27 Jul 2016, at 8:31, Trond Myklebust wrote:
>>>>>> 
>>>>>>>> On Jul 27, 2016, at 08:15, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Jul 27, 2016, at 07:55, Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:
>>>>>>>>> 
>>>>>>>>> After adding more debugging, I see that all of that is working correctly,
>>>>>>>>> but the first LAYOUTCOMMIT is taking the size back down to 4096 from the
>>>>>>>>> last nfs_writeback_done(), and the second LAYOUTCOMMIT never brings it back
>>>>>>>>> up again.
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> Excellent! Thanks for debugging that.
>>>>>>>> 
>>>>>>>>> Now I see that we should be marking the block extents as written atomically with
>>>>>>>>> setting LAYOUTCOMMIT and nfsi->layout->plh_lwb, otherwise a LAYOUTCOMMIT can
>>>>>>>>> collect extents just added from the next bl_write_cleanup().  Then, the next
>>>>>>>>> LAYOUTCOMMIT fails, and all we're left with is the size from the first
>>>>>>>>> LAYOUTCOMMIT.  Not sure if that particular problem is the whole fix, but
>>>>>>>>> that's something to work on.
>>>>>>>>> 
>>>>>>>>> I see ways to fix that:
>>>>>>>>> 
>>>>>>>>> - make a new pnfs_set_layoutcommit_locked() that can be used to call
>>>>>>>>> ext_tree_mark_written() inside the i_lock
>>>>>>>>> 
>>>>>>>>> - make another pnfs_layoutdriver_type operation to be used within
>>>>>>>>> pnfs_set_layoutcommit (mark_layoutcommit? set_layoutcommit?), and call
>>>>>>>>> ext_tree_mark_written() within that..
>>>>>>>>> 
>>>>>>>>> - have .prepare_layoutcommit return a new positive plh_lwb that would
>>>>>>>>> extend the current LAYOUTCOMMIT
>>>>>>>>> 
>>>>>>>>> - make ext_tree_prepare_commit only encode up to plh_lwb
>>>>>>>> 
>>>>>>>> I see no reason why ext_tree_prepare_commit() shouldn’t be allowed to extend the args->lastbytewritten. This is a metadata operation that is owned by the pNFS layout driver.
>>>>>>>> The only thing I’d note is you should then rewrite the failure case in pnfs_layoutcommit_inode() so that it doesn’t rely on the saved “end_pos”, but uses args->lastbytewritten instead (with a comment to the effect why)…
>>>>>>> 
>>>>>>> In fact, given the potential for races here, I think the right thing to do is to have ext_tree_prepare_commit() always set the correct value for args->lastbytewritten.
>>>>>> 
>>>>>> OK, that has cleared up that common failure case that was getting in the
>>>>>> way, but now it can still fail like this:
>>>>>> 
>>>>> 
>>>>> Good progress! :-)
>>>>> 
>>>>>> nfs_writeback_update_inode sets size 4096 w/ NFS_INO_INVALID_ATTR set, and sets NFS_INO_LAYOUTCOMMIT
>>>>>> 1st nfs_getattr -> pnfs_layoutcommit_inode starts, clears layoutcommit flag sets NFS_INO_LAYOUTCOMMITING
>>>>>> nfs_writeback_update_inode sets size 8192 w/ NFS_INO_INVALID_ATTR set, and sets NFS_INO_LAYOUTCOMMIT
>>>>>> 1st nfs_getattr -> nfs4_layoutcommit_release sets size 4096, NFS_INO_INVALID_ATTR set, clears NFS_INO_LAYOUTCOMMITTING
>>>>>> 1st nfs_getattr -> __revalidate_inode sets size 4096, NFS_INO_INVALID_ATTR not set.. cache is valid
>>>>>> 2nd nfs_getattr immediately returns 4096 even though NFS_INO_LAYOUTCOMMIT
>>>>> 
>>>>> Is this being tested on top of the current linux-next/testing? Normally, I’d expect http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=10b7e9ad44881fcd46ac24eb7374377c6e8962ed to cause 1st nfs_getattr() to not declare the cache valid.
>>>> 
>>>> Yes, this is on your linux-next branch.
>>>> 
>>>> When the 1st nfs_getattr() goes through nfs_update_inode() the second time
>>>> (during __revalidate_inode), NFS_INO_INVALID_ATTR is never set by anything,
>>>> since all the attributes returned match the cache.  So even though
>>>> NFS_INO_LAYOUTCOMMIT is set, and the cache_validity variable is "false",
>>>> the NFS_INO_INVALID_ATTR is never set in the "invalid" local variable.
>>>> 
>>>> Should pnfs_layoutcommit_outstanding() always set NFS_INO_INVALID_ATTR?
>>>> 
>>>> Ben
>>> 
>>> nfs_post_op_update_inode_locked() should be doing that as part of the callchain in nfs_writeback_update_inode().
>>> 
>> 
>> By the way. I just noticed that nothing appears to be using the attributes we retrieve as part of the layoutcommit call. Does adding a nfs_refresh_inode() to the “success” path in nfs4_layoutcommit_done() perhaps help?
> 
> We do it in layoutcommit_release:
> 
> nfs4_layoutcommit_done [nfsv4]() {
>   ...
> }
> nfs4_layoutcommit_release [nfsv4]() {
>   ...
>   nfs_post_op_update_inode_force_wcc [nfs]() {
>     nfs_post_op_update_inode_force_wcc_locked [nfs]() {
>       nfs_post_op_update_inode_locked [nfs]() {
>         nfs4_have_delegation [nfsv4]() {
>           nfs4_do_check_delegation [nfsv4]();
>         }
>         nfs_refresh_inode_locked [nfs]() {
>           nfs_update_inode [nfs]() {
> 
> 
> Should I still try adding it in nfs4_layoutcommit_done()?

No, that’s OK. As long as we’re using it… Please see my previous email, though, for how we might change nfs_update_inode()��.n��������+%������w��{.n�����{��w���jg��������ݢj����G�������j:+v���w�m������w�������h�����٥




[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux