Re: [PATCH] nfsd: fix race to check ls_layouts

> On Jan 28, 2023, at 10:20 AM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> 
> On Sat, 2023-01-28 at 14:15 +0000, Chuck Lever III wrote:
>> [ Cc'ing the original author of this code. ]
>> 
>> Proposed patch is here:
>> 
>> https://lore.kernel.org/linux-nfs/979eebe94ef380af6a5fdb831e78fd4c0946a59e.1674836262.git.bcodding@xxxxxxxxxx/
>> 
>>> On Jan 28, 2023, at 8:47 AM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
>>> 
>>> On Sat, 2023-01-28 at 08:31 -0500, Benjamin Coddington wrote:
>>>> On 27 Jan 2023, at 13:03, Jeff Layton wrote:
>>>> 
>>>>> On Fri, 2023-01-27 at 11:42 -0500, Benjamin Coddington wrote:
>>>>>> On 27 Jan 2023, at 11:34, Chuck Lever III wrote:
>>>>>> 
>>>>>>>> On Jan 27, 2023, at 11:18 AM, Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:
>>>>>>>> 
>>>>>>>> It's possible for __break_lease to find the layout's lease before we've
>>>>>>>> added the layout to the owner's ls_layouts list.  In that case, setting
>>>>>>>> ls_recalled = true without actually recalling the layout will cause the
>>>>>>>> server to never send a recall callback.
>>>>>>>> 
>>>>>>>> Move the check for ls_layouts before setting ls_recalled.
>>>>>>>> 
>>>>>>>> Signed-off-by: Benjamin Coddington <bcodding@xxxxxxxxxx>
>>>>>>> 
>>>>>>> Did this start misbehaving recently, or has it always been broken?
>>>>>>> That is, does it need:
>>>>>>> 
>>>>>>> Fixes: c5c707f96fc9 ("nfsd: implement pNFS layout recalls") ?
>>>>>> 
>>>>>> I'm doing some new testing of racing LAYOUTGET and CB_LAYOUTRECALL after
>>>>>> running into a livelock, so I think it has always been broken and the Fixes
>>>>>> tag is probably appropriate.
>>>>>> 
>>>>>> However, now I'm wondering if we'd run into trouble if ls_layouts could be
>>>>>> empty while the lease still exists, but that seems like it would be a
>>>>>> different bug.
>>>>>> 
>>>>> 
>>>>> Yeah, is that even possible? Surely once the last layout is gone, we
>>>>> drop the stateid? In any case, this patch looks fine. You can add:
>>>>> 
>>>>> Reviewed-by: Jeff Layton <jlayton@xxxxxxxxxx>
>>>> 
>>>> Jeff pointed out that there's another problem here.  We can't just skip
>>>> sending the callback if ls_layouts is empty, because then the process trying
>>>> to break the lease will end up spinning in __break_lease.
>>>> 
>>>> I think we can drop the list_empty() check altogether - it must be there so
>>>> that we don't race in and send a callback for a layout that's already been
>>>> returned, but I don't see any harm in that.  Clients should just return
>>>> NFS4ERR_NOMATCHING_LAYOUT.
>>>> 
>>> 
>>> The bigger worry (AFAICS) is that there is a potential race between
>>> LAYOUTGET and CB_LAYOUTRECALL:
>>> 
>>> The lease is set very early in the LAYOUTGET process, and it can be
>>> broken at any time beyond that point, even before LAYOUTGET is done and
>>> has populated the ls_layouts list. If __break_lease gets called before
>>> the list is populated, then the recall won't be sent (because ls_layouts
>>> is still empty), but the LAYOUTGET will still complete successfully.
>>> 
>>> I think we need a check at the end of nfsd4_layoutget, after the
>>> nfsd4_insert_layout call to see whether the lease has been broken. If it
>>> has, then we should unwind everything and return NFS4ERR_RECALLCONFLICT.
>> 
>> Shall I drop this fix from nfsd-next, then?
>> 
> 
> No, I think Ben's fix is still valid. The problem I'm seeing is a
> different issue in the same area of the code. A follow-on patch to
> address that is appropriate.

Thanks for clarifying... I wasn't sure whether y'all were planning
a replacement patch or an addendum. Sounds like the latter, so I'll
leave Ben's fix on the queue.


--
Chuck Lever
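
For readers following the thread, the ordering problem in the original patch
description can be illustrated with a small userspace model. This is only a
sketch, not the nfsd code: struct layout_state, recall_flag_first(),
recall_check_first() and replay() are hypothetical names, an integer count
stands in for the ls_layouts list, and locking is omitted. It does not model
the follow-on concerns raised later in the thread (the lease breaker spinning
in __break_lease, or LAYOUTGET completing after the lease has been broken).

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a layout stateid (not the kernel struct). */
struct layout_state {
    bool recalled;        /* models ls_recalled */
    int  nr_layouts;      /* non-zero models a non-empty ls_layouts list */
    int  callbacks_sent;  /* number of recall callbacks issued */
};

/* Ordering described as racy: the flag is set before the emptiness check. */
static void recall_flag_first(struct layout_state *ls)
{
    if (ls->recalled)
        return;
    ls->recalled = true;
    if (ls->nr_layouts == 0)
        return;               /* flag stays set, but no callback was sent */
    ls->callbacks_sent++;
}

/* Ordering the patch description proposes: check emptiness first. */
static void recall_check_first(struct layout_state *ls)
{
    if (ls->recalled)
        return;
    if (ls->nr_layouts == 0)
        return;               /* nothing to recall yet, leave the flag clear */
    ls->recalled = true;
    ls->callbacks_sent++;
}

/*
 * Replay the interleaving from the patch description: a lease break arrives
 * before the layout is added to the list, then is attempted again afterwards.
 * Returns how many recall callbacks were sent.
 */
static int replay(void (*recall)(struct layout_state *))
{
    struct layout_state ls = { false, 0, 0 };

    recall(&ls);          /* lease break finds the lease early */
    ls.nr_layouts = 1;    /* LAYOUTGET finishes inserting the layout */
    recall(&ls);          /* lease break is attempted again */
    return ls.callbacks_sent;
}
```

In this model, replay(recall_flag_first) returns 0 because the early call
leaves the flag set and every later attempt bails out, while
replay(recall_check_first) returns 1 because the flag is only set once there
is a layout to recall.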






