Re: [PATCH] nfsd: remove unneeded EEXIST error check in nfsd_do_file_acquire

Chuck Lever III <chuck.lever@xxxxxxxxxx> · Mon, 29 Jul 2024 14:40:56 +0000

> On Jul 29, 2024, at 10:26 AM, Youzhong Yang <youzhong@xxxxxxxxx> wrote:
> 
> How is this going? any chance to move forward and deal with the EEXIST
> case in a future patch? I see no harm in keeping the EEXIST check.

The EEXIST patch has been applied to nfsd-next for a while, along
with the other patches in this series.

> On Mon, Jul 15, 2024 at 10:06 AM Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>> 
>> On Mon, Jul 15, 2024 at 08:25:53AM -0400, Jeff Layton wrote:
>>> On Mon, 2024-07-15 at 10:27 +1000, NeilBrown wrote:
>>>> On Fri, 12 Jul 2024, Jeff Layton wrote:
>>>>> Given that we do the search and insertion while holding the i_lock, I
>>>>> don't think it's possible for us to get EEXIST here. Remove this case.
>>>> 
>>>> I was going to comment that as rhltable_insert() cannot return -EEXIST
>>>> that is an extra reason to discard the check.  But then I looked at the
>>>> code an I cannot convince myself that it cannot.
>>>> If __rhashtable_insert_fast() finds that tbl->future_tbl is not NULL it
>>>> calls rhashtable_insert_slow(), and that seems to fail if the key
>>>> already exists.  But it shouldn't for an rhltable, it should just add
>>>> the new item to the linked list for that key.
>>>> 
>>>> It looks like this has always been broken: adding to an rhltable during
>>>> a resize event can cause EEXIST....
>>>> 
>>>> Would anyone like to check my work?  I'm surprise that hasn't been
>>>> noticed if it is really the case.
>>>> 
>>>> 
>>> 
>>> I don't know this code well at all, but it looks correct to me:
>>> 
>>> static void *rhashtable_try_insert(struct rhashtable *ht, const void *key,
>>>                                   struct rhash_head *obj)
>>> {
>>>        struct bucket_table *new_tbl;
>>>        struct bucket_table *tbl;
>>>        struct rhash_lock_head __rcu **bkt;
>>>        unsigned long flags;
>>>        unsigned int hash;
>>>        void *data;
>>> 
>>>        new_tbl = rcu_dereference(ht->tbl);
>>> 
>>>        do {
>>>                tbl = new_tbl;
>>>                hash = rht_head_hashfn(ht, tbl, obj, ht->p);
>>>                if (rcu_access_pointer(tbl->future_tbl))
>>>                        /* Failure is OK */
>>>                        bkt = rht_bucket_var(tbl, hash);
>>>                else
>>>                        bkt = rht_bucket_insert(ht, tbl, hash);
>>>                if (bkt == NULL) {
>>>                        new_tbl = rht_dereference_rcu(tbl->future_tbl, ht);
>>>                        data = ERR_PTR(-EAGAIN);
>>>                } else {
>>>                        flags = rht_lock(tbl, bkt);
>>>                        data = rhashtable_lookup_one(ht, bkt, tbl,
>>>                                                     hash, key, obj);
>>>                        new_tbl = rhashtable_insert_one(ht, bkt, tbl,
>>>                                                        hash, obj, data);
>>>                        if (PTR_ERR(new_tbl) != -EEXIST)
>>>                                data = ERR_CAST(new_tbl);
>>> 
>>>                        rht_unlock(tbl, bkt, flags);
>>>                }
>>>        } while (!IS_ERR_OR_NULL(new_tbl));
>>> 
>>>        if (PTR_ERR(data) == -EAGAIN)
>>>                data = ERR_PTR(rhashtable_insert_rehash(ht, tbl) ?:
>>>                               -EAGAIN);
>>> 
>>>        return data;
>>> }
>>> 
>>> I'm assuming the part we need to worry about is where
>>> rhashtable_insert_one returns -EEXIST.
>>> 
>>> It holds the rht_lock across the lookup and insert though. So if
>>> rhashtable_insert_one returns -EEXIST, then "data" must be something
>>> valid. In that case, "data" won't be overwritten and it will fall
>>> through and return the pointer to the entry already there.
>>> 
>>> That said, this logic is really convoluted, so I may have missed
>>> something too.
>> 
>> This is the issue I was concerned about after my review: it's
>> obvious that the rhtable API can return -EEXIST, but it's just
>> really hard to tell whether the rh/l/table API will ever return
>> -EEXIST.
>> 
>> As Neil says, the rhtable "hash table full" case should not happen
>> with rhltable. But can we prove that?
>> 
>> If we are not yet confident, then maybe PATCH 1/3 should replace
>> the "if (ret == -EEXIST)" with "WARN_ON(ret == -EEXIST)"...? It's
>> also possible to ask the human(s) who constructed the rhltable
>> code. :-)
>> 
>> 
>>>>> Cc: Youzhong Yang <youzhong@xxxxxxxxx>
>>>>> Fixes: c6593366c0bf ("nfsd: don't kill nfsd_files because of lease break error")
>>>>> Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
>>>>> ---
>>>>> This is replacement for PATCH 1/3 in the series I sent yesterday. I
>>>>> think it makes sense to just eliminate this case.
>>>>> ---
>>>>> fs/nfsd/filecache.c | 2 --
>>>>> 1 file changed, 2 deletions(-)
>>>>> 
>>>>> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
>>>>> index f84913691b78..b9dc7c22242c 100644
>>>>> --- a/fs/nfsd/filecache.c
>>>>> +++ b/fs/nfsd/filecache.c
>>>>> @@ -1038,8 +1038,6 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
>>>>>  if (likely(ret == 0))
>>>>>          goto open_file;
>>>>> 
>>>>> - if (ret == -EEXIST)
>>>>> -         goto retry;
>>>>>  trace_nfsd_file_insert_err(rqstp, inode, may_flags, ret);
>>>>>  status = nfserr_jukebox;
>>>>>  goto construction_err;
>>>>> 
>>>>> ---
>>>>> base-commit: ec1772c39fa8dd85340b1a02040806377ffbff27
>>>>> change-id: 20240711-nfsd-next-c9d17f66e2bd
>>>>> 
>>>>> Best regards,
>>>>> --
>>>>> Jeff Layton <jlayton@xxxxxxxxxx>
>>>>> 
>>>>> 
>>>> 
>>> 
>>> --
>>> Jeff Layton <jlayton@xxxxxxxxxx>
>> 
>> --
>> Chuck Lever

--
Chuck Lever