Re: [RFC PATCH] NFS: Fix missing files in `ls` command output

> On Sep 2, 2024, at 7:46 AM, Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
> 
> On Fri, Aug 30, 2024 at 1:57 AM Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:
>> 
>> On 29 Aug 2024, at 8:54, Yafang Shao wrote:
>> 
>>> On Thu, Aug 29, 2024 at 8:44 PM Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:
>>>> 
>>>> On 29 Aug 2024, at 5:13, Yafang Shao wrote:
>>>> 
>>>>> In our production environment, we noticed that some files are missing when
>>>>> running the ls command in an NFS directory. However, we can still
>>>>> successfully cd into the missing directories. This issue can be illustrated
>>>>> as follows:
>>>>> 
>>>>>  $ cd nfs
>>>>>  $ ls
>>>>>  a b c e f            <<<< 'd' is missing
>>>>>  $ cd d               <<<< success
>>>>> 
>>>>> I verified the issue with the latest upstream kernel, and it still
>>>>> persists. Further analysis reveals that files go missing when the dtsize is
>>>>> expanded. The default dtsize was reduced from 1MB to 4KB in commit
>>>>> 580f236737d1 ("NFS: Adjust the amount of readahead performed by NFS readdir").
>>>>> After restoring the default size to 1MB, the issue disappears. I also tried
>>>>> setting the default size to 8KB, and the issue similarly disappears.
>>>>> 
>>>>> Upon further analysis, it appears that there is a bad entry being decoded
>>>>> in nfs_readdir_entry_decode(). When a bad entry is encountered, the
>>>>> decoding process breaks without handling the error. We should revert the
>>>>> bad entry in such cases. After implementing this change, the issue is
>>>>> resolved.
>>>> 
>>>> It seems like you're trying to handle a server bug of some sort.  Have you
>>>> been able to look at a wire capture to determine why there's a bad entry?
>>> 
>>> I've used tcpdump to analyze the packets but didn't find anything
>>> suspicious. Do you have any suggestions?
>> 
>> I'd check to make sure the server isn't overrunning the READDIR request's
>> dircount and maxcount (they should be the same for the Linux client).  If
>> the server isn't exceeding them, then a client bug is likely.
>> 
>> Ben
>> 
> 
> Hello Ben,
> 
> After a thorough examination, we traced the root cause to the NFS
> server: it truncates the file listing to fit the client's READDIR RPC
> args->size parameter without adjusting the cookie value accordingly.
> After implementing a fix on the server side, the issue is resolved.
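
To make the failure mode described above concrete, here is a minimal toy
model, not the actual server code. It assumes a hypothetical server that
gathers a batch of entries, truncates the reply to fit the client's limit,
and, in the buggy case, still reports the resume cookie of the last
gathered entry instead of the last returned one. The names readdir,
list_dir, batch, max_entries, and fix_cookie are all invented for
illustration, and the limit is counted in entries rather than bytes to
keep the sketch short.

```python
from typing import List, Tuple


def readdir(names: List[str], cookie: int, max_entries: int,
            batch: int, fix_cookie: bool) -> Tuple[List[str], int, bool]:
    """Toy READDIR: gather `batch` entries, then truncate to `max_entries`.

    Entry i (0-based) carries cookie i + 1, meaning "resume after me".
    Assumes max_entries >= 1.
    """
    entries = [(name, i + 1) for i, name in enumerate(names)]
    gathered = entries[cookie:cookie + batch]
    if not gathered:
        return [], cookie, True

    # Truncate the reply to what the client said it can accept.
    reply = gathered[:max_entries]

    if fix_cookie:
        # Correct: resume after the last entry that was actually returned.
        resume = reply[-1][1]
    else:
        # Buggy: cookie still points past entries that were truncated away.
        resume = gathered[-1][1]

    eof = resume >= len(names)
    return [name for name, _ in reply], resume, eof


def list_dir(names: List[str], max_entries: int, batch: int,
             fix_cookie: bool) -> List[str]:
    """Toy client loop: keep calling readdir from the returned cookie."""
    seen: List[str] = []
    cookie, eof = 0, False
    while not eof:
        chunk, cookie, eof = readdir(names, cookie, max_entries,
                                     batch, fix_cookie)
        seen.extend(chunk)
    return seen


files = ["a", "b", "c", "d", "e", "f"]
print(list_dir(files, max_entries=3, batch=4, fix_cookie=False))
print(list_dir(files, max_entries=3, batch=4, fix_cookie=True))
```

With these made-up parameters the buggy variant skips 'd', matching the
symptom in the original report, while the variant that takes the cookie
from the last returned entry lists all six names. Whether the real server
fails in exactly this way can only be confirmed from its fix.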

Please post your server fix on this mailing list. Thanks!


> However, to make the client more resilient to similar server-side
> bugs in the future, it may be prudent to add client-side handling
> for such cases. What do you think?

The general policy we follow is to avoid fixing server
bugs via client-side workarounds. Fix the server in
that case.


--
Chuck Lever





