> On Sep 2, 2024, at 2:27 PM, Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote:
>
>
>
>> On Sep 2, 2024, at 7:46 AM, Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
>>
>> On Fri, Aug 30, 2024 at 1:57 AM Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:
>>>
>>> On 29 Aug 2024, at 8:54, Yafang Shao wrote:
>>>
>>>> On Thu, Aug 29, 2024 at 8:44 PM Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:
>>>>>
>>>>> On 29 Aug 2024, at 5:13, Yafang Shao wrote:
>>>>>
>>>>>> In our production environment, we noticed that some files are missing when
>>>>>> running the ls command in an NFS directory. However, we can still
>>>>>> successfully cd into the missing directories. This issue can be illustrated
>>>>>> as follows:
>>>>>>
>>>>>> $ cd nfs
>>>>>> $ ls
>>>>>> a b c e f <<<< 'd' is missing
>>>>>> $ cd d <<<< success
>>>>>>
>>>>>> I verified the issue with the latest upstream kernel, and it still
>>>>>> persists. Further analysis reveals that files go missing when the dtsize is
>>>>>> expanded. The default dtsize was reduced from 1MB to 4KB in commit
>>>>>> 580f236737d1 ("NFS: Adjust the amount of readahead performed by NFS readdir").
>>>>>> After restoring the default size to 1MB, the issue disappears. I also tried
>>>>>> setting the default size to 8KB, and the issue similarly disappears.
>>>>>>
>>>>>> Upon further analysis, it appears that there is a bad entry being decoded
>>>>>> in nfs_readdir_entry_decode(). When a bad entry is encountered, the
>>>>>> decoding process breaks without handling the error. We should revert the
>>>>>> bad entry in such cases. After implementing this change, the issue is
>>>>>> resolved.
>>>>>
>>>>> It seems like you're trying to handle a server bug of some sort. Have you
>>>>> been able to look at a wire capture to determine why there's a bad entry?
>>>>
>>>> I've used tcpdump to analyze the packets but didn't find anything
>>>> suspicious. Do you have any suggestions?
>>>
>>> I'd check to make sure the server isn't overrunning the READDIR request's
>>> dircount and maxcount (they should be the same for the linux client). If
>>> the server isn't exceeding them, then there's a likely client bug.
>>>
>>> Ben
>>>
>>
>> Hello Ben,
>>
>> Upon thorough examination, we have identified the root cause of the
>> issue to lie within the NFS server, specifically its behavior of
>> truncating file listings to match the client's READDIR RPC args->size
>> parameter without appropriately adjusting the cookie value. After
>> implementing a fix on the server side, the issue has been resolved.
>
> Please post your server fix on this mailing list. Thanks!

I was assuming your test server was Linux NFSD. If not, then please ignore me!

--
Chuck Lever
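
For readers following the thread, here is a toy model of the failure mode described above: a server that truncates a READDIR reply to the client's size limit but still advances the cookie past the entries it dropped. This is only a sketch under stated assumptions. The actual server code and fix were not posted in this thread, and the names below (`readdir`, `list_dir`, `batch`, `ENTRY_SIZE`) are hypothetical, not NFS client or server APIs.

```python
from typing import List, Tuple

# Hypothetical directory contents, matching the example in the report.
ENTRIES = ["a", "b", "c", "d", "e", "f"]
ENTRY_SIZE = 1  # toy model: every entry costs one unit of reply space


def readdir(cookie: int, maxcount: int, batch: int,
            buggy: bool) -> Tuple[List[str], int]:
    """Serve one toy READDIR call and return (names, next_cookie).

    cookie   -- index of the first entry to return
    maxcount -- reply space the client allows (stands in for args->size)
    batch    -- entries the server gathers internally per call (assumption)
    """
    if maxcount < ENTRY_SIZE:
        raise ValueError("maxcount too small to return any entry")
    gathered = ENTRIES[cookie:cookie + batch]
    # Truncate the gathered entries to the client's limit.
    fit = gathered[:maxcount // ENTRY_SIZE]
    if buggy:
        # Cookie ignores the truncation, so dropped entries are skipped.
        next_cookie = cookie + len(gathered)
    else:
        # Cookie follows the last entry actually sent.
        next_cookie = cookie + len(fit)
    return fit, next_cookie


def list_dir(maxcount: int, batch: int, buggy: bool) -> List[str]:
    """Client side: keep calling readdir() until the cookie reaches the end."""
    names: List[str] = []
    cookie = 0
    while cookie < len(ENTRIES):
        chunk, cookie = readdir(cookie, maxcount, batch, buggy)
        names.extend(chunk)
    return names


if __name__ == "__main__":
    print(list_dir(maxcount=3, batch=4, buggy=True))
    print(list_dir(maxcount=3, batch=4, buggy=False))
    print(list_dir(maxcount=4, batch=4, buggy=True))
```

In this model, a client limit smaller than the server's internal batch hides 'd' from the listing, while a larger limit masks the bug. That is at least consistent with the symptom going away at 8KB and 1MB, and with `cd d` still working, since a lookup by name does not depend on the READDIR cookie.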