Hi,

On Sat, Feb 20, 2021 at 08:16:26PM +0000, Chuck Lever wrote:
>
>
> > On Feb 20, 2021, at 3:13 PM, Anton Ivanov <anton.ivanov@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > On 20/02/2021 20:04, Salvatore Bonaccorso wrote:
> >> Hi,
> >>
> >> On Mon, Jul 08, 2019 at 07:19:54PM +0100, Anton Ivanov wrote:
> >>> Hi list,
> >>>
> >>> NFS caching appears broken in 4.19.37.
> >>>
> >>> The more cores/threads, the easier it is to reproduce. Tested with identical
> >>> results on Ryzen 1600 and 1600X.
> >>>
> >>> 1. Mount an openwrt build tree over NFS v4
> >>> 2. Run make -j `cat /proc/cpuinfo | grep vendor | wc -l` ; make clean in a
> >>> loop
> >>> 3. Result after 3-4 iterations:
> >>>
> >>> State on the client:
> >>>
> >>> ls -laF /var/autofs/local/src/openwrt/build_dir/target-mips_24kc_musl/linux-ar71xx_tiny/linux-4.14.125/arch/mips/include/generated/uapi/asm
> >>>
> >>> total 8
> >>> drwxr-xr-x 2 anivanov anivanov 4096 Jul 8 11:40 ./
> >>> drwxr-xr-x 3 anivanov anivanov 4096 Jul 8 11:40 ../
> >>>
> >>> State as seen on the server (mounted via NFS from localhost):
> >>>
> >>> ls -laF /var/autofs/local/src/openwrt/build_dir/target-mips_24kc_musl/linux-ar71xx_tiny/linux-4.14.125/arch/mips/include/generated/uapi/asm
> >>> total 12
> >>> drwxr-xr-x 2 anivanov anivanov 4096 Jul 8 11:40 ./
> >>> drwxr-xr-x 3 anivanov anivanov 4096 Jul 8 11:40 ../
> >>> -rw-r--r-- 1 anivanov anivanov 32 Jul 8 11:40 ipcbuf.h
> >>>
> >>> Actual state on the filesystem:
> >>>
> >>> ls -laF /exports/work/src/openwrt/build_dir/target-mips_24kc_musl/linux-ar71xx_tiny/linux-4.14.125/arch/mips/include/generated/uapi/asm
> >>> total 12
> >>> drwxr-xr-x 2 anivanov anivanov 4096 Jul 8 11:40 ./
> >>> drwxr-xr-x 3 anivanov anivanov 4096 Jul 8 11:40 ../
> >>> -rw-r--r-- 1 anivanov anivanov 32 Jul 8 11:40 ipcbuf.h
> >>>
> >>> So the client has quite clearly lost the plot. Telling it to drop caches and
> >>> re-reading the directory shows the file present.
> >>>
> >>> It is possible to reproduce this using a Linux kernel tree too; it just takes
> >>> many more iterations, 10+ at least.
> >>>
> >>> Both client and server run 4.19.37 from Debian buster. This is filed as
> >>> Debian bug 931500. I originally thought it to be autofs related, but IMHO it
> >>> is actually something fundamentally broken in NFS caching, resulting in cache
> >>> corruption.
> >> According to the reporter downstream in Debian, at
> >> https://bugs.debian.org/940821#26 this still seems to be reproducible with
> >> kernels more recent than the one initially reported. Is there anything Anton
> >> can provide to try to track down the issue?
> >>
> >> Anton, can you reproduce with the current stable series?
> >
> > 100% reproducible with any kernel from 4.9 to 5.4, stable or backports. It may exist in earlier versions, but I do not have a machine with anything before 4.9 to test at present.
>
> Confirming you are varying client-side kernels. Should the Linux
> NFS client maintainers be Cc'd?

Ok, agreed. Let's add them as well. NFS client maintainers, any ideas on how to tackle this?

>
> > It takes anywhere from 1-2 make clean && make cycles to one afternoon, depending on the number of machine cores. The more cores/threads, the faster it happens.
> >
> > I tried playing with protocol minor versions, caching options, etc.; it is still reproducible for any NFSv4 settings as long as there is client-side caching of metadata.
> >
> > A.
> >
> >>
> >> Regards,
> >> Salvatore
> >>
> >
> > --
> > Anton R. Ivanov
> > Cambridgegreys Limited. Registered in England. Company Number 10273661
> > https://www.cambridgegreys.com/
>
> --
> Chuck Lever

Regards,
Salvatore
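For anyone trying to reproduce this, the loop from the original report can be combined with the drop-caches check it mentions into a small script. This is a hypothetical helper, not something posted in the thread. The script name `nfs-repro.sh`, its two arguments, and writing `3` to `/proc/sys/vm/drop_caches` (which needs root) are assumptions. It runs the build on the NFS client, snapshots one directory's listing, drops the client's caches, re-reads the listing, and stops as soon as the two differ. That is the "file reappears after dropping caches" symptom described above.

```shell
#!/bin/sh
# Hypothetical reproducer for the stale NFS directory listing reported in
# this thread. Run as root on the NFS client.
#   $1: build tree root as seen through the NFS mount
#   $2: directory to watch, relative to $1

# Print the names in a directory (including dotfiles), sorted, one per line.
list_names() {
    ls -A "$1" | sort
}

# Succeed if the cached and re-read listings match; otherwise print both.
compare_listings() {
    if [ "$1" = "$2" ]; then
        return 0
    fi
    printf 'cached listing:\n%s\nafter drop_caches:\n%s\n' "$1" "$2"
    return 1
}

main() {
    tree=$1
    dir=$tree/$2
    # One job per logical CPU, like `cat /proc/cpuinfo | grep vendor | wc -l`
    # in the report; fall back to 1 if there are no "vendor" lines (non-x86).
    jobs=$(grep -c vendor /proc/cpuinfo)
    [ "$jobs" -gt 0 ] || jobs=1
    i=0
    while :; do
        i=$((i + 1))
        # The build may fail once the client loses a file; keep going so
        # the directory can still be inspected.
        (cd "$tree" && make -j "$jobs") >/dev/null 2>&1
        if [ -d "$dir" ]; then
            before=$(list_names "$dir")
            sync
            echo 3 > /proc/sys/vm/drop_caches   # free page cache, dentries and inodes
            after=$(list_names "$dir")
            if ! compare_listings "$before" "$after"; then
                echo "stale listing of $dir after $i iteration(s)"
                return 1
            fi
        fi
        (cd "$tree" && make clean) >/dev/null 2>&1
    done
}

# Only start the loop when called with both arguments.
if [ "$#" -ge 2 ]; then
    main "$@"
    exit $?
fi
```

For the tree in the report, this would be invoked as `sudo sh nfs-repro.sh /var/autofs/local/src/openwrt build_dir/target-mips_24kc_musl/linux-ar71xx_tiny/linux-4.14.125/arch/mips/include/generated/uapi/asm`. Comparing against the listing on the server, as Anton did, catches the same symptom but needs access to both machines. The drop-caches comparison only needs the client.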