> On Feb 20, 2021, at 3:13 PM, Anton Ivanov <anton.ivanov@xxxxxxxxxxxxxxxxxx> wrote:
>
> On 20/02/2021 20:04, Salvatore Bonaccorso wrote:
>> Hi,
>>
>> On Mon, Jul 08, 2019 at 07:19:54PM +0100, Anton Ivanov wrote:
>>> Hi list,
>>>
>>> NFS caching appears broken in 4.19.37.
>>>
>>> The more cores/threads the easier to reproduce. Tested with identical
>>> results on Ryzen 1600 and 1600X.
>>>
>>> 1. Mount an openwrt build tree over NFS v4
>>> 2. Run make -j `cat /proc/cpuinfo | grep vendor | wc -l` ; make clean in a
>>> loop
>>> 3. Result after 3-4 iterations:
>>>
>>> State on the client
>>>
>>> ls -laF /var/autofs/local/src/openwrt/build_dir/target-mips_24kc_musl/linux-ar71xx_tiny/linux-4.14.125/arch/mips/include/generated/uapi/asm
>>>
>>> total 8
>>> drwxr-xr-x 2 anivanov anivanov 4096 Jul 8 11:40 ./
>>> drwxr-xr-x 3 anivanov anivanov 4096 Jul 8 11:40 ../
>>>
>>> State as seen on the server (mounted via nfs from localhost):
>>>
>>> ls -laF /var/autofs/local/src/openwrt/build_dir/target-mips_24kc_musl/linux-ar71xx_tiny/linux-4.14.125/arch/mips/include/generated/uapi/asm
>>> total 12
>>> drwxr-xr-x 2 anivanov anivanov 4096 Jul 8 11:40 ./
>>> drwxr-xr-x 3 anivanov anivanov 4096 Jul 8 11:40 ../
>>> -rw-r--r-- 1 anivanov anivanov 32 Jul 8 11:40 ipcbuf.h
>>>
>>> Actual state on the filesystem:
>>>
>>> ls -laF /exports/work/src/openwrt/build_dir/target-mips_24kc_musl/linux-ar71xx_tiny/linux-4.14.125/arch/mips/include/generated/uapi/asm
>>> total 12
>>> drwxr-xr-x 2 anivanov anivanov 4096 Jul 8 11:40 ./
>>> drwxr-xr-x 3 anivanov anivanov 4096 Jul 8 11:40 ../
>>> -rw-r--r-- 1 anivanov anivanov 32 Jul 8 11:40 ipcbuf.h
>>>
>>> So the client has quite clearly lost the plot. Telling it to drop caches and
>>> re-reading the directory shows the file present.
>>>
>>> It is possible to reproduce this using a linux kernel tree too, just takes
>>> much more iterations - 10+ at least.
>>>
>>> Both client and server run 4.19.37 from Debian buster. This is filed as
>>> debian bug 931500.
>>> I originally thought it to be autofs related, but IMHO it
>>> is actually something fundamentally broken in nfs caching resulting in cache
>>> corruption.
>> According to the reporter downstream in Debian, at
>> https://bugs.debian.org/940821#26 this seems still reproducible with
>> more recent kernels than the initial reported. Is there anything Anton
>> can provide to try to track down the issue?
>>
>> Anton, can you reproduce with current stable series?
>
> 100% reproducible with any kernel from 4.9 to 5.4, stable or backports. It may exist in earlier versions, but I do not have a machine with anything before 4.9 to test at present.

Confirming you are varying client-side kernels. Should the Linux NFS client maintainers be Cc'd?

> From 1-2 make clean && make cycles to one afternoon depending on the number of machine cores. More cores/threads the faster it does it.
>
> I tried playing with protocol minor versions, caching options, etc - it is still reproducible for any nfs4 settings as long as there is client side caching of metadata.
>
> A.
>
>>
>> Regards,
>> Salvatore
>>
>
> --
> Anton R. Ivanov
> Cambridgegreys Limited. Registered in England. Company Number 10273661
> https://www.cambridgegreys.com/

--
Chuck Lever
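
For anyone trying to confirm the symptom described in the quoted report, here is a minimal sketch (not part of the original report) that lists entries visible under the exported path but missing from the NFS-mounted view of the same directory. The function name `missing_on_client` is hypothetical, and the sketch assumes both paths are readable from the host where it runs, as in the reporter's server-side comparison; otherwise the two listings would have to be gathered separately.

```python
import os


def missing_on_client(client_dir, export_dir):
    """Return sorted names present in export_dir but absent from client_dir.

    client_dir: the directory as seen through the NFS mount (assumption).
    export_dir: the same directory on the backing filesystem (assumption).
    """
    client_names = set(os.listdir(client_dir))
    export_names = set(os.listdir(export_dir))
    # A non-empty result means the mounted view is missing entries
    # that exist on the backing filesystem.
    return sorted(export_names - client_names)
```

Called with the two `asm` directories from the report, a non-empty result would correspond to the stale listing shown on the client. It only compares names, so it says nothing about why the client's cached directory contents went stale.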