Re: NFS Caching broken in 4.19.37

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 21/02/2021 09:13, Salvatore Bonaccorso wrote:
Hi,

On Sat, Feb 20, 2021 at 08:16:26PM +0000, Chuck Lever wrote:


On Feb 20, 2021, at 3:13 PM, Anton Ivanov <anton.ivanov@xxxxxxxxxxxxxxxxxx> wrote:

On 20/02/2021 20:04, Salvatore Bonaccorso wrote:
Hi,

On Mon, Jul 08, 2019 at 07:19:54PM +0100, Anton Ivanov wrote:
Hi list,

NFS caching appears broken in 4.19.37.

The more cores/threads the easier to reproduce. Tested with identical
results on Ryzen 1600 and 1600X.

1. Mount an openwrt build tree over NFS v4
2. Run make -j `cat /proc/cpuinfo | grep vendor | wc -l` ; make clean in a
loop
3. Result after 3-4 iterations:

State on the client

ls -laF /var/autofs/local/src/openwrt/build_dir/target-mips_24kc_musl/linux-ar71xx_tiny/linux-4.14.125/arch/mips/include/generated/uapi/asm

total 8
drwxr-xr-x 2 anivanov anivanov 4096 Jul  8 11:40 ./
drwxr-xr-x 3 anivanov anivanov 4096 Jul  8 11:40 ../

State as seen on the server (mounted via nfs from localhost):

ls -laF /var/autofs/local/src/openwrt/build_dir/target-mips_24kc_musl/linux-ar71xx_tiny/linux-4.14.125/arch/mips/include/generated/uapi/asm
total 12
drwxr-xr-x 2 anivanov anivanov 4096 Jul  8 11:40 ./
drwxr-xr-x 3 anivanov anivanov 4096 Jul  8 11:40 ../
-rw-r--r-- 1 anivanov anivanov   32 Jul  8 11:40 ipcbuf.h

Actual state on the filesystem:

ls -laF /exports/work/src/openwrt/build_dir/target-mips_24kc_musl/linux-ar71xx_tiny/linux-4.14.125/arch/mips/include/generated/uapi/asm
total 12
drwxr-xr-x 2 anivanov anivanov 4096 Jul  8 11:40 ./
drwxr-xr-x 3 anivanov anivanov 4096 Jul  8 11:40 ../
-rw-r--r-- 1 anivanov anivanov   32 Jul  8 11:40 ipcbuf.h

So the client has quite clearly lost the plot. Telling it to drop caches and
re-reading the directory shows the file present.

It is possible to reproduce this using a linux kernel tree too, just takes
much more iterations - 10+ at least.

Both client and server run 4.19.37 from Debian buster. This is filed as
debian bug 931500. I originally thought it to be autofs related, but IMHO it
is actually something fundamentally broken in nfs caching resulting in cache
corruption.
According to the reporter downstream in Debian, at
https://bugs.debian.org/940821#26 thi seem still reproducible with
more recent kernels than the initial reported. Is there anything Anton
can provide to try to track down the issue?

Anton, can you reproduce with current stable series?

100% reproducible with any kernel from 4.9 to 5.4, stable or backports. It may exist in earlier versions, but I do not have a machine with anything before 4.9 to test at present.

Confirming you are varying client-side kernels. Should the Linux
NFS client maintainers be Cc'd?

Ok, agreed. Let's add them as well. NFS client maintainers any ideas
on how to trackle this?

This is not observed with Debian backports 5.10 package

uname -a
Linux madding 5.10.0-0.bpo.3-amd64 #1 SMP Debian 5.10.13-1~bpo10+1 (2021-02-11) x86_64 GNU/Linux

I left the testcase running for ~ 4 hours on a 6core/12thread Ryzen. It should have blown up 10 times by now.

So one of the commits between 5.4 and 5.10.13 fixed it.

If nobody can think of a particular commit which fixes it, I can try dissecting it during the week.

A.



 From 1-2 make clean && make  cycles to one afternoon depending on the number of machine cores. More cores/threads the faster it does it.

I tried playing with protocol minor versions, caching options, etc - it is still reproducible for any nfs4 settings as long as there is client side caching of metadata.

A.


Regards,
Salvatore


--
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/

--
Chuck Lever

Regards,
Salvatore



--
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/



[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux