I didn't knowingly extend the files... But I had been using some old
files written months ago elsewhere. So I quickly tried with some new
files... To avoid confusion and caching, I wrote them directly on the
server to the local XFS filesystem, which we then export to client1 &
client2.

The first thing I noticed is that the behaviour differs depending on
whether we write zeros or random data:

server # dd if=/dev/zero of=/serverxfs/test.file.zero bs=1M count=512
server # dd if=/dev/urandom of=/serverxfs/test.file.random bs=1M count=512

client1 # md5sum /mnt/server/test.file.zero
aa559b4e3523a6c931f08f4df52d58f2
client1 # md5sum /mnt/server/test.file.random
b8ea132924f105d5acc27787d57a9aa2

client2 # for x in {1..10}; do (cat /mnt/server/test.file.zero > /dev/null &); done; wait
client2 # md5sum /mnt/server/test.file.zero
aa559b4e3523a6c931f08f4df52d58f2

client2 # for x in {1..10}; do (cat /mnt/server/test.file.random > /dev/null &); done; wait
client2 # md5sum /mnt/server/test.file.random
e0334bd762800ab7447bfeab033e030d

So the file full of zeros is okay but the random one is getting
corrupted? Although, if the corruption takes the form of zero-filled
ranges, a file full of zeros would of course still hash correctly even
if it were affected too. I'm scratching my head a bit wondering whether
the XFS backing filesystem on the server and/or how the extents are
laid out could in any way affect this, but the NFS client shouldn't
care, right?

As for the NFS server kernel, it's 3.10.0-693.1.1.el7.x86_64, but if
you mean your patched kernel, I just checked out your fscache-iter-nfs
branch, made a git archive and then built an RPM out of it... I must
say there are a couple of nfs re-export patches (due for v5.11) that I
have also applied on top. If you still can't reproduce, then I'll rip
them out and test again.

Daire

On Fri, Dec 4, 2020 at 7:36 PM David Wysochanski <dwysocha@xxxxxxxxxx> wrote:
> On Fri, Dec 4, 2020 at 2:09 PM David Wysochanski <dwysocha@xxxxxxxxxx> wrote:
> >
> > On Fri, Dec 4, 2020 at 1:03 PM Daire Byrne <daire.byrne@xxxxxxxxx> wrote:
> > >
> > > David,
> > >
> > > Okay, I spent a little more time on this today and I think we can
> > > forget about the re-export thing for a moment.
> > >
> > > I looked at what was happening and the issue seemed to be that once
> > > I had multiple clients of the re-export server (which has the iter
> > > fscache and fsc-enabled mounts) all reading the same files at the
> > > same time (for the first time), we often ended up with a missing
> > > sequential chunk of data from the cached file.
> > >
> > > The size and apparent size seemed to be the same as the original
> > > file on the server, but md5sum and hexdump against the
> > > client-mounted file showed otherwise.
> > >
> > > So then I tried to replicate this scenario in the simplest way,
> > > using just a single (fscache-iter) client with an fsc-enabled
> > > mountpoint and multiple processes reading the same uncached file
> > > for the first time (no NFS re-exporting).
> > >
> > > * client1 mounts the NFS server without fsc.
> > > * client2 mounts the NFS server with fsc (with fscache-iter).
> > >
> > > client1 # md5sum /mnt/server/file.1
> > > 9ca99335b6f75a300dc22e45a776440c
> > > client2 # cat /mnt/server/file.1
> > > client2 # md5sum /mnt/server/file.1
> > > 9ca99335b6f75a300dc22e45a776440c
> > >
> > > All good. The file was cached to disk and looks good.
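> > >
> > > (To be sure the md5sum above is really exercising the on-disk cache
> > > rather than just the client's page cache, one could drop the page
> > > cache and re-read; a hypothetical check along these lines:
> > >
> > > client2 # echo 3 > /proc/sys/vm/drop_caches
> > > client2 # md5sum /mnt/server/file.1
> > >
> > > With the page cache gone, the re-read should be satisfied from the
> > > local cachefiles store and still match if the cached copy is
> > > intact.)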
> > > Now let's read an uncached file using multiple processes
> > > simultaneously:
> > >
> > > client1 # md5sum /mnt/server/file.2
> > > 9ca99335b6f75a300dc22e45a776440c
> > > client2 # for x in {1..10}; do (cat /mnt/server/file.2 > /dev/null &); done; wait
> > > client2 # md5sum /mnt/server/file.2
> > > 26dd67fbf206f734df30fdec72d71429
> > >
> > > The file is now different/corrupt. So in my re-export case it's
> > > just that we have multiple knfsd processes reading the same file
> > > into the cache simultaneously for the first time. It then remains
> > > corrupt and is served out like that to multiple NFS clients.
> > >
> >
> > Hmmm, yeah that for sure shouldn't happen!
> >
> > >
> > > In this case the backing filesystem was ext4 and the nfs client
> > > mount options were fsc,vers=4.2 (vers=3 is the same). The NFS
> > > server is running RHEL7.4.
> > >
> >
> > How big is '/mnt/server/file.2' and what is the NFS server kernel?
> > Also, can you give me the mount options from /proc/mounts on
> > 'client2'? I'm not able to reproduce this yet but I'll keep trying.
> >
>
> Ok, I think I have a reproducer now, but it requires extending the
> file size. Did you re-write the file with a new size by any chance?
> It doesn't reproduce for me on the first go, but after extending the
> size of the file it does.
>
> # mount -o vers=4.2,fsc 127.0.0.1:/export/dir1 /mnt/dir1
> # dd if=/dev/urandom of=/export/dir1/file.bin bs=10M count=1
> 1+0 records in
> 1+0 records out
> 10485760 bytes (10 MB, 10 MiB) copied, 0.216783 s, 48.4 MB/s
> # for x in {1..10}; do (cat /mnt/dir1/file.bin > /dev/null &); done; wait
> # md5sum /export/dir1/file.bin /mnt/dir1/file.bin
> 94d2d0fe70f155211b5559bf7de27b34 /export/dir1/file.bin
> 94d2d0fe70f155211b5559bf7de27b34 /mnt/dir1/file.bin
> # dd if=/dev/urandom of=/export/dir1/file.bin bs=20M count=1
> 1+0 records in
> 1+0 records out
> 20971520 bytes (21 MB, 20 MiB) copied, 0.453869 s, 46.2 MB/s
> # for x in {1..10}; do (cat /mnt/dir1/file.bin > /dev/null &); done; wait
> # md5sum /export/dir1/file.bin /mnt/dir1/file.bin
> 32b9beb19b97655e9026c09bbe064dc8 /export/dir1/file.bin
> f05fe078fe65b4e5c54afcd73c97686d /mnt/dir1/file.bin
> # uname -r
> 5.10.0-rc4-94e9633d98a5+
>
> >
> > >
> > > Daire
> > >
> > > On Thu, Dec 3, 2020 at 4:27 PM David Wysochanski <dwysocha@xxxxxxxxxx> wrote:
> > >>
> > >> On Wed, Dec 2, 2020 at 12:01 PM Daire Byrne <daire.byrne@xxxxxxxxx> wrote:
> > >> >
> > >> > David,
> > >> >
> > >> > First off, thanks for the work on this - we look forward to it
> > >> > landing.
> > >> >
> > >>
> > >> Yeah no problem - thank you for your interest and testing it!
> > >>
> > >> > I did some very quick tests of just the bandwidth using
> > >> > server-class networking (40Gbit) and storage (NVMe).
> > >> >
> > >> > Comparing the old fscache with the new one, we saw a minimal
> > >> > degradation in reading back from the backing disk. But I am
> > >> > putting this down more to the more direct-I/O style of access in
> > >> > the new version.
> > >> >
> > >> > This can be seen when the cache is being written, as we no
> > >> > longer use the writeback cache. I'm assuming something similar
> > >> > happens on reads, so that we don't use readahead?
> > >> >
> > >>
> > >> Without getting into it too much and just guessing, I'd guess
> > >> either it's the usage of directIO or the limitation of the 1GB in
> > >> cachefiles, but not sure. We need to drill down into it of course
> > >> because it could be a lot of things.
> > >>
> > >> > Anyway, the quick summary of performance using 10 threads of
> > >> > reads follows.
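> > >> >
> > >> > To give a feel for what that means, a rough sketch of that kind
> > >> > of test (paths hypothetical, one large uncached file per reader)
> > >> > would be:
> > >> >
> > >> > for x in {1..10}; do
> > >> >     dd if=/mnt/server/big.$x of=/dev/null bs=1M &
> > >> > done; wait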
> > >> > I should mention that the NVMe has a physical limit of
> > >> > ~2,500MB/s writes & ~5,000MB/s reads:
> > >> >
> > >> > iter fscache:
> > >> > uncached first reads ~2,500MB/s (writing to nvme ext4/xfs)
> > >> > cached subsequent reads ~4,200MB/s (reading from nvme ext4)
> > >> > cached subsequent reads ~3,500MB/s (reading from nvme xfs)
> > >> >
> > >> > old fscache:
> > >> > uncached first reads ~2,500MB/s (writing to nvme ext4/xfs)
> > >> > cached subsequent reads ~5,000MB/s (reading from nvme ext4)
> > >> > xfs crashes a lot ...
> > >> >
> > >> > I have not done a thorough analysis of CPU usage or perf top
> > >> > differences yet.
> > >> >
> > >> > Then I went on to test our rather unique NFS re-export workload,
> > >> > where we take this fscache-backed server and re-export the fsc
> > >> > mounts to many clients. At this point something odd appeared to
> > >> > be happening. The clients were loading software from the
> > >> > fscache-backed mounts but were often segfaulting at various
> > >> > points. This suggested that they were getting corrupted data or
> > >> > that the memory mapping (binaries, libraries) was failing in
> > >> > some way. Perhaps some odd interaction between fscache and
> > >> > knfsd?
> > >> >
> > >> > I did a quick test of re-export without the fsc caching enabled
> > >> > on the server mounts (with the same 5.10-rc kernel) and I didn't
> > >> > get any errors. That's as far as I got before I was drawn away
> > >> > by other things. I hope to dig into it a little more next week.
> > >> > But I just thought I'd give some quick feedback on one potential
> > >> > difference I'm seeing compared to the previous version.
> > >> >
> > >>
> > >> Hmmm, interesting. So just to be clear, you ran my patches without
> > >> 'fsc' on the mount and it was fine, but with 'fsc' on the mount
> > >> there were data corruptions in this re-export use case? I've not
> > >> done any tests with a re-export like that, but off the top of my
> > >> head I'm not sure why it would be a problem. What NFS version(s)
> > >> are you using?
> > >>
> > >> > I also totally accept that this is a very niche workload (and
> > >> > hard to reproduce)... I should have more details on it next
> > >> > week.
> > >> >
> > >>
> > >> Ok - thanks again Daire!
> > >>
> > >> > Daire
> > >> >
> > >> > On Sat, Nov 21, 2020 at 1:50 PM David Wysochanski <dwysocha@xxxxxxxxxx> wrote:
> > >> >>
> > >> >> I just posted patches to linux-nfs but neglected to CC this
> > >> >> list. For anyone interested in the patches which convert NFS to
> > >> >> use the new netfs and fscache APIs, please see the following
> > >> >> series on linux-nfs:
> > >> >> [PATCH v1 0/13] Convert NFS to new netfs and fscache APIs
> > >> >> https://marc.info/?l=linux-nfs&m=160596540022461&w=2
> > >> >>
> > >> >> Thanks.

--
Linux-cachefs mailing list
Linux-cachefs@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cachefs