Re: t0032 fails on NFS mounts

On Wed, Mar 13, 2024 at 03:20:52AM -0400, Jeff King wrote:
> +cc Patrick for reftable
> 
> On Tue, Mar 12, 2024 at 11:10:29AM -0400, Chuck Lever wrote:
> 
> > Unit test t0032 fails when run on an NFS mount:
> > 
> > [vagrant@cel t]$ ./t0032-reftable-unittest.sh 
> > not ok 1 - unittests
> > #	
> > #		TMPDIR=$(pwd) && export TMPDIR &&
> > #		test-tool reftable
> > #	
> > # failed 1 among 1 test(s)
> > 1..1
> 
> The output for this test script is particularly unhelpful because it's
> not using our test harness at all, but just running a bunch of internal
> tests using a single program.
> 
> Running with "-v" should give more details about what's failing.
> 
> I set up a basic loopback server like:
> 
>   mkdir /mnt/{server,client}
>   exportfs -o rw,sync 127.0.0.1:/mnt/server
>   mount -t nfs 127.0.0.1:/mnt/server /mnt/client
> 
> and then ran:
> 
>   ./t0032-reftable-unittest.sh --root=/mnt/client -v
> 
> Looks like it fails at:
> 
>   running test_reftable_stack_compaction_concurrent_clean
>   reftable/stack_test.c: 1063: failed assertion count_dir_entries(dir) == 2
>   Aborted
> 
> > v2.43.2 seems to work OK.
> 
> For me, too. Bisecting shows the problem appearing in 4f36b8597c
> (reftable/stack: fix race in up-to-date check, 2024-01-18).

I think this is actually benign. I set a breakpoint in the respective
test right before double-checking our conditions, and curiously I got
back the following list of files:

    ./stack_test-1027.QJBpnd
    ./stack_test-1027.QJBpnd/0x000000000001-0x000000000003-dad7ac80.ref
    ./stack_test-1027.QJBpnd/.nfs000000000001729f00001e11
    ./stack_test-1027.QJBpnd/tables.list

Notice the ".nfs*" thing? This is a temporary file managed by the NFS
client that maintains delete-on-close behaviour because we have unlinked
the file while it was still open [1]. But of course we count that file
when executing `count_dir_entries()`, and thus we arrive at an
unexpected number of files.

I will send a patch to fix the test.

> PS That test seems to run ~20x slower on NFS versus directly on ext4.
>    I'd expect a little overhead, but that's quite a bit.

I'm not all that surprised here given that the reftable library is quite
prone to stat(3P)ing the "tables.list" file, and potentially re-reading
it. I kind of suspect that this is what's going on. An alternative
explanation might be that mmap'ing over NFS is really slow.

Anyway, I will have a deeper look at this and see where we spend all the
time.

Patrick

[1]: https://nfs.sourceforge.net/#faq_d2
