[BUG] stat randomly fails with ENOENT, including on . and ..

On Debian 12.5 machines (the same problem also occurred with Debian 11),
a "stat" or "open" sometimes fails with ENOENT ("No such file or
directory"), even though the file exists and its name was just obtained
with "readdir". The "stat" may even succeed while the "open" that
immediately follows it fails. In other words, the file seems to
disappear suddenly.

This can even happen on . and .. (see below), which should always
exist.
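To tell apart the case where "stat" itself fails from the case where "stat" succeeds but the "open" right after it fails, a check along the following lines could be used. This is only a sketch, not part of the original report: the helper name check_entry and its return strings are hypothetical, and directories (including . and ..) are opened with opendir rather than open.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not from the original report): stat a directory
# entry, then immediately open it, and say which call failed, if any.
sub check_entry {
    my ($dir, $name) = @_;
    my $path = "$dir/$name";
    stat $path or return "stat failed ($!)";
    if (-d _) {
        # For directories (including . and ..), use opendir rather than open.
        opendir my $dh, $path or return "opendir failed ($!)";
        closedir $dh;
    }
    else {
        open my $fh, '<', $path or return "open failed ($!)";
        close $fh;
    }
    return 'ok';
}

# Usage: check every entry returned by readdir (default: current directory)
# and print only the entries for which stat or open/opendir failed.
my $dir = shift // '.';
opendir my $dh, $dir or die "$0: opendir failed ($!)\n";
while (defined(my $name = readdir $dh)) {
    my $res = check_entry($dir, $name);
    print "$name: $res\n" unless $res eq 'ok';
}
closedir $dh;
```

On a healthy filesystem this prints nothing; an output line such as "5: open failed (...)" would indicate the stat-then-open case described above.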

Other users seem to have reported the same kind of issue:

  https://bugzilla.redhat.com/show_bug.cgi?id=228801
  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1017720

though I don't know whether the cause is the same.

The mount options are:

filer.lip.ens-lyon.fr:/tank/home/vlefevre on /home/vlefevre type nfs4 (rw,nosuid,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=140.77.14.11,local_lock=none,addr=140.77.14.38)

and the kernel version on this client machine:

Linux cassis 6.1.0-18-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.76-1 (2024-02-01) x86_64 GNU/Linux

If needed, I can ask the admins for more information about the server.

But perhaps you can reproduce the problem with the simple Perl script
below. The directory needs to be recent: I cannot reproduce the problem
with an old directory, even if I add files to it. An empty directory is
enough (its . and .. entries seem sufficient), but it is better to add
files, e.g. with

  mkdir test && cd test && touch `seq 999`

The permissions of the directory and the files may be set to read-only.

Moreover, I cannot reproduce the problem with only one worker thread
($maxthreads = 1), or if I modify the script to use pre-existing threads
instead of creating a new thread for each file returned by readdir.
Nor can I reproduce it with a local (non-NFS) directory, on this
machine or on another one.
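For comparison, the "pre-existing threads" variant mentioned above might look roughly like the sketch below. It is not the author's actual modified script; it assumes a Perl built with ithreads and a Thread::Queue version recent enough to provide end(), and the names worker and $queue are hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical variant: a fixed pool of worker threads is created up front
# and fed with directory entries through a queue, instead of creating one
# new thread per entry returned by readdir.
my ($dir, $maxthreads) = @ARGV;
$dir //= '.';
$maxthreads //= 2;
$maxthreads =~ /^\d+$/ && $maxthreads >= 1 && $maxthreads <= 32
  or die "$0: maxthreads must be an integer between 1 and 32\n";

my $queue = Thread::Queue->new();

# Each worker stats every entry it receives up to 100 times, like stat_test.
sub worker {
    # dequeue() blocks until an item is available; it returns undef once
    # the queue has been ended and is empty.
    while (defined(my $name = $queue->dequeue())) {
        foreach my $i (1..100) {
            stat "$dir/$name"
              or warn("$0: can't stat $name ($!, i = $i)\n"), last;
        }
    }
}

my @workers = map { threads->create(\&worker) } 1..$maxthreads;

opendir my $dh, $dir or die "$0: opendir failed ($!)\n";
while (defined(my $name = readdir $dh)) {
    $queue->enqueue($name);
}
closedir $dh or die "$0: closedir failed ($!)\n";

$queue->end();             # no more entries: lets workers' dequeue return undef
$_->join() for @workers;   # wait for all workers to finish
```

The only difference from the original script is the thread lifecycle: all threads exist before readdir starts, so no thread creation overlaps with the stat calls.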

------------------------------------------------------------------------
#!/usr/bin/env perl

use strict;
use threads;

@ARGV == 1 || @ARGV == 2 or die "Usage: $0 <dir> [ <maxthreads> ]\n";
my ($dir,$maxthreads) = @ARGV;

-d $dir or die "$0: $dir is not a directory\n";

if (defined $maxthreads)
  {
    $maxthreads =~ /^\d+$/ && $maxthreads >= 1 && $maxthreads <= 32
      or die "$0: maxthreads must be an integer between 1 and 32\n";
  }
else
  {
    $maxthreads = 2;
  }

# Stat the given directory entry up to 100 times, reporting the first failure.
sub stat_test ($) {
  foreach my $i (1..100)
    {
      stat "$dir/$_[0]"
        or warn("$0: can't stat $_[0] ($!, i = $i)\n"), last;
    }
}

my $nthreads = 0;

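# Busy-wait until at least one thread has finished, then join all the
# finished threads and decrease the count of running threads accordingly.
sub join_threads () {
  my @thr;
  0 until @thr = threads->list(threads::joinable);
  foreach my $thr (@thr)
    { $thr->join(); }
  $nthreads -= @thr;
}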

# Start one new thread per directory entry, with at most $maxthreads running.
opendir DIR, $dir or die "$0: opendir failed ($!)\n";
while (my $file = readdir DIR)
  {
    $nthreads < $maxthreads or join_threads;
    $nthreads++ < $maxthreads or die "$0: internal error\n";
    threads->create(\&stat_test, $file);
  }
closedir DIR or die "$0: closedir failed ($!)\n";
join_threads while $nthreads;
------------------------------------------------------------------------

Example of failure:

$ strace -o str.out -f ./dir-stat2 test
./dir-stat2: can't stat . (No such file or directory, i = 16)
./dir-stat2: can't stat .. (No such file or directory, i = 56)
./dir-stat2: can't stat 5 (No such file or directory, i = 30)
./dir-stat2: can't stat 2 (No such file or directory, i = 72)
./dir-stat2: can't stat 8 (No such file or directory, i = 1)

Excerpt of the strace output:

[...]
2766651 newfstatat(AT_FDCWD, "test/.", {st_mode=S_IFDIR|0755, st_size=11, ...}, 0) = 0
2766651 newfstatat(AT_FDCWD, "test/.", {st_mode=S_IFDIR|0755, st_size=11, ...}, 0) = 0
2766651 newfstatat(AT_FDCWD, "test/.", {st_mode=S_IFDIR|0755, st_size=11, ...}, 0) = 0
2766651 newfstatat(AT_FDCWD, "test/.", {st_mode=S_IFDIR|0755, st_size=11, ...}, 0) = 0
2766651 newfstatat(AT_FDCWD, "test/.", {st_mode=S_IFDIR|0755, st_size=11, ...}, 0) = 0
2766651 newfstatat(AT_FDCWD, "test/.", {st_mode=S_IFDIR|0755, st_size=11, ...}, 0) = 0
2766651 newfstatat(AT_FDCWD, "test/.", {st_mode=S_IFDIR|0755, st_size=11, ...}, 0) = 0
2766650 brk(0x562c8a37d000 <unfinished ...>
2766651 newfstatat(AT_FDCWD, "test/.", {st_mode=S_IFDIR|0755, st_size=11, ...}, 0) = 0
2766651 newfstatat(AT_FDCWD, "test/.", {st_mode=S_IFDIR|0755, st_size=11, ...}, 0) = 0
2766651 newfstatat(AT_FDCWD, "test/.",  <unfinished ...>
2766650 <... brk resumed>)              = 0x562c8a37d000
2766651 <... newfstatat resumed>{st_mode=S_IFDIR|0755, st_size=11, ...}, 0) = 0
2766651 newfstatat(AT_FDCWD, "test/.", {st_mode=S_IFDIR|0755, st_size=11, ...}, 0) = 0
2766651 newfstatat(AT_FDCWD, "test/.", {st_mode=S_IFDIR|0755, st_size=11, ...}, 0) = 0
2766651 newfstatat(AT_FDCWD, "test/.",  <unfinished ...>
2766650 fchdir(4 <unfinished ...>
2766651 <... newfstatat resumed>{st_mode=S_IFDIR|0755, st_size=11, ...}, 0) = 0
2766651 newfstatat(AT_FDCWD, "test/.", {st_mode=S_IFDIR|0755, st_size=11, ...}, 0) = 0
2766651 newfstatat(AT_FDCWD, "test/.", {st_mode=S_IFDIR|0755, st_size=11, ...}, 0) = 0
2766651 newfstatat(AT_FDCWD, "test/.",  <unfinished ...>
2766650 <... fchdir resumed>)           = 0
2766651 <... newfstatat resumed>0x562c8a0e7848, 0) = -1 ENOENT (No such file or directory)
2766651 write(2, "./dir-stat2: can't stat . (No su"..., 62 <unfinished ...>
[...]

-- 
Vincent Lefèvre <vincent@xxxxxxxxxx> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)



