Re: 'ls -lrt' performance issue on large dir while dir is being modified

On 1/18/20 9:31 AM, Trond Myklebust wrote:
On Sat, 2020-01-18 at 12:26 -0500, Chuck Lever wrote:
On Jan 18, 2020, at 10:58 AM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:

On Fri, 2020-01-17 at 18:29 -0800, Dai Ngo wrote:
Hi Trond,

On 1/15/20 11:06 AM, Trond Myklebust wrote:
On Wed, 2020-01-15 at 18:54 +0000, Trond Myklebust wrote:
On Wed, 2020-01-15 at 10:11 -0800, Dai Ngo wrote:
Hi Anna, Trond,

Would you please let me know your opinion regarding reverting the change in nfs_force_use_readdirplus to call nfs_zap_mapping instead of invalidate_mapping_pages. This change is to prevent the cookie of the READDIRPLUS from being reset to 0 while an instance of 'ls' is running and the directory is being modified.

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index a73e2f8bd8ec..5d4a64555fa7 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -444,7 +444,7 @@ void nfs_force_use_readdirplus(struct inode *dir)
 	if (nfs_server_capable(dir, NFS_CAP_READDIRPLUS) &&
 	    !list_empty(&nfsi->open_files)) {
 		set_bit(NFS_INO_ADVISE_RDPLUS, &nfsi->flags);
-		invalidate_mapping_pages(dir->i_mapping, 0, -1);
+		nfs_zap_mapping(dir, dir->i_mapping);
 	}
 }

Thanks,
-Dai

On 12/19/19 8:01 PM, Dai Ngo wrote:
Hi Anna, Trond,

I made a mistake with the 5.5 numbers. The VM that runs 5.5 has some problems. There is no regression with 5.5; here are the new numbers:

Upstream Linux 5.5.0-rc1 [ORI] 93296: 3m10.917s  197891: 10m35.789s
Upstream Linux 5.5.0-rc1 [MOD] 98614: 1m59.649s  192801: 3m55.003s

My apologies for the mistake.

Now that there is no regression with 5.5, I'd like to get your opinion regarding the change to revert the call from invalidate_mapping_pages to nfs_zap_mapping in nfs_force_use_readdirplus, to prevent the current 'ls' from restarting the READDIRPLUS3 from cookie 0. I'm not quite sure about the intention of the prior change from nfs_zap_mapping to invalidate_mapping_pages, so that is why I'm seeking advice. Or do you have any suggestions to achieve the same?

Thanks,
-Dai

On 12/17/19 4:34 PM, Dai Ngo wrote:
Hi,

I'd like to report an issue where 'ls -lrt' on an NFSv3 client takes a very long time to display the content of a large directory (100k - 200k files) while the directory is being modified by another NFSv3 client.

The problem can be reproduced using 3 systems. One system serves as the NFS server, one runs the client that does the 'ls -lrt', and another runs the client that creates files on the server.

Client1 creates files using this simple script:

#!/bin/sh

if [ $# -lt 2 ]; then
        echo "Usage: $0 number_of_files base_filename"
        exit
fi

nfiles=$1
fname=$2
echo "creating $nfiles files using filename[$fname]..."
i=0
while [ $i -lt $nfiles ]
do
        i=`expr $i + 1`
        echo "xyz" > $fname$i
        echo "$fname$i"
done

Client2 runs 'time ls -lrt /tmp/mnt/bd1 | wc -l' in a loop.

The network traces and dtrace probes showed numerous READDIRPLUS3 requests restarting from cookie 0, which seemed to indicate that the cached pages of the directory were invalidated, causing the pages to be refilled starting from cookie 0 until the currently requested cookie was reached. The cached page invalidation was tracked down to nfs_force_use_readdirplus(). To verify, I made the modification below, ran the test for various kernel versions, and captured the results shown below.

The modification is:

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index a73e2f8bd8ec..5d4a64555fa7 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -444,7 +444,7 @@ void nfs_force_use_readdirplus(struct inode *dir)
 	if (nfs_server_capable(dir, NFS_CAP_READDIRPLUS) &&
 	    !list_empty(&nfsi->open_files)) {
 		set_bit(NFS_INO_ADVISE_RDPLUS, &nfsi->flags);
-		invalidate_mapping_pages(dir->i_mapping, 0, -1);
+		nfs_zap_mapping(dir, dir->i_mapping);
 	}
 }

This change is only reverting part of commit 79f687a3de9e. My problem with that is as follows:

RFC1813 states that NFSv3 READDIRPLUS cookies and verifiers must match those returned by previous READDIRPLUS calls, and READDIR cookies and verifiers must match those returned by previous READDIR calls. It says nothing about being able to assume cookies from READDIR and READDIRPLUS calls are interchangeable. So the only reason I can see for the invalidate_mapping_pages() is to ensure that we do separate the two cookie caches.
If I understand your concern correctly, that in NFSv3 the client must maintain valid cookies and cookie verifiers when switching between READDIR and READDIRPLUS, or vice versa, then I think the current client code handles this condition ok.

On the client, both READDIR and READDIRPLUS requests use the cookie values from the same cached pages of the directory, so I don't think they can be out of sync when the client switches between READDIRPLUS and READDIR requests for different nfs_readdir calls.

In fact, currently the first nfs_readdir uses READDIRPLUS's to fill in the entries, and if there is no LOOKUP/GETATTR on one of the directory entries then the client reverts to READDIR's for subsequent nfs_readdir calls without invalidating any cached pages of the directory. If there is a LOOKUP/GETATTR done on one of the directory entries then nfs_advise_use_readdirplus is called, which forces the client to use READDIRPLUS again for the next nfs_readdir.

Unless the user mounts the export with the 'nordirplus' option, in which case the client uses only READDIRs for all requests and no switching takes place.
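
To make the switching concrete, the decision for each nfs_readdir pass looks roughly like this (simplified sketch, not a verbatim copy of fs/nfs/dir.c; the helper name here is made up):

/* Simplified sketch of how the client picks the wire operation
 * for the next nfs_readdir() pass. */
static bool use_readdirplus_sketch(struct inode *dir, loff_t pos)
{
	struct nfs_inode *nfsi = NFS_I(dir);

	/* 'nordirplus' mount or a server without READDIRPLUS:
	 * always plain READDIR, so no switching ever happens. */
	if (!nfs_server_capable(dir, NFS_CAP_READDIRPLUS))
		return false;

	/* A LOOKUP/GETATTR on a directory entry set this bit via
	 * nfs_advise_use_readdirplus()/nfs_force_use_readdirplus(). */
	if (test_bit(NFS_INO_ADVISE_RDPLUS, &nfsi->flags))
		return true;

	/* The first pass over the directory uses READDIRPLUS; later
	 * passes fall back to READDIR if nobody asked for attributes. */
	return pos == 0;
}
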
I don't understand your point.
The original point was that the directory's page cache seems to
be cleared a little too often (quite apart from switching between
READDIRPLUS and READDIR).

I think Dai is saying that cache clearing is appropriate to defer
when the directory's mtime has changed but the READ method remains
the same. Otherwise repeatedly adding a new file to a very large
directory that is being read can trigger a situation where the
reading getdents loop never completes.

Fair enough, but the section of code he was touching with his patch
should be only about switching between READDIR/PLUS methods.

The code that I'd like to patch, nfs_force_use_readdirplus, is called when there is an attribute cache miss, which causes the client to make sure READDIRPLUS is being used on the directory, regardless of which READ method is currently being used for the directory. So it's possible that the call to nfs_force_use_readdirplus will cause the next getdents/nfs_readdir to switch from READDIR to READDIRPLUS.
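
For context, the two helpers differ roughly as follows (paraphrased sketch of fs/nfs/dir.c as of 5.5, not verbatim):

/* Paraphrased sketch, not verbatim kernel code. */
void nfs_advise_use_readdirplus(struct inode *dir)
{
	struct nfs_inode *nfsi = NFS_I(dir);

	/* Only hint that the next pass should use READDIRPLUS. */
	if (nfs_server_capable(dir, NFS_CAP_READDIRPLUS) &&
	    !list_empty(&nfsi->open_files))
		set_bit(NFS_INO_ADVISE_RDPLUS, &nfsi->flags);
}

void nfs_force_use_readdirplus(struct inode *dir)
{
	struct nfs_inode *nfsi = NFS_I(dir);

	/* Same hint, but also drop the directory's cached pages right
	 * away, which is what restarts an in-flight 'ls' from cookie 0. */
	if (nfs_server_capable(dir, NFS_CAP_READDIRPLUS) &&
	    !list_empty(&nfsi->open_files)) {
		set_bit(NFS_INO_ADVISE_RDPLUS, &nfsi->flags);
		invalidate_mapping_pages(dir->i_mapping, 0, -1);
	}
}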


The mtime monitoring happens elsewhere and is already set up to
invalidate the cache only on opendir()/rewinddir().

My two cents Euro.


The issue is that nfs_advise_use_readdirplus() can cause the behaviour to switch between use of READDIRPLUS and use of READDIR from one syscall to getdents() to the next.
If the client is using the same page cache across those syscalls, then it will end up caching a mixture of cookies. Furthermore, since the cookie that is used as an argument to the next call to READDIR/READDIRPLUS is taken from that page cache, we can end up calling READDIRPLUS with a cookie that came from READDIR and vice versa.

As I said, I'm not convinced that is legal in RFC1813 (NFSv3).
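
For reference, the cached directory pages hold entries of roughly this shape (simplified from fs/nfs/dir.c of that era); the cookie argument for the next READDIR/READDIRPLUS on the wire comes straight from these cached entries:

/* Rough shape of what each cached directory page holds (simplified). */
struct nfs_cache_array_entry {
	u64 cookie;		/* cookie to resume after this entry */
	u64 ino;
	struct qstr string;	/* file name */
	unsigned char d_type;
};

struct nfs_cache_array {
	int size;
	int eof_index;
	u64 last_cookie;	/* cookie used to request the next chunk */
	struct nfs_cache_array_entry array[0];
};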

I think this is the contention point: the spec did not explicitly
mention anything regarding mixing of cookies from READDIR & READDIRPLUS.

However, as I mentioned, the current client implementation already mixes cookies between READDIRPLUS and READDIR every time the user does a simple 'ls' on a large directory, without invalidating any mapping.

Also, as Chuck mentioned, we're not aware of any server implementation
that has problems with this mixing of cookies.

-Dai


That is why we want to clear the page cache when we swap between use of READDIR and use of READDIRPLUS for the case of NFSv3.

Just curious, are you aware of an NFSv3 server implementation that would have a problem with a client that mixes the cookies?


Thanks,
-Dai

OTOH, for NFSv4, there is no separate READDIRPLUS function, so there really does not appear to be any reason to clear the page cache at all as we're switching between requesting attributes or not.

Sorry... To spell out my objection to this change more clearly: the call to nfs_zap_mapping() makes no sense in either case.

  * It defers the cache invalidation until the next call to rewinddir()/opendir(), so it does not address the NFSv3 concern.
  * It would appear to be entirely superfluous for the NFSv4 case.
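
In rough terms, the two calls amount to the following for the directory's page cache (illustrative sketch only, not kernel source):

static void zap_mapping_effect(struct inode *dir)
{
	/* Deferred: only flag the inode. The cached pages (and the
	 * cookies stored in them) survive until the mapping is next
	 * revalidated, i.e. on the next opendir()/rewinddir(). */
	spin_lock(&dir->i_lock);
	NFS_I(dir)->cache_validity |= NFS_INO_INVALID_DATA;
	spin_unlock(&dir->i_lock);
}

static void invalidate_mapping_effect(struct inode *dir)
{
	/* Immediate: clean directory pages are dropped now, so a
	 * getdents() loop that is mid-directory has to refill the
	 * cache starting from cookie 0. */
	invalidate_mapping_pages(dir->i_mapping, 0, -1);
}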

So a change that might be acceptable would be to keep the existing call to invalidate_mapping_pages() for NFSv3, but to remove it for NFSv4.
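
Something along these lines, perhaps (untested sketch of that direction, not a submitted patch; the NFS_PROTO(dir)->version check is just one way of expressing "NFSv3 only"):

void nfs_force_use_readdirplus(struct inode *dir)
{
	struct nfs_inode *nfsi = NFS_I(dir);

	if (nfs_server_capable(dir, NFS_CAP_READDIRPLUS) &&
	    !list_empty(&nfsi->open_files)) {
		set_bit(NFS_INO_ADVISE_RDPLUS, &nfsi->flags);
		/* Only NFSv3 switches between READDIR and READDIRPLUS,
		 * so only NFSv3 needs to keep the cookie caches apart. */
		if (NFS_PROTO(dir)->version == 3)
			invalidate_mapping_pages(dir->i_mapping, 0, -1);
	}
}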

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx
--
Chuck Lever