On Sat, 2020-01-18 at 12:26 -0500, Chuck Lever wrote: > > On Jan 18, 2020, at 10:58 AM, Trond Myklebust < > > trondmy@xxxxxxxxxxxxxxx> wrote: > > > > On Fri, 2020-01-17 at 18:29 -0800, Dai Ngo wrote: > > > Hi Trond, > > > > > > On 1/15/20 11:06 AM, Trond Myklebust wrote: > > > > On Wed, 2020-01-15 at 18:54 +0000, Trond Myklebust wrote: > > > > > On Wed, 2020-01-15 at 10:11 -0800, Dai Ngo wrote: > > > > > > Hi Anna, Trond, > > > > > > > > > > > > Would you please let me know your opinion regarding > > > > > > reverting > > > > > > the > > > > > > change in > > > > > > nfs_force_use_readdirplus to call nfs_zap_mapping instead > > > > > > of > > > > > > invalidate_mapping_pages. > > > > > > This change is to prevent the cookie of the READDIRPLUS to > > > > > > be > > > > > > reset > > > > > > to 0 while > > > > > > an instance of 'ls' is running and the directory is being > > > > > > modified. > > > > > > > > > > > > > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c index > > > > > > > a73e2f8bd8ec..5d4a64555fa7 100644 --- a/fs/nfs/dir.c +++ > > > > > > > b/fs/nfs/dir.c @@ -444,7 +444,7 @@ void > > > > > > > nfs_force_use_readdirplus(struct inode *dir) if > > > > > > > (nfs_server_capable(dir, NFS_CAP_READDIRPLUS) && > > > > > > > !list_empty(&nfsi->open_files)) { > > > > > > > set_bit(NFS_INO_ADVISE_RDPLUS, &nfsi->flags); - > > > > > > > invalidate_mapping_pages(dir->i_mapping, 0, -1); + > > > > > > > nfs_zap_mapping(dir, dir->i_mapping); } } > > > > > > Thanks, > > > > > > -Dai > > > > > > > > > > > > On 12/19/19 8:01 PM, Dai Ngo wrote: > > > > > > > Hi Anna, Trond, > > > > > > > > > > > > > > I made a mistake with the 5.5 numbers. The VM that runs > > > > > > > 5.5 > > > > > > > has > > > > > > > some > > > > > > > problems. There is no regression with 5.5, here are the > > > > > > > new > > > > > > > numbers: > > > > > > > > > > > > > > Upstream Linux 5.5.0-rc1 [ORI] 93296: 3m10.917s 197891: > > > > > > > 10m35.789s > > > > > > > Upstream Linux 5.5.0-rc1 [MOD] 98614: 1m59.649s 192801: > > > > > > > 3m55.003s > > > > > > > > > > > > > > My apologies for the mistake. > > > > > > > > > > > > > > Now there is no regression with 5.5, I'd like to get your > > > > > > > opinion > > > > > > > regarding the change to revert the call from > > > > > > > invalidate_mapping_pages > > > > > > > to nfs_zap_mapping in nfs_force_use_readdirplus to > > > > > > > prevent > > > > > > > the > > > > > > > current 'ls' from restarting the READDIRPLUS3 from cookie > > > > > > > 0. > > > > > > > I'm > > > > > > > not quite sure about the intention of the prior change > > > > > > > from > > > > > > > nfs_zap_mapping to invalidate_mapping_pages so that is > > > > > > > why > > > > > > > I'm > > > > > > > seeking advise. Or do you have any suggestions to achieve > > > > > > > the > > > > > > > same? > > > > > > > > > > > > > > Thanks, > > > > > > > -Dai > > > > > > > > > > > > > > On 12/17/19 4:34 PM, Dai Ngo wrote: > > > > > > > > Hi, > > > > > > > > > > > > > > > > I'd like to report an issue with 'ls -lrt' on NFSv3 > > > > > > > > client > > > > > > > > takes > > > > > > > > a very long time to display the content of a large > > > > > > > > directory > > > > > > > > (100k - 200k files) while the directory is being > > > > > > > > modified > > > > > > > > by > > > > > > > > another NFSv3 client. > > > > > > > > > > > > > > > > The problem can be reproduced using 3 systems. One > > > > > > > > system > > > > > > > > serves > > > > > > > > as the NFS server, one system runs as the client that > > > > > > > > doing > > > > > > > > the > > > > > > > > 'ls -lrt' and another system runs the client that > > > > > > > > creates > > > > > > > > files > > > > > > > > on the server. > > > > > > > > Client1 creates files using this simple script: > > > > > > > > > > > > > > > > > #!/bin/sh > > > > > > > > > > > > > > > > > > if [ $# -lt 2 ]; then > > > > > > > > > echo "Usage: $0 number_of_files > > > > > > > > > base_filename" > > > > > > > > > exit > > > > > > > > > fi nfiles=$1 > > > > > > > > > fname=$2 > > > > > > > > > echo "creating $nfiles files using > > > > > > > > > filename[$fname]..." > > > > > > > > > i=0 while [ i -lt $nfiles ] ; > > > > > > > > > do i=`expr $i + 1` > > > > > > > > > echo "xyz" > $fname$i > > > > > > > > > echo "$fname$i" done > > > > > > > > Client2 runs 'time ls -lrt /tmp/mnt/bd1 |wc -l' in a > > > > > > > > loop. > > > > > > > > > > > > > > > > The network traces and dtrace probes showed numerous > > > > > > > > READDIRPLUS3 > > > > > > > > requests restarting from cookie 0 which seemed to > > > > > > > > indicate > > > > > > > > the > > > > > > > > cached pages of the directory were invalidated causing > > > > > > > > the > > > > > > > > pages > > > > > > > > to be refilled starting from cookie 0 until the current > > > > > > > > requested > > > > > > > > cookie. The cached page invalidation were tracked to > > > > > > > > nfs_force_use_readdirplus(). To verify, I made the > > > > > > > > below > > > > > > > > modification, ran the test for various kernel versions > > > > > > > > and > > > > > > > > captured the results shown below. > > > > > > > > > > > > > > > > The modification is: > > > > > > > > > > > > > > > > > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c > > > > > > > > > index a73e2f8bd8ec..5d4a64555fa7 100644 > > > > > > > > > --- a/fs/nfs/dir.c > > > > > > > > > +++ b/fs/nfs/dir.c > > > > > > > > > @@ -444,7 +444,7 @@ void > > > > > > > > > nfs_force_use_readdirplus(struct > > > > > > > > > inode > > > > > > > > > *dir) > > > > > > > > > if (nfs_server_capable(dir, NFS_CAP_READDIRPLUS) > > > > > > > > > && > > > > > > > > > !list_empty(&nfsi->open_files)) { > > > > > > > > > set_bit(NFS_INO_ADVISE_RDPLUS, &nfsi- > > > > > > > > > >flags); > > > > > > > > > - invalidate_mapping_pages(dir->i_mapping, 0, > > > > > > > > > -1); > > > > > > > > > + nfs_zap_mapping(dir, dir->i_mapping); > > > > > > > > > } > > > > > > > > > } > > > > > This change is only reverting part of commit 79f687a3de9e. My > > > > > problem > > > > > with that is as follows: > > > > > > > > > > RFC1813 states that NFSv3 READDIRPLUS cookies and verifiers > > > > > must > > > > > match > > > > > those returned by previous READDIRPLUS calls, and READDIR > > > > > cookies > > > > > and > > > > > verifiers must match those returned by previous READDIR > > > > > calls. It > > > > > says > > > > > nothing about being able to assume cookies from READDIR and > > > > > READDIRPLUS > > > > > calls are interchangeable. So the only reason I can see for > > > > > the > > > > > invalidate_mapping_pages() is to ensure that we do separate > > > > > the > > > > > two > > > > > cookie caches. > > > > > > If I understand your concern correctly that in NFSv3 the client > > > must > > > maintain valid cookies and cookie verifiers when switching > > > between > > > READDIR and READDIRPLUS, or vice sersa, then I think the current > > > client > > > code handles this condition ok. > > > > > > On the client, both READDIR and READDIRPLUS requests use the > > > cookie > > > values > > > from the same cached pages of the directory so I don't think they > > > can > > > be > > > out of sync when the client switches between READDIRPLUS and > > > READDIR > > > requests for different nfs_readdir calls. > > > > > > In fact, currently the first nfs_readdir uses READDIRPLUS's to > > > fill > > > read > > > the entries and if there is no LOOKUP/GETATTR on one of the > > > directory > > > entries then the client reverts to READDIR's for subsequent > > > nfs_readdir > > > calls without invalidating any cached pages of the directory. If > > > there > > > are LOOKUP/GETATTR done on one of the directory entries then > > > nfs_advise_use_readdirplus is called which forces the client to > > > use > > > READDIRPLUS again for the next nfs_readdir. > > > > > > Unless the user mounts the export with 'nordirplus' option then > > > the > > > client uses only READDIRs for all requests, no switching takes > > > place. > > > > I don't understand your point. > > The original point was that the directory's page cache seems to > be cleared a little too often (quite apart from switching between > READDIRPLUS and READDIR). > > I think Dai is saying that cache clearing is appropriate to defer > when the directory's mtime has changed but the READ method remains > the same. Otherwise repeatedly adding a new file to a very large > directory that is being read can trigger a situation where the > reading getdents loop never completes. > Fair enough, but the section of code he was touching with his patch should be only about switching between READDIR/PLUS methods. The mtime monitoring happens elsewhere and is already set up to invalidate the cache only on opendir()/rewinddir(). > My two cents Euro. > > > > The issue is that > > nfs_advise_use_readdirplus() can cause the behaviour to switch > > between > > use of READDIRPLUS and use of READDIR from one syscall to > > getdents() to > > the next. > > If the client is using the same page cache, across those syscalls, > > then > > it will end up caching a mixture of cookies. Furthermore, since the > > cookie that is used as an argument to the next call to > > READDIR/READDIRPLUS is taken from that page cache, then we can end > > up > > calling READDIRPLUS with a cookie that came from READDIR and vice > > versa. > > > > As I said, I'm not convinced that is legal in RFC1813 (NFSv3). > > > > That is why we want to clear the page cache when we swap between > > use of > > READDIR and use of READDIRPLUS for the case of NFSv3. > > Just curious, are you aware of an NFSv3 server implementation that > would have a problem with a client that mixes the cookies? > > > > > Thanks, > > > -Dai > > > > > > > > OTOH, for NFSv4, there is no separate READDIRPLUS function, > > > > > so > > > > > there > > > > > really does not appear to be any reason to clear the page > > > > > cache > > > > > at > > > > > all > > > > > as we're switching between requesting attributes or not. > > > > > > > > > Sorry... To spell out my objection to this change more clearly: > > > > The > > > > call to nfs_zap_mapping() makes no sense in either case. > > > > * It defers the cache invalidation until the next call to > > > > rewinddir()/opendir(), so it does not address the NFSv3 > > > > concern. > > > > * It would appear to be entirely superfluous for the NFSv4 > > > > case. > > > > > > > > So a change that might be acceptable would be to keep the > > > > existing > > > > call > > > > to invalidate_mapping_pages() for NFSv3, but to remove it for > > > > NFSv4. > > > > > > -- > > Trond Myklebust > > Linux NFS client maintainer, Hammerspace > > trond.myklebust@xxxxxxxxxxxxxxx > > -- > Chuck Lever > > > -- Trond Myklebust CTO, Hammerspace Inc 4300 El Camino Real, Suite 105 Los Altos, CA 94022 www.hammer.space