On Fri, 2020-01-17 at 18:29 -0800, Dai Ngo wrote: > Hi Trond, > > On 1/15/20 11:06 AM, Trond Myklebust wrote: > > On Wed, 2020-01-15 at 18:54 +0000, Trond Myklebust wrote: > > > On Wed, 2020-01-15 at 10:11 -0800, Dai Ngo wrote: > > > > Hi Anna, Trond, > > > > > > > > Would you please let me know your opinion regarding reverting > > > > the > > > > change in > > > > nfs_force_use_readdirplus to call nfs_zap_mapping instead of > > > > invalidate_mapping_pages. > > > > This change is to prevent the cookie of the READDIRPLUS to be > > > > reset > > > > to 0 while > > > > an instance of 'ls' is running and the directory is being > > > > modified. > > > > > > > > > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c index > > > > > a73e2f8bd8ec..5d4a64555fa7 100644 --- a/fs/nfs/dir.c +++ > > > > > b/fs/nfs/dir.c @@ -444,7 +444,7 @@ void > > > > > nfs_force_use_readdirplus(struct inode *dir) if > > > > > (nfs_server_capable(dir, NFS_CAP_READDIRPLUS) && > > > > > !list_empty(&nfsi->open_files)) { > > > > > set_bit(NFS_INO_ADVISE_RDPLUS, &nfsi->flags); - > > > > > invalidate_mapping_pages(dir->i_mapping, 0, -1); + > > > > > nfs_zap_mapping(dir, dir->i_mapping); } } > > > > Thanks, > > > > -Dai > > > > > > > > On 12/19/19 8:01 PM, Dai Ngo wrote: > > > > > Hi Anna, Trond, > > > > > > > > > > I made a mistake with the 5.5 numbers. The VM that runs 5.5 > > > > > has > > > > > some > > > > > problems. There is no regression with 5.5, here are the new > > > > > numbers: > > > > > > > > > > Upstream Linux 5.5.0-rc1 [ORI] 93296: 3m10.917s 197891: > > > > > 10m35.789s > > > > > Upstream Linux 5.5.0-rc1 [MOD] 98614: 1m59.649s 192801: > > > > > 3m55.003s > > > > > > > > > > My apologies for the mistake. > > > > > > > > > > Now there is no regression with 5.5, I'd like to get your > > > > > opinion > > > > > regarding the change to revert the call from > > > > > invalidate_mapping_pages > > > > > to nfs_zap_mapping in nfs_force_use_readdirplus to prevent > > > > > the > > > > > current 'ls' from restarting the READDIRPLUS3 from cookie 0. > > > > > I'm > > > > > not quite sure about the intention of the prior change from > > > > > nfs_zap_mapping to invalidate_mapping_pages so that is why > > > > > I'm > > > > > seeking advise. Or do you have any suggestions to achieve the > > > > > same? > > > > > > > > > > Thanks, > > > > > -Dai > > > > > > > > > > On 12/17/19 4:34 PM, Dai Ngo wrote: > > > > > > Hi, > > > > > > > > > > > > I'd like to report an issue with 'ls -lrt' on NFSv3 client > > > > > > takes > > > > > > a very long time to display the content of a large > > > > > > directory > > > > > > (100k - 200k files) while the directory is being modified > > > > > > by > > > > > > another NFSv3 client. > > > > > > > > > > > > The problem can be reproduced using 3 systems. One system > > > > > > serves > > > > > > as the NFS server, one system runs as the client that doing > > > > > > the > > > > > > 'ls -lrt' and another system runs the client that creates > > > > > > files > > > > > > on the server. > > > > > > Client1 creates files using this simple script: > > > > > > > > > > > > > #!/bin/sh > > > > > > > > > > > > > > if [ $# -lt 2 ]; then > > > > > > > echo "Usage: $0 number_of_files base_filename" > > > > > > > exit > > > > > > > fi nfiles=$1 > > > > > > > fname=$2 > > > > > > > echo "creating $nfiles files using filename[$fname]..." > > > > > > > i=0 while [ i -lt $nfiles ] ; > > > > > > > do i=`expr $i + 1` > > > > > > > echo "xyz" > $fname$i > > > > > > > echo "$fname$i" done > > > > > > Client2 runs 'time ls -lrt /tmp/mnt/bd1 |wc -l' in a loop. > > > > > > > > > > > > The network traces and dtrace probes showed numerous > > > > > > READDIRPLUS3 > > > > > > requests restarting from cookie 0 which seemed to indicate > > > > > > the > > > > > > cached pages of the directory were invalidated causing the > > > > > > pages > > > > > > to be refilled starting from cookie 0 until the current > > > > > > requested > > > > > > cookie. The cached page invalidation were tracked to > > > > > > nfs_force_use_readdirplus(). To verify, I made the below > > > > > > modification, ran the test for various kernel versions and > > > > > > captured the results shown below. > > > > > > > > > > > > The modification is: > > > > > > > > > > > > > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c > > > > > > > index a73e2f8bd8ec..5d4a64555fa7 100644 > > > > > > > --- a/fs/nfs/dir.c > > > > > > > +++ b/fs/nfs/dir.c > > > > > > > @@ -444,7 +444,7 @@ void nfs_force_use_readdirplus(struct > > > > > > > inode > > > > > > > *dir) > > > > > > > if (nfs_server_capable(dir, NFS_CAP_READDIRPLUS) && > > > > > > > !list_empty(&nfsi->open_files)) { > > > > > > > set_bit(NFS_INO_ADVISE_RDPLUS, &nfsi->flags); > > > > > > > - invalidate_mapping_pages(dir->i_mapping, 0, -1); > > > > > > > + nfs_zap_mapping(dir, dir->i_mapping); > > > > > > > } > > > > > > > } > > > This change is only reverting part of commit 79f687a3de9e. My > > > problem > > > with that is as follows: > > > > > > RFC1813 states that NFSv3 READDIRPLUS cookies and verifiers must > > > match > > > those returned by previous READDIRPLUS calls, and READDIR cookies > > > and > > > verifiers must match those returned by previous READDIR calls. It > > > says > > > nothing about being able to assume cookies from READDIR and > > > READDIRPLUS > > > calls are interchangeable. So the only reason I can see for the > > > invalidate_mapping_pages() is to ensure that we do separate the > > > two > > > cookie caches. > > If I understand your concern correctly that in NFSv3 the client must > maintain valid cookies and cookie verifiers when switching between > READDIR and READDIRPLUS, or vice sersa, then I think the current > client > code handles this condition ok. > > On the client, both READDIR and READDIRPLUS requests use the cookie > values > from the same cached pages of the directory so I don't think they can > be > out of sync when the client switches between READDIRPLUS and READDIR > requests for different nfs_readdir calls. > > In fact, currently the first nfs_readdir uses READDIRPLUS's to fill > read > the entries and if there is no LOOKUP/GETATTR on one of the directory > entries then the client reverts to READDIR's for subsequent > nfs_readdir > calls without invalidating any cached pages of the directory. If > there > are LOOKUP/GETATTR done on one of the directory entries then > nfs_advise_use_readdirplus is called which forces the client to use > READDIRPLUS again for the next nfs_readdir. > > Unless the user mounts the export with 'nordirplus' option then the > client uses only READDIRs for all requests, no switching takes place. I don't understand your point. The issue is that nfs_advise_use_readdirplus() can cause the behaviour to switch between use of READDIRPLUS and use of READDIR from one syscall to getdents() to the next. If the client is using the same page cache, across those syscalls, then it will end up caching a mixture of cookies. Furthermore, since the cookie that is used as an argument to the next call to READDIR/READDIRPLUS is taken from that page cache, then we can end up calling READDIRPLUS with a cookie that came from READDIR and vice versa. As I said, I'm not convinced that is legal in RFC1813 (NFSv3). That is why we want to clear the page cache when we swap between use of READDIR and use of READDIRPLUS for the case of NFSv3. > > Thanks, > -Dai > > > > OTOH, for NFSv4, there is no separate READDIRPLUS function, so > > > there > > > really does not appear to be any reason to clear the page cache > > > at > > > all > > > as we're switching between requesting attributes or not. > > > > > Sorry... To spell out my objection to this change more clearly: The > > call to nfs_zap_mapping() makes no sense in either case. > > * It defers the cache invalidation until the next call to > > rewinddir()/opendir(), so it does not address the NFSv3 > > concern. > > * It would appear to be entirely superfluous for the NFSv4 case. > > > > So a change that might be acceptable would be to keep the existing > > call > > to invalidate_mapping_pages() for NFSv3, but to remove it for > > NFSv4. > > -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@xxxxxxxxxxxxxxx