> On Aug 23, 2016, at 17:21, Benjamin Coddington <bcodding@xxxxxxxxxx> wrote: > > > > On 23 Aug 2016, at 11:36, Trond Myklebust wrote: > >>> On Aug 23, 2016, at 11:09, Benjamin Coddington <bcodding@xxxxxxxxxx> wrote: >>> >>> Hi linux-nfs, >>> >>> 311324ad1713 ("NFS: Be more aggressive in using readdirplus for 'ls -l' >>> situations") changed when nfs_readdir() decides to revalidate the >>> directory's mapping, which contains all the entries. In addition to just >>> checking if the attribute cache has expired, it includes a check to see if >>> NFS_INO_INVALID_DATA is set on the directory. >>> >>> Well, customers that have directories with very many dentries and that same >>> directory's attributes are frequently updated are now grouchy that `ls -l` >>> takes so long since any update of the directory causes the mapping to be >>> invalidated and we have to start over filling the directory's mapping. >>> >>> I actually haven't put real hard thought into it yet (because often for me >>> that just wastes a lot of time), so I am doing the lazy thing by asking this >>> question: >>> >>> Can we go back to just the using the attribute cache timeout, or should we >>> get all heuristical about readdir? >>> >> >> We are all heuristical at this point. How are the heuristics failing? >> >> The original problem those heuristics were designed to solve was that all >> the stat() calls took forever to complete, since they are all synchronous; >> Tigran showed some very convincing numbers for a large directory where the >> difference in performance was an order of magnitude improved by using >> readdirplus instead of readdir… > > I'll try to present a better explanation. While `ls -l` is walking through > a directory repeatedly entering nfs_readdir(), a CREATE response send us > through nfs_post_op_update_inode_locked(): > > 1531 if (S_ISDIR(inode->i_mode)) > 1532 invalid |= NFS_INO_INVALID_DATA; > 1533 nfs_set_cache_invalid(inode, invalid); > > Now, the next entry into nfs_readdir() has us do nfs_revalidate_mapping(), > which will do nfs_invalidate_mapping() for the directory, and so we have to > start over with cookie 0 sending READDIRPLUS to re-fill the directory's > mapping to get back to where we are for the current nfs_readdir(). > > This process repeats for every entry into nfs_readdir() if the directory > keeps getting updated, and it becomes more likely that it will be updated as > each pass takes longer and longer to re-fill the mapping as the current > nfs_readdir() invocation is further along. > > READDIRPLUS isn't the problem, the problem is invalidating the directory > mapping in the middle of a series of getdents() if we do a CREATE. Also, I > think a similar thing happens if the directory's mtime or ctime is updated - > but in that case we set NFS_INO_INVALID_DATA because the change_attr > updates. > > So, for directories with a large number of entries that updates often, it can > be very slow to list the directory. > > Why did 311324ad1713 change nfs_readdir from > > if (nfs_attribute_cache_expired(inode)) > nfs_revalidate_mapping(inode, file->f_mapping); > to > > if (nfs_attribute_cache_expired(inode) || nfsi->cache_validity & NFS_INO_INVALID_DATA) > nfs_revalidate_mapping(inode, file->f_mapping); As the commit message says, the whole purpose was to use READDIRPLUS as a substitute for multiple GETATTR calls when the heuristic tells us that the user is performing an ‘ls -l’ style of workload. > > .. and can we go back to the way it was before? Not without slowing down ‘ls -l’ on large directories. > > OK.. I understand why -- it is more correct since if we know the directory has > changed, we might as well fetch the change. Otherwise, we might be creating > files and then wondering why they aren't listed. > > It might be nicer to not invalidate the mapping we're currently using for > readdir, though. Maybe there's a way to keep the mapping for the currently > opened directory and invalidate it once it's closed. POSIX requires that you revalidate on opendir() and rewinddir(), and leaves behaviour w.r.t. file addition and removal after the call to opendir()/rewinddir() undefined (http://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html). I believe most filesystems under Linux will ensure that if a file is removed, it is no longer seen in the readdir stream, and a file that is added will be seen unless the readdir cursor has already passed it, so there is that to consider. However correctness was not the main issue here. ��.n��������+%������w��{.n�����{��w���jg��������ݢj����G�������j:+v���w�m������w�������h�����٥