On Wed, Aug 24, 2016 at 12:18:04PM +0000, Trond Myklebust wrote:
> 
> > On Aug 23, 2016, at 17:21, Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:
> > 
> > On 23 Aug 2016, at 11:36, Trond Myklebust wrote:
> > 
> >>> On Aug 23, 2016, at 11:09, Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:
> >>> 
> >>> Hi linux-nfs,
> >>> 
> >>> 311324ad1713 ("NFS: Be more aggressive in using readdirplus for 'ls -l'
> >>> situations") changed when nfs_readdir() decides to revalidate the
> >>> directory's mapping, which contains all the entries.  In addition to just
> >>> checking whether the attribute cache has expired, it now also checks whether
> >>> NFS_INO_INVALID_DATA is set on the directory.
> >>> 
> >>> Well, customers that have directories with very many dentries whose
> >>> attributes are frequently updated are now grouchy that `ls -l` takes so
> >>> long, since any update of the directory causes the mapping to be
> >>> invalidated and we have to start over filling the directory's mapping.
> >>> 
> >>> I actually haven't put real hard thought into it yet (because often for me
> >>> that just wastes a lot of time), so I am doing the lazy thing by asking this
> >>> question:
> >>> 
> >>> Can we go back to just using the attribute cache timeout, or should we
> >>> get all heuristical about readdir?
> >>> 
> >> 
> >> We are all heuristical at this point.  How are the heuristics failing?
> >> 
> >> The original problem those heuristics were designed to solve was that all
> >> the stat() calls took forever to complete, since they are all synchronous;
> >> Tigran showed some very convincing numbers for a large directory where the
> >> difference in performance was an order of magnitude improved by using
> >> readdirplus instead of readdir…
> > 
> > I'll try to present a better explanation.  While `ls -l` is walking through
> > a directory, repeatedly entering nfs_readdir(), a CREATE response sends us
> > through nfs_post_op_update_inode_locked():
> > 
> > 1531         if (S_ISDIR(inode->i_mode))
> > 1532                 invalid |= NFS_INO_INVALID_DATA;
> > 1533         nfs_set_cache_invalid(inode, invalid);
> > 
> > Now, the next entry into nfs_readdir() has us do nfs_revalidate_mapping(),
> > which will do nfs_invalidate_mapping() for the directory, so we have to
> > start over at cookie 0, sending READDIRPLUS to re-fill the directory's
> > mapping just to get back to where we were in the current nfs_readdir().
> > 
> > This process repeats for every entry into nfs_readdir() if the directory
> > keeps getting updated, and an update becomes more likely on each pass,
> > because re-filling the mapping takes longer and longer the further along
> > the current nfs_readdir() invocation is.
> > 
> > READDIRPLUS isn't the problem; the problem is invalidating the directory
> > mapping in the middle of a series of getdents() calls whenever we do a
> > CREATE.  Also, I think a similar thing happens if the directory's mtime or
> > ctime is updated - but in that case we set NFS_INO_INVALID_DATA because the
> > change_attr updates.
> > 
> > So, for directories with a large number of entries that update often, it
> > can be very slow to list the directory.
> > 
> > Why did 311324ad1713 change nfs_readdir from
> > 
> >     if (nfs_attribute_cache_expired(inode))
> >             nfs_revalidate_mapping(inode, file->f_mapping);
> > 
> > to
> > 
> >     if (nfs_attribute_cache_expired(inode) || nfsi->cache_validity & NFS_INO_INVALID_DATA)
> >             nfs_revalidate_mapping(inode, file->f_mapping);
> 
> As the commit message says, the whole purpose was to use READDIRPLUS as a
> substitute for multiple GETATTR calls when the heuristic tells us that the
> user is performing an ‘ls -l’ style of workload.
> 
> > .. and can we go back to the way it was before?
> 
> Not without slowing down ‘ls -l’ on large directories.
> 
> > OK.. I understand why -- it is more correct, since if we know the directory
> > has changed, we might as well fetch the change.  Otherwise, we might be
> > creating files and then wondering why they aren't listed.
> > 
> > It might be nicer not to invalidate the mapping we're currently using for
> > readdir, though.  Maybe there's a way to keep the mapping for the currently
> > opened directory and invalidate it once it's closed.
> 
> POSIX requires that you revalidate on opendir() and rewinddir(), and
> leaves behaviour w.r.t. file addition and removal after the call to
> opendir()/rewinddir() undefined
> (http://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html).

It is only undefined whether the added or removed entry is returned.  Other
entries still need to be returned exactly once.

In this case we're restarting the read of the directory from scratch--I
don't understand how that's possible while avoiding skipped or duplicated
entries.

Surely the only safe thing to do is to continue reading using the last
cookie returned from the server.

--b.

> I believe most filesystems under Linux will ensure that if a file is
> removed, it is no longer seen in the readdir stream, and a file that
> is added will be seen unless the readdir cursor has already passed
> it, so there is that to consider.
> 
> However, correctness was not the main issue here.
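
A minimal sketch, not from the original thread, to make the workload concrete:
the program below walks a directory the way `ls -l` does -- one getdents64()
batch at a time, stat()ing every entry in between -- which is both the pattern
the readdirplus heuristic in 311324ad1713 is meant to detect and the pattern
that suffers when a concurrent CREATE sets NFS_INO_INVALID_DATA on the
directory.  The mount path and buffer size are arbitrary assumptions.

    /*
     * Hypothetical reproducer sketch (not from this thread): an "ls -l"
     * style walk of an NFS directory.  While it runs, CREATEs from another
     * client mark the directory NFS_INO_INVALID_DATA, so each subsequent
     * getdents64() entry into nfs_readdir() re-fills the mapping from
     * cookie 0.
     */
    #define _GNU_SOURCE
    #include <dirent.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
            const char *dirpath = argc > 1 ? argv[1] : "/mnt/nfs/bigdir";
            char buf[32768], path[4096];
            struct stat st;
            long n, off;
            int fd;

            fd = open(dirpath, O_RDONLY | O_DIRECTORY);
            if (fd < 0)
                    return 1;

            /* One getdents64() per batch of entries... */
            while ((n = syscall(SYS_getdents64, fd, buf, sizeof(buf))) > 0) {
                    for (off = 0; off < n; ) {
                            struct dirent64 *d = (struct dirent64 *)(buf + off);

                            /* ...then a stat() of each entry (the "ls -l" part). */
                            snprintf(path, sizeof(path), "%s/%s", dirpath, d->d_name);
                            stat(path, &st);
                            off += d->d_reclen;
                    }
            }
            close(fd);
            return 0;
    }

Run against a large directory while another client keeps creating files in it,
each new CREATE invalidates the directory mapping and the next getdents64()
has to re-fill it from the beginning before it can return anything, which is
the slowdown described above.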
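
Also not from the thread: a userspace analogue of the exactly-once semantics
Bruce describes, using only standard telldir()/seekdir().  A reader that saves
its position and resumes from it must still see every entry that existed the
whole time exactly once, even if other entries were added or removed in the
meantime; that is the same "continue from the last cookie" contract the NFS
client would need to honour internally instead of restarting from cookie 0.
The mount path is an arbitrary assumption.

    /* Hypothetical illustration (not from this thread) of resuming a
     * directory read from a saved position rather than restarting it. */
    #include <dirent.h>
    #include <stdio.h>

    int main(void)
    {
            DIR *dir = opendir("/mnt/nfs/bigdir");
            struct dirent *d;
            long pos;
            int i;

            if (!dir)
                    return 1;

            /* Read part of the directory, then remember where we stopped. */
            for (i = 0; i < 100 && (d = readdir(dir)) != NULL; i++)
                    puts(d->d_name);
            pos = telldir(dir);

            /*
             * Files may be created or unlinked here by someone else.  When
             * we resume at 'pos', entries that existed throughout must still
             * be returned exactly once -- not skipped, not duplicated.
             */
            seekdir(dir, pos);
            while ((d = readdir(dir)) != NULL)
                    puts(d->d_name);

            closedir(dir);
            return 0;
    }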