On Fri, 2022-02-25 at 06:38 -0500, Benjamin Coddington wrote: > On 24 Feb 2022, at 22:51, Trond Myklebust wrote: > > > On Fri, 2022-02-25 at 02:26 +0000, Trond Myklebust wrote: > > > On Thu, 2022-02-24 at 09:53 -0500, Benjamin Coddington wrote: > > > > On 23 Feb 2022, at 16:12, trondmy@xxxxxxxxxx wrote: > > > > > > > > > From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> > > > > > > > > > > Use the change attribute and the first cookie in a directory > > > > > page > > > > > cache > > > > > entry to validate that the page is up to date. > > > > > > > > > > Suggested-by: Benjamin Coddington <bcodding@xxxxxxxxxx> > > > > > Signed-off-by: Trond Myklebust > > > > > <trond.myklebust@xxxxxxxxxxxxxxx> > > > > > --- > > > > > fs/nfs/dir.c | 68 > > > > > ++++++++++++++++++++++++++++------------------------ > > > > > 1 file changed, 37 insertions(+), 31 deletions(-) > > > > > > > > > > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c > > > > > index f2258e926df2..5d9367d9b651 100644 > > > > > --- a/fs/nfs/dir.c > > > > > +++ b/fs/nfs/dir.c > > > > > @@ -139,6 +139,7 @@ struct nfs_cache_array_entry { > > > > > }; > > > > > > > > > > struct nfs_cache_array { > > > > > + u64 change_attr; > > > > > u64 last_cookie; > > > > > unsigned int size; > > > > > unsigned char page_full : 1, > > > > > @@ -175,7 +176,8 @@ static void nfs_readdir_array_init(struct > > > > > nfs_cache_array *array) > > > > > memset(array, 0, sizeof(struct nfs_cache_array)); > > > > > } > > > > > > > > > > -static void nfs_readdir_page_init_array(struct page *page, > > > > > u64 > > > > > last_cookie) > > > > > +static void nfs_readdir_page_init_array(struct page *page, > > > > > u64 > > > > > last_cookie, > > > > > + u64 > > > > > change_attr) > > > > > { > > > > > struct nfs_cache_array *array; > > > > > > > > > > > > There's a hunk missing here, something like: > > > > > > > > @@ -185,6 +185,7 @@ static void > > > > nfs_readdir_page_init_array(struct > > > > page > > > > *page, u64 last_cookie, > > > > nfs_readdir_array_init(array); > > > > array->last_cookie = last_cookie; > > > > array->cookies_are_ordered = 1; > > > > + array->change_attr = change_attr; > > > > kunmap_atomic(array); > > > > } > > > > > > > > > > > > > > @@ -207,7 +209,7 @@ nfs_readdir_page_array_alloc(u64 > > > > > last_cookie, > > > > > gfp_t gfp_flags) > > > > > { > > > > > struct page *page = alloc_page(gfp_flags); > > > > > if (page) > > > > > - nfs_readdir_page_init_array(page, > > > > > last_cookie); > > > > > + nfs_readdir_page_init_array(page, > > > > > last_cookie, > > > > > 0); > > > > > return page; > > > > > } > > > > > > > > > > @@ -304,19 +306,44 @@ int nfs_readdir_add_to_array(struct > > > > > nfs_entry > > > > > *entry, struct page *page) > > > > > return ret; > > > > > } > > > > > > > > > > +static bool nfs_readdir_page_cookie_match(struct page *page, > > > > > u64 > > > > > last_cookie, > > > > > + > > > > > u64 change_attr) > > > > > > > > How about "nfs_readdir_page_valid()"? There's more going on > > > > than a > > > > cookie match. > > > > > > > > > > > > > +{ > > > > > + struct nfs_cache_array *array = kmap_atomic(page); > > > > > + int ret = true; > > > > > + > > > > > + if (array->change_attr != change_attr) > > > > > + ret = false; > > > > > > > > Can we skip the next test if ret = false? > > > > > > I'd expect the compiler to do that. > > > > > > > > > > > > + if (array->size > 0 && array->array[0].cookie != > > > > > last_cookie) > > > > > + ret = false; > > > > > + kunmap_atomic(array); > > > > > + return ret; > > > > > +} > > > > > + > > > > > +static void nfs_readdir_page_unlock_and_put(struct page > > > > > *page) > > > > > +{ > > > > > + unlock_page(page); > > > > > + put_page(page); > > > > > +} > > > > > + > > > > > static struct page *nfs_readdir_page_get_locked(struct > > > > > address_space > > > > > *mapping, > > > > > pgoff_t > > > > > index, > > > > > u64 > > > > > last_cookie) > > > > > { > > > > > struct page *page; > > > > > + u64 change_attr; > > > > > > > > > > page = grab_cache_page(mapping, index); > > > > > - if (page && !PageUptodate(page)) { > > > > > - nfs_readdir_page_init_array(page, > > > > > last_cookie); > > > > > - if > > > > > (invalidate_inode_pages2_range(mapping, index > > > > > + > > > > > 1, -1) < 0) > > > > > - nfs_zap_mapping(mapping->host, > > > > > mapping); > > > > > - SetPageUptodate(page); > > > > > + if (!page) > > > > > + return NULL; > > > > > + change_attr = > > > > > inode_peek_iversion_raw(mapping->host); > > > > > + if (PageUptodate(page)) { > > > > > + if > > > > > (nfs_readdir_page_cookie_match(page, > > > > > last_cookie, > > > > > + > > > > > change_attr)) > > > > > + return page; > > > > > + nfs_readdir_clear_array(page); > > > > > > > > > > > > Why use i_version rather than nfs_save_change_attribute? Seems > > > > having a > > > > consistent value across the pachecache and dir_verifiers would > > > > help > > > > debugging, and we've already have a bunch of machinery around > > > > the > > > > change_attribute. > > > > > > The directory cache_change_attribute is not reported in > > > tracepoints > > > because it is a directory-specific field, so it's not as useful > > > for > > > debugging. > > > > > > The inode change attribute is what we have traditionally used for > > > determining cache consistency, and when to invalidate the cache. > > > > I should probably elaborate a little further on the differences > > between > > the inode change attribute and the cache_change_attribute. > > > > One of the main reasons for introducing the latter was to have > > something that allows us to track changes to the directory, but to > > avoid forcing unnecessary revalidations of the dcache. > > > > What this means is that when we create or remove a file, and the > > pre/post-op attributes tell us that there were no third party > > changes > > to the directory, we update the dcache, but we do _not_ update the > > cache_change_attribute, because we know that the rest of the > > directory > > contents are valid, and so we don't have to revalidate the > > dentries. > > However in that case, we _do_ want to update the readdir cache to > > reflect the fact that an entry was added or deleted. While we could > > figure out how to remove an entry (at least for the case where the > > filesystem is case-sensitive), we do not know where the filesystem > > added the new file, or what cookies was assigned. > > > > This is why the inode change attribute is more appropriate for > > indexing > > the page cache pages. It reflects the cases where we want to > > revalidate > > the readdir cache, as opposed to the dcache. > > Ok, thanks for explaining this. > > I've noticed that you haven't responded about my concerns about not > checking > the directory for changes with every v4 READDIR. For v3, we have > post-op > updates to the directory, but with v4 the directory can change and > we'll > end up with entries in the cache that are marked with an old > change_attr. > Then they will be rejected by nfs_readdir_page_cookie_match() if a user looks up that page again after we've revalidated the change attribute on the directory. ...and note that NFSv4 does returns a struct change_info4 for all operations that change the directory, so we will update the change attribute in all those cases. If the change is made on the server, well then we will detect it through the standard revalidation process that usually decides when to invalidate the directory page cache. > I'm pretty positive that not checking for changes to the directory > (not > sending GETATTR with READDIR) is going to create cases of double- > listed > and > truncated-listings for dirctory listers. Not handling those cases > means > I'm > going to have some very unhappy customers that complain about their > files > disappearing/reappearing on NFS. > > If you need me to prove that its an issue, I can take the time to > write > up > program that shows this problem. > If you label the page contents with an attribute that was retrieved _after_ the READDIR op, then you will introduce this as a problem for your customers. The reason is that there is no atomicity between operations in a COMPOUND. Worse, the implementation of readdir in scalable modern systems, including Linux, does not even guarantee atomicity of the readdir operation itself. Instead each readdir entry is filled without holding any locks or preventing any changes to the directory or to the object itself. POSIX states very explicitly that if you're making changes to the directory after the call to opendir() or rewinddir(), then the behaviour w.r.t. whether that file appears in the readdir() call is unspecified. See https://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html This is also consistent with how glibc caches the results of a getdents() call. -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@xxxxxxxxxxxxxxx