On Fri, 2022-02-25 at 13:10 +0000, Trond Myklebust wrote: > On Fri, 2022-02-25 at 06:38 -0500, Benjamin Coddington wrote: > > On 24 Feb 2022, at 22:51, Trond Myklebust wrote: > > > > > On Fri, 2022-02-25 at 02:26 +0000, Trond Myklebust wrote: > > > > On Thu, 2022-02-24 at 09:53 -0500, Benjamin Coddington wrote: > > > > > On 23 Feb 2022, at 16:12, trondmy@xxxxxxxxxx wrote: > > > > > > > > > > > From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> > > > > > > > > > > > > Use the change attribute and the first cookie in a > > > > > > directory > > > > > > page > > > > > > cache > > > > > > entry to validate that the page is up to date. > > > > > > > > > > > > Suggested-by: Benjamin Coddington <bcodding@xxxxxxxxxx> > > > > > > Signed-off-by: Trond Myklebust > > > > > > <trond.myklebust@xxxxxxxxxxxxxxx> > > > > > > --- > > > > > > fs/nfs/dir.c | 68 > > > > > > ++++++++++++++++++++++++++++------------------------ > > > > > > 1 file changed, 37 insertions(+), 31 deletions(-) > > > > > > > > > > > > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c > > > > > > index f2258e926df2..5d9367d9b651 100644 > > > > > > --- a/fs/nfs/dir.c > > > > > > +++ b/fs/nfs/dir.c > > > > > > @@ -139,6 +139,7 @@ struct nfs_cache_array_entry { > > > > > > }; > > > > > > > > > > > > struct nfs_cache_array { > > > > > > + u64 change_attr; > > > > > > u64 last_cookie; > > > > > > unsigned int size; > > > > > > unsigned char page_full : 1, > > > > > > @@ -175,7 +176,8 @@ static void > > > > > > nfs_readdir_array_init(struct > > > > > > nfs_cache_array *array) > > > > > > memset(array, 0, sizeof(struct nfs_cache_array)); > > > > > > } > > > > > > > > > > > > -static void nfs_readdir_page_init_array(struct page *page, > > > > > > u64 > > > > > > last_cookie) > > > > > > +static void nfs_readdir_page_init_array(struct page *page, > > > > > > u64 > > > > > > last_cookie, > > > > > > + u64 > > > > > > change_attr) > > > > > > { > > > > > > struct nfs_cache_array *array; > > > > > > > > > > > > > > > There's a hunk missing here, something like: > > > > > > > > > > @@ -185,6 +185,7 @@ static void > > > > > nfs_readdir_page_init_array(struct > > > > > page > > > > > *page, u64 last_cookie, > > > > > nfs_readdir_array_init(array); > > > > > array->last_cookie = last_cookie; > > > > > array->cookies_are_ordered = 1; > > > > > + array->change_attr = change_attr; > > > > > kunmap_atomic(array); > > > > > } > > > > > > > > > > > > > > > > > @@ -207,7 +209,7 @@ nfs_readdir_page_array_alloc(u64 > > > > > > last_cookie, > > > > > > gfp_t gfp_flags) > > > > > > { > > > > > > struct page *page = alloc_page(gfp_flags); > > > > > > if (page) > > > > > > - nfs_readdir_page_init_array(page, > > > > > > last_cookie); > > > > > > + nfs_readdir_page_init_array(page, > > > > > > last_cookie, > > > > > > 0); > > > > > > return page; > > > > > > } > > > > > > > > > > > > @@ -304,19 +306,44 @@ int nfs_readdir_add_to_array(struct > > > > > > nfs_entry > > > > > > *entry, struct page *page) > > > > > > return ret; > > > > > > } > > > > > > > > > > > > +static bool nfs_readdir_page_cookie_match(struct page > > > > > > *page, > > > > > > u64 > > > > > > last_cookie, > > > > > > + > > > > > > u64 change_attr) > > > > > > > > > > How about "nfs_readdir_page_valid()"? There's more going on > > > > > than a > > > > > cookie match. > > > > > > > > > > > > > > > > +{ > > > > > > + struct nfs_cache_array *array = kmap_atomic(page); > > > > > > + int ret = true; > > > > > > + > > > > > > + if (array->change_attr != change_attr) > > > > > > + ret = false; > > > > > > > > > > Can we skip the next test if ret = false? > > > > > > > > I'd expect the compiler to do that. > > > > > > > > > > > > > > > + if (array->size > 0 && array->array[0].cookie != > > > > > > last_cookie) > > > > > > + ret = false; > > > > > > + kunmap_atomic(array); > > > > > > + return ret; > > > > > > +} > > > > > > + > > > > > > +static void nfs_readdir_page_unlock_and_put(struct page > > > > > > *page) > > > > > > +{ > > > > > > + unlock_page(page); > > > > > > + put_page(page); > > > > > > +} > > > > > > + > > > > > > static struct page *nfs_readdir_page_get_locked(struct > > > > > > address_space > > > > > > *mapping, > > > > > > pgoff_t > > > > > > index, > > > > > > u64 > > > > > > last_cookie) > > > > > > { > > > > > > struct page *page; > > > > > > + u64 change_attr; > > > > > > > > > > > > page = grab_cache_page(mapping, index); > > > > > > - if (page && !PageUptodate(page)) { > > > > > > - nfs_readdir_page_init_array(page, > > > > > > last_cookie); > > > > > > - if > > > > > > (invalidate_inode_pages2_range(mapping, index > > > > > > + > > > > > > 1, -1) < 0) > > > > > > - nfs_zap_mapping(mapping->host, > > > > > > mapping); > > > > > > - SetPageUptodate(page); > > > > > > + if (!page) > > > > > > + return NULL; > > > > > > + change_attr = > > > > > > inode_peek_iversion_raw(mapping->host); > > > > > > + if (PageUptodate(page)) { > > > > > > + if > > > > > > (nfs_readdir_page_cookie_match(page, > > > > > > last_cookie, > > > > > > + > > > > > > change_attr)) > > > > > > + return page; > > > > > > + nfs_readdir_clear_array(page); > > > > > > > > > > > > > > > Why use i_version rather than nfs_save_change_attribute? > > > > > Seems > > > > > having a > > > > > consistent value across the pachecache and dir_verifiers > > > > > would > > > > > help > > > > > debugging, and we've already have a bunch of machinery around > > > > > the > > > > > change_attribute. > > > > > > > > The directory cache_change_attribute is not reported in > > > > tracepoints > > > > because it is a directory-specific field, so it's not as useful > > > > for > > > > debugging. > > > > > > > > The inode change attribute is what we have traditionally used > > > > for > > > > determining cache consistency, and when to invalidate the > > > > cache. > > > > > > I should probably elaborate a little further on the differences > > > between > > > the inode change attribute and the cache_change_attribute. > > > > > > One of the main reasons for introducing the latter was to have > > > something that allows us to track changes to the directory, but > > > to > > > avoid forcing unnecessary revalidations of the dcache. > > > > > > What this means is that when we create or remove a file, and the > > > pre/post-op attributes tell us that there were no third party > > > changes > > > to the directory, we update the dcache, but we do _not_ update > > > the > > > cache_change_attribute, because we know that the rest of the > > > directory > > > contents are valid, and so we don't have to revalidate the > > > dentries. > > > However in that case, we _do_ want to update the readdir cache to > > > reflect the fact that an entry was added or deleted. While we > > > could > > > figure out how to remove an entry (at least for the case where > > > the > > > filesystem is case-sensitive), we do not know where the > > > filesystem > > > added the new file, or what cookies was assigned. > > > > > > This is why the inode change attribute is more appropriate for > > > indexing > > > the page cache pages. It reflects the cases where we want to > > > revalidate > > > the readdir cache, as opposed to the dcache. > > > > Ok, thanks for explaining this. > > > > I've noticed that you haven't responded about my concerns about not > > checking > > the directory for changes with every v4 READDIR. For v3, we have > > post-op > > updates to the directory, but with v4 the directory can change and > > we'll > > end up with entries in the cache that are marked with an old > > change_attr. > > > > Then they will be rejected by nfs_readdir_page_cookie_match() if a > user > looks up that page again after we've revalidated the change attribute > on the directory. > > ...and note that NFSv4 does returns a struct change_info4 for all > operations that change the directory, so we will update the change > attribute in all those cases. > > If the change is made on the server, well then we will detect it > through the standard revalidation process that usually decides when > to > invalidate the directory page cache. > > > I'm pretty positive that not checking for changes to the directory > > (not > > sending GETATTR with READDIR) is going to create cases of double- > > listed > > and > > truncated-listings for dirctory listers. Not handling those cases > > means > > I'm > > going to have some very unhappy customers that complain about their > > files > > disappearing/reappearing on NFS. > > > > If you need me to prove that its an issue, I can take the time to > > write > > up > > program that shows this problem. > > > > If you label the page contents with an attribute that was retrieved > _after_ the READDIR op, then you will introduce this as a problem for > your customers. > > The reason is that there is no atomicity between operations in a > COMPOUND. Worse, the implementation of readdir in scalable modern > systems, including Linux, does not even guarantee atomicity of the > readdir operation itself. Instead each readdir entry is filled > without > holding any locks or preventing any changes to the directory or to > the > object itself. > > POSIX states very explicitly that if you're making changes to the > directory after the call to opendir() or rewinddir(), then the > behaviour w.r.t. whether that file appears in the readdir() call is > unspecified. See > https://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html > > This is also consistent with how glibc caches the results of a > getdents() call. > Ah, wait a minute... There is a problem with the call to nfs_readdir_page_get_next(). It will allocate the page _after_ the readdir call itself, and so might label it with a newer change attribute... I'll fix that so we can pass in the change attribute associated with the readdir call. -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@xxxxxxxxxxxxxxx