Re: [PATCH v7 05/21] NFS: Store the change attribute in the directory page cache

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, 2022-02-25 at 13:10 +0000, Trond Myklebust wrote:
> On Fri, 2022-02-25 at 06:38 -0500, Benjamin Coddington wrote:
> > On 24 Feb 2022, at 22:51, Trond Myklebust wrote:
> > 
> > > On Fri, 2022-02-25 at 02:26 +0000, Trond Myklebust wrote:
> > > > On Thu, 2022-02-24 at 09:53 -0500, Benjamin Coddington wrote:
> > > > > On 23 Feb 2022, at 16:12, trondmy@xxxxxxxxxx wrote:
> > > > > 
> > > > > > From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> > > > > > 
> > > > > > Use the change attribute and the first cookie in a
> > > > > > directory
> > > > > > page
> > > > > > cache
> > > > > > entry to validate that the page is up to date.
> > > > > > 
> > > > > > Suggested-by: Benjamin Coddington <bcodding@xxxxxxxxxx>
> > > > > > Signed-off-by: Trond Myklebust
> > > > > > <trond.myklebust@xxxxxxxxxxxxxxx>
> > > > > > ---
> > > > > >  fs/nfs/dir.c | 68
> > > > > > ++++++++++++++++++++++++++++------------------------
> > > > > >  1 file changed, 37 insertions(+), 31 deletions(-)
> > > > > > 
> > > > > > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> > > > > > index f2258e926df2..5d9367d9b651 100644
> > > > > > --- a/fs/nfs/dir.c
> > > > > > +++ b/fs/nfs/dir.c
> > > > > > @@ -139,6 +139,7 @@ struct nfs_cache_array_entry {
> > > > > >  };
> > > > > > 
> > > > > >  struct nfs_cache_array {
> > > > > > +       u64 change_attr;
> > > > > >         u64 last_cookie;
> > > > > >         unsigned int size;
> > > > > >         unsigned char page_full : 1,
> > > > > > @@ -175,7 +176,8 @@ static void
> > > > > > nfs_readdir_array_init(struct
> > > > > > nfs_cache_array *array)
> > > > > >         memset(array, 0, sizeof(struct nfs_cache_array));
> > > > > >  }
> > > > > > 
> > > > > > -static void nfs_readdir_page_init_array(struct page *page,
> > > > > > u64
> > > > > > last_cookie)
> > > > > > +static void nfs_readdir_page_init_array(struct page *page,
> > > > > > u64
> > > > > > last_cookie,
> > > > > > +                                       u64 
> > > > > > change_attr)
> > > > > >  {
> > > > > >         struct nfs_cache_array *array;
> > > > > 
> > > > > 
> > > > > There's a hunk missing here, something like:
> > > > > 
> > > > > @@ -185,6 +185,7 @@ static void
> > > > > nfs_readdir_page_init_array(struct
> > > > > page
> > > > > *page, u64 last_cookie,
> > > > >          nfs_readdir_array_init(array);
> > > > >          array->last_cookie = last_cookie;
> > > > >          array->cookies_are_ordered = 1;
> > > > > +       array->change_attr = change_attr;
> > > > >          kunmap_atomic(array);
> > > > >   }
> > > > > 
> > > > > > 
> > > > > > @@ -207,7 +209,7 @@ nfs_readdir_page_array_alloc(u64
> > > > > > last_cookie,
> > > > > > gfp_t gfp_flags)
> > > > > >  {
> > > > > >         struct page *page = alloc_page(gfp_flags);
> > > > > >         if (page)
> > > > > > -               nfs_readdir_page_init_array(page, 
> > > > > > last_cookie);
> > > > > > +               nfs_readdir_page_init_array(page, 
> > > > > > last_cookie,
> > > > > > 0);
> > > > > >         return page;
> > > > > >  }
> > > > > > 
> > > > > > @@ -304,19 +306,44 @@ int nfs_readdir_add_to_array(struct
> > > > > > nfs_entry
> > > > > > *entry, struct page *page)
> > > > > >         return ret;
> > > > > >  }
> > > > > > 
> > > > > > +static bool nfs_readdir_page_cookie_match(struct page
> > > > > > *page,
> > > > > > u64
> > > > > > last_cookie,
> > > > > > +                                         
> > > > > > u64 change_attr)
> > > > > 
> > > > > How about "nfs_readdir_page_valid()"?  There's more going on
> > > > > than a
> > > > > cookie match.
> > > > > 
> > > > > 
> > > > > > +{
> > > > > > +       struct nfs_cache_array *array = kmap_atomic(page);
> > > > > > +       int ret = true;
> > > > > > +
> > > > > > +       if (array->change_attr != change_attr)
> > > > > > +               ret = false;
> > > > > 
> > > > > Can we skip the next test if ret = false?
> > > > 
> > > > I'd expect the compiler to do that.
> > > > 
> > > > > 
> > > > > > +       if (array->size > 0 && array->array[0].cookie !=
> > > > > > last_cookie)
> > > > > > +               ret = false;
> > > > > > +       kunmap_atomic(array);
> > > > > > +       return ret;
> > > > > > +}
> > > > > > +
> > > > > > +static void nfs_readdir_page_unlock_and_put(struct page
> > > > > > *page)
> > > > > > +{
> > > > > > +       unlock_page(page);
> > > > > > +       put_page(page);
> > > > > > +}
> > > > > > +
> > > > > >  static struct page *nfs_readdir_page_get_locked(struct
> > > > > > address_space
> > > > > > *mapping,
> > > > > >                                                 pgoff_t 
> > > > > > index,
> > > > > > u64
> > > > > > last_cookie)
> > > > > >  {
> > > > > >         struct page *page;
> > > > > > +       u64 change_attr;
> > > > > > 
> > > > > >         page = grab_cache_page(mapping, index);
> > > > > > -       if (page && !PageUptodate(page)) {
> > > > > > -               nfs_readdir_page_init_array(page, 
> > > > > > last_cookie);
> > > > > > -               if 
> > > > > > (invalidate_inode_pages2_range(mapping, index
> > > > > > +
> > > > > > 1, -1) < 0)
> > > > > > -                       nfs_zap_mapping(mapping->host, 
> > > > > > mapping);
> > > > > > -               SetPageUptodate(page);
> > > > > > +       if (!page)
> > > > > > +               return NULL;
> > > > > > +       change_attr = 
> > > > > > inode_peek_iversion_raw(mapping->host);
> > > > > > +       if (PageUptodate(page)) {
> > > > > > +               if 
> > > > > > (nfs_readdir_page_cookie_match(page,
> > > > > > last_cookie,
> > > > > > +                                                 
> > > > > > change_attr))
> > > > > > +                       return page;
> > > > > > +               nfs_readdir_clear_array(page);
> > > > > 
> > > > > 
> > > > > Why use i_version rather than nfs_save_change_attribute? 
> > > > > Seems
> > > > > having a
> > > > > consistent value across the pachecache and dir_verifiers
> > > > > would
> > > > > help
> > > > > debugging, and we've already have a bunch of machinery around
> > > > > the
> > > > > change_attribute.
> > > > 
> > > > The directory cache_change_attribute is not reported in
> > > > tracepoints
> > > > because it is a directory-specific field, so it's not as useful
> > > > for
> > > > debugging.
> > > > 
> > > > The inode change attribute is what we have traditionally used
> > > > for
> > > > determining cache consistency, and when to invalidate the
> > > > cache.
> > > 
> > > I should probably elaborate a little further on the differences 
> > > between
> > > the inode change attribute and the cache_change_attribute.
> > > 
> > > One of the main reasons for introducing the latter was to have
> > > something that allows us to track changes to the directory, but
> > > to
> > > avoid forcing unnecessary revalidations of the dcache.
> > > 
> > > What this means is that when we create or remove a file, and the
> > > pre/post-op attributes tell us that there were no third party
> > > changes
> > > to the directory, we update the dcache, but we do _not_ update
> > > the
> > > cache_change_attribute, because we know that the rest of the
> > > directory
> > > contents are valid, and so we don't have to revalidate the
> > > dentries.
> > > However in that case, we _do_ want to update the readdir cache to
> > > reflect the fact that an entry was added or deleted. While we
> > > could
> > > figure out how to remove an entry (at least for the case where
> > > the
> > > filesystem is case-sensitive), we do not know where the
> > > filesystem
> > > added the new file, or what cookies was assigned.
> > > 
> > > This is why the inode change attribute is more appropriate for 
> > > indexing
> > > the page cache pages. It reflects the cases where we want to 
> > > revalidate
> > > the readdir cache, as opposed to the dcache.
> > 
> > Ok, thanks for explaining this.
> > 
> > I've noticed that you haven't responded about my concerns about not
> > checking
> > the directory for changes with every v4 READDIR.  For v3, we have 
> > post-op
> > updates to the directory, but with v4 the directory can change and
> > we'll
> > end up with entries in the cache that are marked with an old 
> > change_attr.
> > 
> 
> Then they will be rejected by nfs_readdir_page_cookie_match() if a
> user
> looks up that page again after we've revalidated the change attribute
> on the directory.
> 
> ...and note that NFSv4 does returns a struct change_info4 for all
> operations that change the directory, so we will update the change
> attribute in all those cases.
> 
> If the change is made on the server, well then we will detect it
> through the standard revalidation process that usually decides when
> to
> invalidate the directory page cache.
> 
> > I'm pretty positive that not checking for changes to the directory
> > (not
> > sending GETATTR with READDIR) is going to create cases of double-
> > listed 
> > and
> > truncated-listings for dirctory listers.  Not handling those cases
> > means 
> > I'm
> > going to have some very unhappy customers that complain about their
> > files
> > disappearing/reappearing on NFS.
> > 
> > If you need me to prove that its an issue, I can take the time to
> > write 
> > up
> > program that shows this problem.
> > 
> 
> If you label the page contents with an attribute that was retrieved
> _after_ the READDIR op, then you will introduce this as a problem for
> your customers.
> 
> The reason is that there is no atomicity between operations in a
> COMPOUND. Worse, the implementation of readdir in scalable modern
> systems, including Linux, does not even guarantee atomicity of the
> readdir operation itself. Instead each readdir entry is filled
> without
> holding any locks or preventing any changes to the directory or to
> the
> object itself.
> 
> POSIX states very explicitly that if you're making changes to the
> directory after the call to opendir() or rewinddir(), then the
> behaviour w.r.t. whether that file appears in the readdir() call is
> unspecified. See
> https://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html
> 
> This is also consistent with how glibc caches the results of a
> getdents() call.
> 

Ah, wait a minute...

There is a problem with the call to nfs_readdir_page_get_next(). It
will allocate the page _after_ the readdir call itself, and so might
label it with a newer change attribute... I'll fix that so we can pass
in the change attribute associated with the readdir call.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx






[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux