Re: [PATCH v7 05/21] NFS: Store the change attribute in the directory page cache

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 25 Feb 2022, at 10:18, Trond Myklebust wrote:

On Fri, 2022-02-25 at 09:44 -0500, Benjamin Coddington wrote:
On 25 Feb 2022, at 8:10, Trond Myklebust wrote:

On Fri, 2022-02-25 at 06:38 -0500, Benjamin Coddington wrote:
On 24 Feb 2022, at 22:51, Trond Myklebust wrote:

On Fri, 2022-02-25 at 02:26 +0000, Trond Myklebust wrote:
On Thu, 2022-02-24 at 09:53 -0500, Benjamin Coddington wrote:
On 23 Feb 2022, at 16:12, trondmy@xxxxxxxxxx wrote:

From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>

Use the change attribute and the first cookie in a
directory
page
cache
entry to validate that the page is up to date.

Suggested-by: Benjamin Coddington <bcodding@xxxxxxxxxx>
Signed-off-by: Trond Myklebust
<trond.myklebust@xxxxxxxxxxxxxxx>
---
 fs/nfs/dir.c | 68
++++++++++++++++++++++++++++------------------------
 1 file changed, 37 insertions(+), 31 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index f2258e926df2..5d9367d9b651 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -139,6 +139,7 @@ struct nfs_cache_array_entry {
 };

 struct nfs_cache_array {
+       u64 change_attr;
        u64 last_cookie;
        unsigned int size;
        unsigned char page_full : 1,
@@ -175,7 +176,8 @@ static void
nfs_readdir_array_init(struct
nfs_cache_array *array)
        memset(array, 0, sizeof(struct nfs_cache_array));
 }

-static void nfs_readdir_page_init_array(struct page
*page,
u64
last_cookie)
+static void nfs_readdir_page_init_array(struct page
*page,
u64
last_cookie,
+                                       u64
change_attr)
 {
        struct nfs_cache_array *array;


There's a hunk missing here, something like:

@@ -185,6 +185,7 @@ static void
nfs_readdir_page_init_array(struct
page
*page, u64 last_cookie,
         nfs_readdir_array_init(array);
         array->last_cookie = last_cookie;
         array->cookies_are_ordered = 1;
+       array->change_attr = change_attr;
         kunmap_atomic(array);
  }


@@ -207,7 +209,7 @@ nfs_readdir_page_array_alloc(u64
last_cookie,
gfp_t gfp_flags)
 {
        struct page *page = alloc_page(gfp_flags);
        if (page)
-               nfs_readdir_page_init_array(page,
last_cookie);
+               nfs_readdir_page_init_array(page,
last_cookie,
0);
        return page;
 }

@@ -304,19 +306,44 @@ int nfs_readdir_add_to_array(struct
nfs_entry
*entry, struct page *page)
        return ret;
 }

+static bool nfs_readdir_page_cookie_match(struct page
*page,
u64
last_cookie,
+                                        
u64 change_attr)

How about "nfs_readdir_page_valid()"?  There's more going
on
than a
cookie match.


+{
+       struct nfs_cache_array *array =
kmap_atomic(page);
+       int ret = true;
+
+       if (array->change_attr != change_attr)
+               ret = false;

Can we skip the next test if ret = false?

I'd expect the compiler to do that.


+       if (array->size > 0 && array->array[0].cookie !=
last_cookie)
+               ret = false;
+       kunmap_atomic(array);
+       return ret;
+}
+
+static void nfs_readdir_page_unlock_and_put(struct page
*page)
+{
+       unlock_page(page);
+       put_page(page);
+}
+
 static struct page *nfs_readdir_page_get_locked(struct
address_space
*mapping,
                                                pgoff_t
index,
u64
last_cookie)
 {
        struct page *page;
+       u64 change_attr;

        page = grab_cache_page(mapping, index);
-       if (page && !PageUptodate(page)) {
-               nfs_readdir_page_init_array(page,
last_cookie);
-               if
(invalidate_inode_pages2_range(mapping, index
+
1, -1) < 0)
-                       nfs_zap_mapping(mapping->host,
mapping);
-               SetPageUptodate(page);
+       if (!page)
+               return NULL;
+       change_attr =
inode_peek_iversion_raw(mapping->host);
+       if (PageUptodate(page)) {
+               if
(nfs_readdir_page_cookie_match(page,
last_cookie,
+                                                
change_attr))
+                       return page;
+               nfs_readdir_clear_array(page);


Why use i_version rather than nfs_save_change_attribute? 
Seems
having a
consistent value across the pachecache and dir_verifiers
would
help
debugging, and we've already have a bunch of machinery
around
the
change_attribute.

The directory cache_change_attribute is not reported in
tracepoints
because it is a directory-specific field, so it's not as
useful
for
debugging.

The inode change attribute is what we have traditionally used
for
determining cache consistency, and when to invalidate the
cache.

I should probably elaborate a little further on the differences
between
the inode change attribute and the cache_change_attribute.

One of the main reasons for introducing the latter was to have
something that allows us to track changes to the directory, but
to
avoid forcing unnecessary revalidations of the dcache.

What this means is that when we create or remove a file, and
the
pre/post-op attributes tell us that there were no third party
changes
to the directory, we update the dcache, but we do _not_ update
the
cache_change_attribute, because we know that the rest of the
directory
contents are valid, and so we don't have to revalidate the
dentries.
However in that case, we _do_ want to update the readdir cache
to
reflect the fact that an entry was added or deleted. While we
could
figure out how to remove an entry (at least for the case where
the
filesystem is case-sensitive), we do not know where the
filesystem
added the new file, or what cookies was assigned.

This is why the inode change attribute is more appropriate for
indexing
the page cache pages. It reflects the cases where we want to
revalidate
the readdir cache, as opposed to the dcache.

Ok, thanks for explaining this.

I've noticed that you haven't responded about my concerns about
not
checking
the directory for changes with every v4 READDIR.  For v3, we have
post-op
updates to the directory, but with v4 the directory can change
and
we'll
end up with entries in the cache that are marked with an old
change_attr.


Then they will be rejected by nfs_readdir_page_cookie_match() if a
user
looks up that page again after we've revalidated the change
attribute
on the directory.

...and note that NFSv4 does returns a struct change_info4 for all
operations that change the directory, so we will update the change
attribute in all those cases.

I'm not worried about changes from the same client.

If the change is made on the server, well then we will detect it
through the standard revalidation process that usually decides when
to
invalidate the directory page cache.

The environments I'm concerned about are setup very frequently: they
look
like multiple NFS clients co-ordinating on a directory with millions
of
files.  Some clients are adding files as they do work, other clients
are
then looking for those files by walking the directory entries to
validate
their existence.  The systems that do this have a "very bad time" if
some
of them produce listings that are _dramatically_ and transiently
different
from a listing they produced before.

That can happen really easily with what we've got here, and it can
create a
huge problem for these setups.  And it won't be easily reproduceable,
and it
will be hard to find.  It will cost everyone involved a lot of time
and
effort to track down, and we can fix it easily.

I'm pretty positive that not checking for changes to the
directory
(not
sending GETATTR with READDIR) is going to create cases of double-
listed
and
truncated-listings for dirctory listers.  Not handling those
cases
means
I'm
going to have some very unhappy customers that complain about
their
files
disappearing/reappearing on NFS.

If you need me to prove that its an issue, I can take the time to
write
up
program that shows this problem.


If you label the page contents with an attribute that was retrieved
_after_ the READDIR op, then you will introduce this as a problem
for
your customers.

No the problem is already here, we're not introducing it.  By
labeling
the
page contents with every call we're shifting the race window from the
client
where it's a very large window to the server where the window is
small.

Its still possible, but *much* less likely.

The reason is that there is no atomicity between operations in a
COMPOUND. Worse, the implementation of readdir in scalable modern
systems, including Linux, does not even guarantee atomicity of the
readdir operation itself. Instead each readdir entry is filled
without
holding any locks or preventing any changes to the directory or to
the
object itself.

I understand all this, but its not a reason to make the problem
worse.

POSIX states very explicitly that if you're making changes to the
directory after the call to opendir() or rewinddir(), then the
behaviour w.r.t. whether that file appears in the readdir() call is
unspecified. See
https://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html

Yes, but again - just because the problem exists doesn't give us
reason
to
amplify it when we can easily make a better choice for almost no
cost.

Here are my reasons for wanting the GETATTR added:
  - it makes it *much* less likely for this problem to occur, with
the
minor
    downside of decreased caching for unstable directories.
  - it makes v3 and v4 readdir pagecache behavior consistent WRT
changing
    directories.

I spent a non-trivial amount of time working on this problem, and saw
this
exact issue appear.  Its definitely something that's going to come
back
and
bite us if we don't fix it.

How can I convince you?  I've offered to produce a working example of
this
problem.  Will you review those results?  If I cannot convince you, I
feel
I'll have to pursue distro-specific changes for this work.

Ben, the main cause of this kind of issue in the current code is the
following line:

        /*
         * ctx->pos points to the dirent entry number.
* *desc->dir_cookie has the cookie for the next entry. We have
         * to either find the entry with the appropriate number or
         * revalidate the cookie.
         */
        if (ctx->pos == 0 || nfs_attribute_cache_expired(inode)) {
                res = nfs_revalidate_mapping(inode, file->f_mapping);
                if (res < 0)
                        goto out;
        }


That line protects the page cache against changes aften opendir(). It
was introduced by Red Hat in commmit 07b5ce8ef2d8 in order to fix a
claim of a severe performance problem.

These patches _remove_ that protection, because we're now able to cope
with more frequent revalidation without needing to restart directory
reads from scratch.

Yes, I know.  But the big change is that now we're heavily relying on
page validation to produce sane listing results, and proper page validation
relies on up-to-date change info.

So no. Without further proof, I don't accept your claim that this
patchset introduces a regression. I don't accept your claim that we are
required to revalidate the change attribute on every readdir call. We
can't do that for NFSv2 or NFSv3 (the latter offers a post_op
attribute, not pre-op attribute) and as I already pointed out, there is
nothing in POSIX that requires this.

You don't need a pre-op attribute. You just need to detect the case where
you're walking into pages that contain entries that don't match the ones
you're currently using, and post-op is as good as we can get it.

Ok, so I'm reading that further proof is required, and I'm happy to do the
work.  Thanks for the replies here and elsewhere.

Ben





[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux