On 23 Feb 2022, at 7:17, Trond Myklebust wrote:
On Tue, 2022-02-22 at 15:21 -0500, Benjamin Coddington wrote:
On 22 Feb 2022, at 15:11, Trond Myklebust wrote:
On Tue, 2022-02-22 at 07:50 -0500, Benjamin Coddington wrote:
On 21 Feb 2022, at 18:20, Trond Myklebust wrote:
On Mon, 2022-02-21 at 16:10 -0500, Benjamin Coddington wrote:
On 21 Feb 2022, at 15:55, Trond Myklebust wrote:
We will always need the ability to cut over to uncached readdir.
Yes.
If the cookie is no longer returned by the server because one or more
files were deleted, then we need to resolve the situation somehow
(IOW: the 'rm *' case). The new algorithm _does_ improve performance
in those situations, because it no longer requires us to read the
entire directory before switching over: we try 5 times, then fail
over.
Yes, using per-page validation doesn't remove the need for uncached
readdir. It does allow a reader to simply resume filling the cache
where it left off. There's no need to try 5 times and fail over, and
there's no need to make a trade-off that makes the situation worse in
certain scenarios.
I thought I'd point that out and offer to re-submit it. Any interest?
As I recall, I had concerns about that approach. Can you explain
again how it will work?
Every page of readdir results has the directory's change attr stored
on the page. That, along with the page's index and first cookie, is
enough information to determine whether the page's data can be used
by another process.
Which means that when the pagecache is dropped, fillers don't have to
restart filling the cache at page index 0; they can continue to fill
at whatever index they were at previously. If another process finds a
page that doesn't match its page index, cookie, and the current
directory's change attr, the page is dropped and refilled from that
process's indexing.
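
A minimal sketch of that check (the struct and field names here are
illustrative, not the actual patch):

#include <linux/types.h>

/* Illustrative only: the validation tuple described above. Each page
 * of readdir results records the directory's change attribute and the
 * cookie of its first entry at fill time. */
struct nfs_readdir_page_info {
        u64 change_attr;        /* dir change attr when page was filled */
        u64 first_cookie;       /* cookie of the first entry on the page */
};

/* A cached page is reusable only if it was filled against the same
 * directory version and begins at the cookie this reader expects for
 * the page index it is looking at. */
static bool nfs_readdir_page_valid(const struct nfs_readdir_page_info *info,
                                   u64 cur_change_attr, u64 expected_cookie)
{
        return info->change_attr == cur_change_attr &&
               info->first_cookie == expected_cookie;
}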
A few of the concerns I have revolve around telldir()/seekdir(). If
the platform is 32-bit, then we cannot use cookies as the telldir()
output, and so our only option is to use offsets into the page cache
(this is why this patch carves out an exception if desc->dir_cookie
== 0). How would that work with your scheme?
For 32-bit seekdir, pages are filled starting at index 0. This is
very unlikely to match other readers (unless they also do the _same_
seekdir).
Even in the 64-bit case where we are able to use cookies for
telldir()/seekdir(), how do we determine an appropriate page index
after a seekdir()?
We don't. Instead we start filling at index 0. Again, the pagecache
will only be useful to other processes that have done the same
llseek.
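
As a sketch (using the descriptor field names mentioned elsewhere in
this thread; treat it as pseudocode against the real struct
nfs_readdir_descriptor):

/* Sketch only: after an llseek()/seekdir(), restart cache filling at
 * page index 0, keyed by the seek target cookie. Pages filled from
 * here carry (index, first cookie, change attr) as above, so they
 * will only validate for readers that sought to the same cookie. */
static void nfs_readdir_reset_for_seek(struct nfs_readdir_descriptor *desc,
                                       u64 seek_cookie)
{
        desc->page_index = 0;
        desc->last_cookie = seek_cookie;
}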
This approach optimizes the pagecache for processes that are doing
similar work, and has the added benefit of scaling well for large
directory listings under memory pressure. Also, a number of classes
of directory modifications (such as renames, or insertions/deletions
at locations a reader has moved beyond) are no longer a reason to
re-fill the pagecache from scratch.
OK, you've got me more or less sold on it.
I'd still like to figure out how to improve the performance for
seekdir (since I do have an interest in re-exporting NFS), but I've
been playing around with a couple of patches that implement your
concept and they do seem to work well for the common case of a linear
read through the directory.
Nice. I have a newer version than the one I originally posted:
https://lore.kernel.org/linux-nfs/cover.1611160120.git.bcodding@xxxxxxxxxx/
.. but I don't remember exactly what changed and it needs rebasing.
Should I rebase it against your testing branch and send the result?
My 2 patches did something slightly different to yours, storing the
change attribute in the array header instead of in page_private. That
makes for a slightly smaller change.
I worried that increasing the size of the array header wouldn't allow
us to store as many entries per page.
Having further slept on the idea, I think I know how to solve the
seekdir() issue, at least for the cases where we can use cookies as
telldir()/seekdir() offsets. If we ditch the linear index, and instead
use a hash of the desc->last_cookie as the index, then we can make
random access to the directories work with your idea.
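
For illustration, that mapping might look something like the
following sketch (hash_64() is the kernel's generic 64-bit hash; the
bit width here is an arbitrary choice for the sketch, and a real
patch would need to consider collisions):

#include <linux/hash.h>
#include <linux/pagemap.h>

/* Sketch only: derive a page cache index from the cookie we will ask
 * the server for, instead of counting pages linearly. Readers that
 * request the same cookie then look up the same page, regardless of
 * how they arrived at it. */
static pgoff_t nfs_readdir_cookie_to_index(u64 last_cookie)
{
        return (pgoff_t)hash_64(last_cookie, 18);  /* width arbitrary here */
}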
Yes! Nice!
There are a couple of further problems with this concept, but I think
those are solvable. The issues I have identified so far are:
* invalidate_inode_pages2_range() no longer works to clear out the
page cache in a predictable way for the readdirplus heuristic (see
the sketch after this list).
* duplicate cookie detection needs to be changed.
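
To make the first issue concrete (a sketch; the helper below is
hypothetical): with a linear index, the readdirplus heuristic can
drop everything cached beyond the reader's position with one
predictable range call, and that stops being possible once indices
come from a hash.

#include <linux/pagemap.h>

/* Sketch only: with linearly indexed pages, "invalidate the rest of
 * the directory" is a single contiguous range. */
static void nfs_readdir_drop_tail(struct inode *inode, pgoff_t cur_index)
{
        invalidate_inode_pages2_range(inode->i_mapping, cur_index,
                                      (pgoff_t)-1);
}
/* With hash-derived indices, pages for consecutive cookies land at
 * scattered offsets, so no [start, end] range corresponds to the
 * remainder of the directory and this call loses its meaning. */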
I don't understand those problems yet. I have a few other things I
have to do, but after those I will come back and give them some
attention.
FWIW, I rebased that old series of page-validation work on your
commit e85f86150b32 ("NFS: Trace effects of the readdirplus
heuristic"). I'll send it as a reply to [ v6 13/13] in this posting,
but I've only compile-tested it so far.
Ben