On Tue, 2022-02-22 at 15:21 -0500, Benjamin Coddington wrote: > On 22 Feb 2022, at 15:11, Trond Myklebust wrote: > > > On Tue, 2022-02-22 at 07:50 -0500, Benjamin Coddington wrote: > > > On 21 Feb 2022, at 18:20, Trond Myklebust wrote: > > > > > > > On Mon, 2022-02-21 at 16:10 -0500, Benjamin Coddington wrote: > > > > > On 21 Feb 2022, at 15:55, Trond Myklebust wrote: > > > > > > > > > > > > We will always need the ability to cut over to uncached > > > > > > readdir. > > > > > > > > > > Yes. > > > > > > > > > > > If the cookie is no longer returned by the server because > > > > > > one > > > > > > or more > > > > > > files were deleted then we need to resolve the situation > > > > > > somehow (IOW: > > > > > > the 'rm *' case). The new algorithm _does_ improve > > > > > > performance > > > > > > on those > > > > > > situations, because it no longer requires us to read the > > > > > > entire > > > > > > directory before switching over: we try 5 times, then fail > > > > > > over. > > > > > > > > > > Yes, using per-page validation doesn't remove the need for > > > > > uncached > > > > > readdir. It does allow a reader to simply resume filling the > > > > > cache where > > > > > it left off. There's no need to try 5 times and fail over. > > > > > And > > > > > there's > > > > > no need to make a trade-off and make the situation worse in > > > > > certain > > > > > scenarios. > > > > > > > > > > I thought I'd point that out and make an offer to re-submit > > > > > it. > > > > > Any > > > > > interest? > > > > > > > > > > > > > As I recall, I had concerns about that approach. Can you > > > > explain > > > > again > > > > how it will work? > > > > > > Every page of readdir results has the directory's change attr > > > stored > > > on the > > > page. That, along with the page's index and the first cookie is > > > enough > > > information to determine if the page's data can be used by > > > another > > > process. > > > > > > Which means that when the pagecache is dropped, fillers don't > > > have to > > > restart > > > filling the cache at page index 0, they can continue to fill at > > > whatever > > > index they were at previously. If another process finds a page > > > that > > > doesn't > > > match its page index, cookie, and the current directory's change > > > attr, the > > > page is dropped and refilled from that process' indexing. > > > > > > > A few of the concerns I have revolve around > > > > telldir()/seekdir(). If > > > > the > > > > platform is 32-bit, then we cannot use cookies as the telldir() > > > > output, > > > > and so our only option is to use offsets into the page cache > > > > (this > > > > is > > > > why this patch carves out an exception if desc->dir_cookie == > > > > 0). > > > > How > > > > would that work with you scheme? > > > > > > For 32-bit seekdir, pages are filled starting at index 0. This > > > is > > > very > > > unlikely to match other readers (unless they also do the _same_ > > > seekdir). > > > > > > > Even in the 64-bit case where are able to use cookies for > > > > telldir()/seekdir(), how do we determine an appropriate page > > > > index > > > > after a seekdir()? > > > > > > We don't. Instead we start filling at index 0. Again, the > > > pagecache > > > will > > > only be useful to other processes that have done the same llseek. > > > > > > This approach optimizes the pagecache for processes that are > > > doing > > > similar > > > work, and has the added benefit of scaling well for large > > > directory > > > listings > > > under memory pressure. Also a number of classes of directory > > > modifications > > > (such as renames, or insertions/deletions at locations a reader > > > has > > > moved > > > beyond) are no longer a reason to re-fill the pagecache from > > > scratch. > > > > > > > OK, you've got me more or less sold on it. > > > > I'd still like to figure out how to improve the performance for > > seekdir > > (since I do have an interest in re-exporting NFS) but I've been > > playing > > around with a couple of patches that implement your concept and > > they do > > seem to work well for the common case of a linear read through the > > directory. > > Nice. I have another version from the one I originally posted: > https://lore.kernel.org/linux-nfs/cover.1611160120.git.bcodding@xxxxxxxxxx/ > > .. but I don't remember exactly the changes and it needs rebasing. > Should I > rebase it against your testing branch and send the result? My 2 patches did something slightly different to yours, storing the change attribute in the array header instead of in page_private. That makes for a slightly smaller change. Having further slept on the idea, I think I know how to solve the seekdir() issue, at least for the cases where we can use cookies as telldir()/seekdir() offsets. If we ditch the linear index, and instead use a hash of the desc->last_cookie as the index, then we can make random access to the directories work with your idea. There are a couple of further problems with this concept, but I think those are solvable. The issues I have identified so far are: * invalidate_inode_pages2_range() no longer works to clear out the page cache in a predictable way for the readdirplus heuristic. * duplicate cookie detection needs to be changed. I think those are solvable problems. -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@xxxxxxxxxxxxxxx