On Wed, 2022-02-23 at 08:34 -0500, Benjamin Coddington wrote:
> On 23 Feb 2022, at 7:17, Trond Myklebust wrote:
> 
> > On Tue, 2022-02-22 at 15:21 -0500, Benjamin Coddington wrote:
> > > On 22 Feb 2022, at 15:11, Trond Myklebust wrote:
> > > 
> > > > On Tue, 2022-02-22 at 07:50 -0500, Benjamin Coddington wrote:
> > > > > On 21 Feb 2022, at 18:20, Trond Myklebust wrote:
> > > > > 
> > > > > > On Mon, 2022-02-21 at 16:10 -0500, Benjamin Coddington wrote:
> > > > > > > On 21 Feb 2022, at 15:55, Trond Myklebust wrote:
> > > > > > > 
> > > > > > > > We will always need the ability to cut over to uncached
> > > > > > > > readdir.
> > > > > > > 
> > > > > > > Yes.
> > > > > > > 
> > > > > > > > If the cookie is no longer returned by the server because
> > > > > > > > one or more files were deleted then we need to resolve
> > > > > > > > the situation somehow (IOW: the 'rm *' case). The new
> > > > > > > > algorithm _does_ improve performance in those situations,
> > > > > > > > because it no longer requires us to read the entire
> > > > > > > > directory before switching over: we try 5 times, then
> > > > > > > > fail over.
> > > > > > > 
> > > > > > > Yes, using per-page validation doesn't remove the need for
> > > > > > > uncached readdir. It does allow a reader to simply resume
> > > > > > > filling the cache where it left off. There's no need to
> > > > > > > try 5 times and fail over. And there's no need to make a
> > > > > > > trade-off and make the situation worse in certain
> > > > > > > scenarios.
> > > > > > > 
> > > > > > > I thought I'd point that out and make an offer to
> > > > > > > re-submit it. Any interest?
> > > > > > As I recall, I had concerns about that approach. Can you
> > > > > > explain again how it will work?
> > > > > 
> > > > > Every page of readdir results has the directory's change attr
> > > > > stored on the page. That, along with the page's index and the
> > > > > first cookie, is enough information to determine if the page's
> > > > > data can be used by another process.
> > > > > 
> > > > > Which means that when the pagecache is dropped, fillers don't
> > > > > have to restart filling the cache at page index 0, they can
> > > > > continue to fill at whatever index they were at previously.
> > > > > If another process finds a page that doesn't match its page
> > > > > index, cookie, and the current directory's change attr, the
> > > > > page is dropped and refilled from that process' indexing.
> > > > > 
> > > > > > A few of the concerns I have revolve around
> > > > > > telldir()/seekdir(). If the platform is 32-bit, then we
> > > > > > cannot use cookies as the telldir() output, and so our only
> > > > > > option is to use offsets into the page cache (this is why
> > > > > > this patch carves out an exception if desc->dir_cookie ==
> > > > > > 0). How would that work with your scheme?
> > > > > 
> > > > > For 32-bit seekdir, pages are filled starting at index 0.
> > > > > This is very unlikely to match other readers (unless they
> > > > > also do the _same_ seekdir).
> > > > > 
> > > > > > Even in the 64-bit case where we are able to use cookies
> > > > > > for telldir()/seekdir(), how do we determine an appropriate
> > > > > > page index after a seekdir()?
> > > > > 
> > > > > We don't.
> > > > > Instead we start filling at index 0. Again, the pagecache
> > > > > will only be useful to other processes that have done the
> > > > > same llseek.
> > > > > 
> > > > > This approach optimizes the pagecache for processes that are
> > > > > doing similar work, and has the added benefit of scaling well
> > > > > for large directory listings under memory pressure. Also a
> > > > > number of classes of directory modifications (such as renames,
> > > > > or insertions/deletions at locations a reader has moved
> > > > > beyond) are no longer a reason to re-fill the pagecache from
> > > > > scratch.
> > > > 
> > > > OK, you've got me more or less sold on it.
> > > > 
> > > > I'd still like to figure out how to improve the performance for
> > > > seekdir (since I do have an interest in re-exporting NFS) but
> > > > I've been playing around with a couple of patches that
> > > > implement your concept and they do seem to work well for the
> > > > common case of a linear read through the directory.
> > > 
> > > Nice. I have another version from the one I originally posted:
> > > https://lore.kernel.org/linux-nfs/cover.1611160120.git.bcodding@xxxxxxxxxx/
> > > 
> > > .. but I don't remember exactly the changes and it needs
> > > rebasing. Should I rebase it against your testing branch and send
> > > the result?
> > 
> > My 2 patches did something slightly different to yours, storing the
> > change attribute in the array header instead of in page_private.
> > That makes for a slightly smaller change.
> 
> I worried that increasing the size of the array header wouldn't
> allow us to store as many entries per page.

The struct nfs_cache_array header is 24 bytes long with the change
attribute (as opposed to 16 bytes without it).
This size is independent of the architecture, assuming that 'unsigned
int' is 32 bits and unsigned char is 8 bits (as is always the case on
Linux).

On a 64-bit system, a single struct nfs_cache_array_entry ends up
being 32 bytes long. So in a standard 4k page with a 24-byte or
16-byte header you will be able to cache exactly 127 cache array
entries. On a 32-bit system, the cache entry is 28 bytes long (the
difference being the pointer length), and you can pack 145 entries
into the 4k page.

IOW: the change in header size makes no difference to the number of
entries you can cache, because in both cases the header remains
smaller than a single entry.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx