On Fri, 2023-01-27 at 07:36 +1100, NeilBrown wrote:
> On Thu, 26 Jan 2023, Jeff Layton wrote:
> > On Wed, 2023-01-25 at 13:03 +1100, NeilBrown wrote:
> >
> > > diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
> > > index 7931fa472561..8a83d6d204ed 100644
> > > --- a/include/linux/nfs_fs.h
> > > +++ b/include/linux/nfs_fs.h
> > > @@ -190,6 +190,33 @@ struct nfs_inode {
> > >  	/* Open contexts for shared mmap writes */
> > >  	struct list_head	open_files;
> > >
> > > +	/* Keep track of out-of-order replies.
> > > +	 * The ooo array contains start/end pairs of
> > > +	 * numbers from the changeid sequence when
> > > +	 * the inode's iversion has been updated.
> > > +	 * It also contains end/start pairs (i.e. reverse order)
> > > +	 * of sections of the changeid sequence that have
> > > +	 * been seen in replies from the server.
> > > +	 * Normally these should match, and when both
> > > +	 * A:B and B:A are found in ooo, they are both removed.
> > > +	 * And if a reply with A:B causes an iversion update
> > > +	 * of A:B, then neither is added.
> > > +	 * When a reply has a pre_change that doesn't match
> > > +	 * iversion, then the changeid pair, and any consequent
> > > +	 * change in iversion, ARE added.  Later replies
> > > +	 * might fill in the gaps, or possibly a gap is caused
> > > +	 * by a change from another client.
> > > +	 * When a file or directory is opened, if the ooo table
> > > +	 * is not empty, then we assume the gaps were due to
> > > +	 * another client and we invalidate the cached data.
> > > +	 *
> > > +	 * We can only track a limited number of concurrent gaps.
> > > +	 * Currently that limit is 16.
> > > +	 */
> > > +	int ooo_cnt;
> > > +	int ooo_max; // TRACING
> > > +	unsigned long ooo[16][2];
> >
> > Why unsigned longs here? Shouldn't these be u64?
>
> Yes, they should be u64.  Thanks.
>
> > I guess you could argue that when we have 32-bit longs that the most
> > significant bits don't matter, but that's also the case with 64-bit
> > longs, and 32 bits would halve the space requirements.
> >
> > Also, this grows each inode by 2k on a 64-bit arch! Maybe we should
> > dynamically allocate these things instead? If the allocation fails,
> > then we could just go back to marking the cache invalid and move on.
>
> 2K?  8*16*2+4*2 == 264, not 2048.
>

You're correct of course. I'm not sure where I got 2048.

> But I agree that allocating on demand would make sense.  16 is probably
> more than needed.  I don't have proper testing results from the customer
> yet, but I wouldn't be surprised if a smaller number would suffice.  But
> if we are allocating only on demand, it wouldn't hurt to allocate 16.

Makes sense. Is there a max number of NFS writes you can have in flight
to a single server in the client? That might inform how large an array
you need.

-- 
Jeff Layton <jlayton@xxxxxxxxxx>