On Wed, 02 Nov 2022, Jeff Layton wrote: > On Wed, 2022-11-02 at 08:23 +1100, NeilBrown wrote: > > On Wed, 02 Nov 2022, Jeff Layton wrote: > > > The filecache refcounting is a bit non-standard for something searchable > > > by RCU, in that we maintain a sentinel reference while it's hashed. This > > > in turn requires that we have to do things differently in the "put" > > > depending on whether its hashed, which we believe to have led to races. > > > > > > There are other problems in here too. nfsd_file_close_inode_sync can end > > > up freeing an nfsd_file while there are still outstanding references to > > > it, and there are a number of subtle ToC/ToU races. > > > > > > Rework the code so that the refcount is what drives the lifecycle. When > > > the refcount goes to zero, then unhash and rcu free the object. > > > > > > With this change, the LRU carries a reference. Take special care to > > > deal with it when removing an entry from the list. > > > > The refcounting and lru management all look sane here. > > > > You need to have moved the final put (and corresponding fsync) to > > different threads. I think I see you and Chuck discussing that and I > > have no sense of what is "right". > > > > Yeah, this is a tough call. I get Chuck's reticence. > > One thing we could consider is offloading the SYNC_NONE writeback > submission to a workqueue. I'm not sure though whether that's a win -- > it might just add needless context switches. OTOH, that would make it > fairly simple to kick off writeback when the REFERENCED flag is cleared, > which would probably be the best time to do it. > > An entry that ends up being harvested by the LRU scanner is going to be > touched by it at least twice: once to clear the REFERENCED flag, and > again ~2s later to reap it. > > If we schedule writeback when we clear the flag then we have a pretty > good indication that nothing else is going to be using it (though I > think we need to clear REFERENCED even when nfsd_file_check_writeback > returns true -- I'll fix that in the coming series). > > In any case, I'd probably like to do that sort of change in a separate > series after we get the first part sorted. > > > But it would be nice to explain in > > the comment what is being moved and why, so I could then confirm that > > the code matches the intent. > > > > I'm happy to add comments, but I'm a little unclear on what you're > confused by here. It's a bit too big of a patch for me to give a full > play-by-play description. Can you elaborate on what you'd like to see? > I don't need blow-by-blow, but all the behavioural changes should at least be flagged in the intro, and possibly explained. The one I particularly noticed is in nfsd_file_close_inode() which previously used nfsd_file_dispose_list() which hands the final close off to nfsd_filecache_wq. But this patch now does the final close in-line so an fsnotify event might now do the fsync. I was assuming that was deliberate and wanted it to be explained. But maybe it wasn't deliberate? The movement of flush_delayed_fput() threw me at first, but I think I understand it now - the new code for close_inode_sync is much cleaner, not needing dispose_list_sync. Thanks, NeilBrown