> On Oct 28, 2022, at 4:13 PM, Jeff Layton <jlayton@xxxxxxxxxx> wrote: > > On Fri, 2022-10-28 at 19:49 +0000, Chuck Lever III wrote: >> >>> On Oct 28, 2022, at 2:57 PM, Jeff Layton <jlayton@xxxxxxxxxx> wrote: > >> I'm still not sold on the idea of a synchronous flush in nfsd_file_free(). > > I think that we need to call this there to ensure that writeback errors > are handled. I worry that if try to do this piecemeal, we could end up > missing errors when they fall off the LRU. > >> That feels like a deadlock waiting to happen and quite difficult to >> reproduce because I/O there is rarely needed. It could help to put a >> might_sleep() in nfsd_file_fsync(), at least, but I would prefer not to >> drive I/O in that path at all. > > I don't quite grok the potential for a deadlock here. nfsd_file_free > already has to deal with blocking activities due to it effective doing a > close(). This is just another one. That's why nfsd_file_put has a > might_sleep in it (to warn its callers). > > What's the deadlock scenario you envision? I never answered this question. I'll say up front that I believe this problem exists in the current code base, so what follows is meant to document an existing issue rather than a problem with this patch series. The filecache sets up a shrinker callback. This callback uses the same or similar code paths as the filecache garbage collector. Dai has found that trying to fsync inside a shrinker callback will lead to deadlocks on some filesystems (notably I believe he was testing btrfs at the time). To address this, the filecache shrinker callback could avoid evicting nfsd_file items that are open for write. -- Chuck Lever