> On Oct 28, 2022, at 1:43 PM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> 
> On Fri, 2022-10-28 at 17:21 +0000, Chuck Lever III wrote:
>> 
>>> On Oct 28, 2022, at 11:51 AM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
>>> 
>>> On Fri, 2022-10-28 at 15:29 +0000, Chuck Lever III wrote:
>>>> 
>>>>> On Oct 28, 2022, at 11:05 AM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
>>>>> 
>>>>> The problem with not calling vfs_fsync is that we might miss
>>>>> writeback errors. The nfsd_file could get reaped before a v3 COMMIT
>>>>> ever comes in. nfsd would eventually reopen the file, but it could
>>>>> miss seeing the error if it got opened locally in the interim.
>>>> 
>>>> That helps. So we're surfacing writeback errors for local writers?
>>> 
>>> Well, for non-v3 writers anyway. I suppose you could hit the same
>>> scenario with a mixed v3 and v4 workload if you were unlucky enough,
>>> or a mixed v3 and ksmbd workload, etc...
>>> 
>>>> I guess I would like this flushing to interfere as little as possible
>>>> with the server's happy zone, since it's not something clients need
>>>> to wait for, and an error is exceptionally rare.
>>>> 
>>>> But also, we can't let writeback errors hold onto a bunch of memory
>>>> indefinitely. How much nfsd_file and page cache memory might be
>>>> pinned by a writeback error, and for how long?
>>> 
>>> You mean if we were to stop trying to fsync out when closing? We don't
>>> keep files in the cache indefinitely, even if they have writeback
>>> errors.
>>> 
>>> In general, the kernel attempts to write things out, and if that
>>> fails, it sets a writeback error in the mapping and marks the pages
>>> clean. So if we're talking about files that are no longer being used
>>> (since they're being GC'ed), we only block reaping them for as long
>>> as writeback is in progress.
>>> 
>>> Once writeback ends and the file is eligible for reaping, we call
>>> vfs_fsync a final time, grab the error, and reset the write verifier
>>> when it's non-zero.
>>> 
>>> If we stop doing fsyncs, then that model sort of breaks down. I'm not
>>> clear on what you'd like to do instead.
>> 
>> I'm not clear either. I think I just have some hand-wavy requirements.
>> 
>> I think keeping the flushes in the nfsd threads and away from single-
>> threaded garbage collection makes sense. Keep I/O in nfsd context, not
>> in the filecache garbage collector. I'm not sure that's guaranteed
>> if the garbage collection thread does an nfsd_file_put() that flushes.
> 
> The garbage collector doesn't call nfsd_file_put directly, though it
> will call nfsd_file_free, which now does a vfs_fsync.

OK, I thought I saw some email fly by that suggested using
nfsd_file_put in the garbage collector.

But the vfs_fsync you mention can still trigger I/O and wait for it in
the GC thread (it's not a SYNC_NONE flush). That's rare, but I'd
rather not have even that possibility if we can avoid it.
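To spell out the difference I'm after, here's a rough and untested
sketch. The helper names are invented; I'm assuming only the nf_file
field as declared in fs/nfsd/filecache.h:

#include <linux/fs.h>
#include "filecache.h"

/*
 * What the GC path gets today: fdatasync semantics, which can
 * issue I/O and then wait for it to complete, stalling the
 * single-threaded garbage collector.
 */
static int nf_flush_and_wait(struct nfsd_file *nf)
{
	return vfs_fsync(nf->nf_file, 1);
}

/*
 * What I'd rather rely on outside of nfsd thread context:
 * filemap_flush() starts WB_SYNC_NONE writeback and returns
 * without waiting for it to complete.
 */
static void nf_flush_nowait(struct nfsd_file *nf)
{
	filemap_flush(nf->nf_file->f_mapping);
}

If the nowait flavor has already run by the time an item is reaped,
the waiting flavor should usually find nothing left to write.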
>> But, back to the topic of this patch: my own experiments with
>> background syncing showed that it introduces significant overhead and
>> it wasn't really worth the trouble. Basically, on intensive workloads,
>> the garbage collector must not stall or live-lock because it's walking
>> through millions of pages trying to figure out which ones are dirty.
> 
> If this is what you want, then kicking off SYNC_NONE writeback when we
> put it on the LRU is the right thing to do.
> 
> We want to ensure that when the thing is reaped from the LRU, the
> final vfs_fsync has nothing left to write back. The penultimate put
> that adds it to the LRU is almost certainly going to come in the
> context of an nfsd thread, so kicking off background writeback at
> that point seems reasonable.

IIUC the opposing idea is to do a synchronous writeback in
nfsd_file_put and then nothing in nfsd_file_free. Why isn't that
adequate to achieve the same result?

Thinking aloud:

- Suppose a client does some UNSTABLE NFSv3 WRITEs to a file
- The client then waits long enough for the nfsd_file to get aged out
  of the filecache
- A local writer on the server triggers a writeback error on the file
- The error is cleared by other activity
- The client sends a COMMIT

Wouldn't the server miss the chance to bump its write verifier in
that case? (I've tried to spell this sequence out in errseq terms at
the end of this message.)

> Only files that aren't touched again get reaped off the LRU eventually,
> so there should be no danger of nfsd redirtying it again.

At the risk of rat-holing... IIUC the only case we should care about
is when there are outstanding NFSv3 WRITEs that haven't been COMMITed.
It seems like NFSv3-specific code, and not the filecache, should deal
with that case and leave nfsd_file_put/free out of it...? Again, I
have no clear idea how that would work; I'm just thinking about the
layering here.

> By the time we
> get to reaping it, everything should be written back and the inode will
> be ready to close with little delay.
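Here is how I believe the "thinking aloud" sequence above maps onto
the errseq_t machinery in lib/errseq.c. Untested and illustrative
only; the function is invented, and the two cursors stand in for the
f_wb_err fields of the local writer's and nfsd's struct files:

#include <linux/errno.h>
#include <linux/errseq.h>

static void verifier_miss_illustration(void)
{
	errseq_t wb_err = 0;		/* stands in for mapping->wb_err */
	errseq_t local_since = 0;	/* local writer opened before the error */
	errseq_t nfsd_since;

	/* The local writer triggers a writeback error... */
	errseq_set(&wb_err, -EIO);

	/* ...and then observes it, which marks the error as seen */
	errseq_check_and_advance(&wb_err, &local_since);	/* returns -EIO */

	/* The COMMIT arrives; nfsd has to re-open, so it samples only now */
	nfsd_since = errseq_sample(&wb_err);

	/*
	 * Because the error was already marked seen, the sample above is
	 * current, so this check reports nothing. nfsd never learns that
	 * it should bump the write verifier.
	 */
	errseq_check(&wb_err, nfsd_since);			/* returns 0 */
}

--
Chuck Lever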