Re: [PATCH v3 2/4] nfsd: rework refcounting in filecache

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 2022-11-01 at 13:58 +0000, Chuck Lever III wrote:
> 
> > On Oct 28, 2022, at 4:13 PM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> > 
> > On Fri, 2022-10-28 at 19:49 +0000, Chuck Lever III wrote:
> > > 
> > > > On Oct 28, 2022, at 2:57 PM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> > 
> > > I'm still not sold on the idea of a synchronous flush in nfsd_file_free().
> > 
> > I think that we need to call this there to ensure that writeback errors
> > are handled. I worry that if try to do this piecemeal, we could end up
> > missing errors when they fall off the LRU.
> > 
> > > That feels like a deadlock waiting to happen and quite difficult to
> > > reproduce because I/O there is rarely needed. It could help to put a
> > > might_sleep() in nfsd_file_fsync(), at least, but I would prefer not to
> > > drive I/O in that path at all.
> > 
> > I don't quite grok the potential for a deadlock here. nfsd_file_free
> > already has to deal with blocking activities due to it effective doing a
> > close(). This is just another one. That's why nfsd_file_put has a
> > might_sleep in it (to warn its callers).
> > 
> > What's the deadlock scenario you envision?
> 
> I never answered this question.
> 
> I'll say up front that I believe this problem exists in the current code
> base, so what follows is meant to document an existing issue rather than
> a problem with this patch series.
> 
> The filecache sets up a shrinker callback. This callback uses the same
> or similar code paths as the filecache garbage collector.
> 
> Dai has found that trying to fsync inside a shrinker callback will
> lead to deadlocks on some filesystems (notably I believe he was testing
> btrfs at the time).
> 
> To address this, the filecache shrinker callback could avoid evicting
> nfsd_file items that are open for write.
> 

Ok, that makes sense, particularly if btrfs needs to wait on reclaim in
order to allocate.

We already skip entries that are dirty or under writeback:

       /*                                         
         * Don't throw out files that are still undergoing I/O or
         * that have uncleared errors pending.
         */
        if (nfsd_file_check_writeback(nf)) {   
                trace_nfsd_file_gc_writeback(nf);
                return LRU_SKIP;        
        }                             

...and nfsd_file_check_writeback does:

        return mapping_tagged(mapping, PAGECACHE_TAG_DIRTY) ||
                mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK);

If that flush is due to shrinker deadlock, then you'd have to hit a
pretty tight race window, I think.

Actually...this may have been due to a race in the existing lru
callback. It didn't properly decrement the refcount when isolating
entries from the LRU, which left a pretty big window open for new
references to be grabbed while we were trying to reap the thing. These
patches may improve that if so.

FWIW, I think we're "within our rights" to not fsync on close here. The
downside of not doing that though is that if there are writeback errors,
they may go unreported to the client (aka silent data corruption). I'd
prefer to keep that in, unless we can demonstrate that it's a problem.
-- 
Jeff Layton <jlayton@xxxxxxxxxx>




[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux