Re: [PATCH v3 3/4] nfsd: close race between unhashing and LRU addition

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> On Oct 30, 2022, at 5:45 PM, NeilBrown <neilb@xxxxxxx> wrote:
> 
> On Sat, 29 Oct 2022, Chuck Lever III wrote:
>> 
>>> On Oct 28, 2022, at 2:57 PM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
>>> 
>>> The list_lru_add and list_lru_del functions use list_empty checks to see
>>> whether the object is already on the LRU. That's fine in most cases, but
>>> we occasionally repurpose nf_lru after unhashing. It's possible for an
>>> LRU removal to remove it from a different list altogether if we lose a
>>> race.

Can that issue be resolved by simply adding a "struct list_head nf_dispose"
field? That might be more straightforward than adding conditional logic.


>> I've never seen that happen. lru field re-use is actually used in other
>> places in the kernel. Shouldn't we try to find and fix such races?
>> 
>> Wasn't the idea to reduce the complexity of nfsd_file_put ?
>> 
> 
> I think the nfsd filecache is different from those "other places"
> because of nfsd_file_close_inode() and related code.  If I understand
> correctly, nfsd can remove a file from the filecache while it is still
> in use.

Not sure about that; I think nfsd_file_close_inode() is invoked only
when a file is deleted. I could be remembering incorrectly, but that
seems like a really difficult race to hit.


> In this case it needs to be unhashed and removed from the lru -
> and then added to a dispose list - while it might still be active for
> some IO request.
> 
> Prior to Commit 8d0d254b15cc ("nfsd: fix nfsd_file_unhash_and_dispose")
> unhash_and_dispose() wouldn't add to the dispose list unless the
> refcount was one.  I'm not sure how racy this test was, but it would
> mean that it is unlikely for an nfsd_file to be added to the dispose list
> if it was still in use.
> 
> After that commit it seems more likely that a nfsd_file will be added to
> a dispose list while it is in use.

After it's linked to a dispose list via nf_lru, list_lru_add won't put
it on the LRU -- it becomes a no-op because nf_lru is now !empty. I
think we would have seen LRU corruption pretty quickly. Re-reading
Jeff's patch description, that might not be the problem he's trying
to address here.

But, it would be easy to do some reality testing. I think you could
add a WARN_ON or tracepoint in nfsd_file_free() or somewhere in the
dispose-list path to catch an in-use nfsd_file?


> As we add/remove things to a dispose list without a lock, we need to be
> careful that no other thread will add the nfsd_file to an lru at the
> same time.  refcounts will no longer provide that protection.  Maybe we
> should restore the refcount protection, or maybe we need a bit as Jeff
> has added here.

I'm not opposed to defensive changes, in general. This one seems to be
adding significant complexity without a clear hazard. I'd like to have
a better understanding of exactly what misbehavior is being addressed.


--
Chuck Lever







[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux