Re: git regression failures with v6.2-rc NFS client

Chuck Lever III <chuck.lever@xxxxxxxxxx> · Fri, 3 Feb 2023 20:25:42 +0000

> On Feb 3, 2023, at 3:01 PM, Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:
> 
> On 3 Feb 2023, at 13:03, Chuck Lever III wrote:
>> Naive suggestion: Would another option be to add server
>> support for the directory verifier?
> 
> I don't think it would resolve this bug, because what can the client do to
> find its place in the directory stream after getting NFS4ERR_NOT_SAME?
> 
> Directory verifiers might help another class of bugs, where a linear walk
> through the directory produces duplicate entries.. but I think it only helps
> some of those cases.
> 
> Knfsd /could/ use the directory verifier to keep a reference to an opened
> directory.  Perhaps knfsd's open file cache can be used.  Then, as long as
> we're doing a linear walk through the directory, the local directory's
> file->private cursor would continue to be valid to produce consistent linear
> results even if the dentries are removed in between calls to READDIR.
> 
> .. but that's not how the verifier is supposed to be used - rather it's
> supposed to signal when the client's cookies are invalid, which, for tmpfs,
> would be anytime any changes are made to the directory.

Right. Change the verifier whenever a directory is modified.

(Make it an export -> callback per filesystem type).

>> Well, let's not argue semantics. The optimization exposes
>> this (previously known) bug to a wider set of workloads.
> 
> Ok, yes.
> 
>> Please, open some bugs that document it. Otherwise this stuff
>> will never get addressed. Can't fix what we don't know about.
>> 
>> I'm not claiming tmpfs is perfect. But the optimization seems
>> to make things worse for more workloads. That's always been a
>> textbook definition of regression, and a non-starter for
>> upstream.
> 
> I guess I can - my impression has been there's no interest in fixing tmpfs
> for use over NFS..  after my last round of work on it I resolved to tell any
> customers that asked for it to do:
> 
> [root@fedora ~]# modprobe brd rd_size=1048576 rd_nr=1
> [root@fedora ~]# mkfs.xfs /dev/ram0
> 
> .. instead.  Though, I think that tmpfs is going to have better performance.
> 
>>> I spent a great deal of time making points about it and arguing that the
>>> client should try to work around them, and ultimately was told exactly this:
>>> https://lore.kernel.org/linux-nfs/a9411640329ed06dd7306bbdbdf251097c5e3411.camel@xxxxxxxxxxxxxxx/
>> 
>> Ah, well "client should work around them" is going to be a
>> losing argument every time.
> 
> Yeah, I agree with that, though at the time I naively thought the issues
> could be solved by better validation of changing directories.
> 
> So.. I guess I lost?  I wasn't aware of the stable cookie issues fully
> until Trond pointed them out and I compared tmpfs and xfs.  At that point, I
> saw there's not really much of a path forward for tmpfs, unless we want to
> work pretty hard at it.  But why?  Any production server wanting performance
> and stability on an NFS export isn't going to use tmpfs on knfsd.  There are
> good memory-speed alternatives.

Just curious, what are they? I have xfs on a pair of NVMe
add-in cards, and it's quite fast. But that's an expensive
replacement for tmpfs.

My point is, until now, I had thought that tmpfs was a fully
supported filesystem type for NFS export. What I'm hearing
is that some people don't agree, and worse, it's not been
documented anywhere.

If we're not going to support tmpfs enough to fix these
significant problems, then it should be made unexportable,
and that deprecation needs to be properly documented.

>>> The optimization you'd like to revert fixes a performance regression on
>>> workloads across exports of all filesystems.  This is a fix we've had many
>>> folks asking for repeatedly.
>> 
>> Does the community agree that tmpfs has never been a first-tier
>> filesystem for exporting? That's news to me. I don't recall us
>> deprecating support for tmpfs.
> 
> I'm definitely not telling folks to use it as exported from knfsd.  I'm
> instead telling them, "Directory listings are going to be unstable, you'll
> see problems." That includes any filesystems with file_operations of
> simple_dir_operations.

> That said, the whole opendir, getdents, unlink, getdents pattern is maybe
> fine for a test that can assume it has exclusive rights (writes?) to a
> directory, but also probably insane for anything else that wants to reliably
> remove the thing, and we'll find that's why `rm -rf` still works.  It does
> opendir, getdents, getdents, getdents, unlink, unlink, unlink, .. rmdir.
> This more closely corresponds to POSIX stating:
> 
>    If a file is removed from or added to the directory after the most recent
>    call to opendir() or rewinddir(), whether a subsequent call to readdir()
>    returns an entry for that file is unspecified.
> 
> 
>> If an optimization broke ext4, xfs, or btrfs, it would be
>> reverted until the situation was properly addressed. I don't
>> see why the situation is different for tmpfs just because it
>> has more bugs.
> 
> Maybe it isn't?  We've yet to hear from any authoritative sources on this.

>>> I hope the maintainers decide not to revert
>>> it, and that we can also find a way to make readdir work reliably on tmpfs.
>> 
>> IMO the guidelines go the other way: readdir on tmpfs needs to
>> be addressed first. Reverting is a last resort, but I don't see
>> a fix coming before v6.2. Am I wrong?
> 
> Is your opinion wrong?  :)  IMO, yes, because this is just another round of
> "we don't fix the client for broken server behaviors".

In general, we don't fix broken servers on the client, true.

In this case, though, this is a regression. A client change
broke workloads that were working in v6.1.

I feel that makes this different than "we're not changing
the client to work around a server bug."

> If we're going to
> disagree, know that its entirely amicable from my side.  :)
> 
>> What I fear will happen is that the optimization will go in, and
>> then nobody will address (or even document) the tmpfs problem,
>> making it unusable. That would be unfortunate and wrong.
>> 
>> Please look at tmpfs and see how difficult it will be to address
>> the cookie problem properly.
> 
> I think tmpfs doesn't have any requirement to report consistent offsets in
> between calls to close(dir) and open(dir)

It has that requirement if we want to continue to make it
exportable by NFS.

> - so if you re-open the directory
> a second time and seek to some position, you're not going to be able to
> connect some earlier part of the stream to what you see now.
> 
> So, I don't think its a tmpfs problem - its a bit of a combination of issues
> that contrive to make READDIR results inconsistent on NFS.  The best place
> to improve things might be to have knfsd try to keep the directory open in
> between calls to READDIR for filesystems of simple_dir_operations.
> 
> Really, the root of the problem is that there's no stateful way to walk a
> directory that's changing, especially if you're expecting things to be the
> same in between close() and open().  XFS and ext4 do a pretty good job of
> making that problem disappear by hashing out their dentries across a very
> large range, but without some previous state or versioning, it seems a hard
> problem to solve.

Adding the tmpfs folks for an opinion. The issue is that
removing directory entries from a tmpfs directory while
directory scanning is in process results in undefined
behavior because tmpfs permits directory offsets to
change between closedir and opendir.

NFS does not have an explicit opendir or closedir operation.

--
Chuck Lever