Re: git regression failures with v6.2-rc NFS client

> On Feb 3, 2023, at 18:53, Hugh Dickins <hughd@xxxxxxxxxx> wrote:
> 
> On Fri, 3 Feb 2023, Chuck Lever III wrote:
>>> On Feb 3, 2023, at 5:26 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
>>>> On Feb 3, 2023, at 15:25, Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote:
>>>>> On Feb 3, 2023, at 3:01 PM, Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:
>>>>> On 3 Feb 2023, at 13:03, Chuck Lever III wrote:
>>>>>> Naive suggestion: Would another option be to add server
>>>>>> support for the directory verifier?
>>>>> 
>>>>> I don't think it would resolve this bug, because what can the client do to
>>>>> find its place in the directory stream after getting NFS4ERR_NOT_SAME?
>>>>> 
>>>>> Directory verifiers might help another class of bugs, where a linear walk
>>>>> through the directory produces duplicate entries.. but I think it only helps
>>>>> some of those cases.
>>>>> 
>>>>> Knfsd /could/ use the directory verifier to keep a reference to an opened
>>>>> directory.  Perhaps knfsd's open file cache can be used.  Then, as long as
>>>>> we're doing a linear walk through the directory, the local directory's
>>>>> file->private cursor would continue to be valid to produce consistent linear
>>>>> results even if the dentries are removed in between calls to READDIR.
>>>>> 
>>>>> .. but that's not how the verifier is supposed to be used - rather it's
>>>>> supposed to signal when the client's cookies are invalid, which, for tmpfs,
>>>>> would be anytime any changes are made to the directory.
>>>> 
>>>> Right. Change the verifier whenever a directory is modified.
>>>> 
>>>> (Make it an export -> callback per filesystem type).
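>>>> 
>>>> Very roughly, something like the sketch below (untested; the
>>>> ->dir_verifier export op and the shmem helper are made-up names here,
>>>> and it assumes tmpfs would bump an i_version-style counter on every
>>>> directory modification):
>>>> 
>>>>   /* include/linux/exportfs.h: proposed new export operation */
>>>>   struct export_operations {
>>>>           ...
>>>>           /* Return an opaque value that changes whenever @dir is
>>>>            * modified; nfsd would encode it as the READDIR cookieverf. */
>>>>           u64 (*dir_verifier)(struct inode *dir);
>>>>   };
>>>> 
>>>>   /* mm/shmem.c: tmpfs implementation, assuming the directory's
>>>>    * counter is bumped on create/unlink/rename */
>>>>   static u64 shmem_dir_verifier(struct inode *dir)
>>>>   {
>>>>           return inode_peek_iversion_raw(dir);
>>>>   }
>>>> 
>>>> nfsd would then fail a READDIR whose cookieverf no longer matches with
>>>> NFS4ERR_NOT_SAME, rather than handing back entries for a directory that
>>>> has changed underneath the client.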
>>>> 
>>>>>> Well, let's not argue semantics. The optimization exposes
>>>>>> this (previously known) bug to a wider set of workloads.
>>>>> 
>>>>> Ok, yes.
>>>>> 
>>>>>> Please, open some bugs that document it. Otherwise this stuff
>>>>>> will never get addressed. Can't fix what we don't know about.
>>>>>> 
>>>>>> I'm not claiming tmpfs is perfect. But the optimization seems
>>>>>> to make things worse for more workloads. That's always been a
>>>>>> textbook definition of regression, and a non-starter for
>>>>>> upstream.
>>>>> 
>>>>> I guess I can - my impression has been there's no interest in fixing tmpfs
>>>>> for use over NFS..  after my last round of work on it I resolved to tell any
>>>>> customers that asked for it to do:
>>>>> 
>>>>> [root@fedora ~]# modprobe brd rd_size=1048576 rd_nr=1
>>>>> [root@fedora ~]# mkfs.xfs /dev/ram0
>>>>> 
>>>>> .. instead.  Though, I think that tmpfs is going to have better performance.
>>>>> 
>>>>>>> I spent a great deal of time making points about it and arguing that the
>>>>>>> client should try to work around them, and ultimately was told exactly this:
>>>>>>> https://lore.kernel.org/linux-nfs/a9411640329ed06dd7306bbdbdf251097c5e3411.camel@xxxxxxxxxxxxxxx/
>>>>>> 
>>>>>> Ah, well "client should work around them" is going to be a
>>>>>> losing argument every time.
>>>>> 
>>>>> Yeah, I agree with that, though at the time I naively thought the issues
>>>>> could be solved by better validation of changing directories.
>>>>> 
>>>>> So.. I guess I lost?  I wasn't aware of the stable cookie issues fully
>>>>> until Trond pointed them out and I compared tmpfs and xfs.  At that point, I
>>>>> saw there's not really much of a path forward for tmpfs, unless we want to
>>>>> work pretty hard at it.  But why?  Any production server wanting performance
>>>>> and stability on an NFS export isn't going to use tmpfs on knfsd. There are
>>>>> good memory-speed alternatives.
>>>> 
>>>> Just curious, what are they? I have xfs on a pair of NVMe
>>>> add-in cards, and it's quite fast. But that's an expensive
>>>> replacement for tmpfs.
>>>> 
>>>> My point is, until now, I had thought that tmpfs was a fully
>>>> supported filesystem type for NFS export. What I'm hearing
>>>> is that some people don't agree, and worse, it's not been
>>>> documented anywhere.
>>>> 
>>>> If we're not going to support tmpfs enough to fix these
>>>> significant problems, then it should be made unexportable,
>>>> and that deprecation needs to be properly documented.
>>>> 
>>>> 
>>>>>>> The optimization you'd like to revert fixes a performance regression on
>>>>>>> workloads across exports of all filesystems.  This is a fix we've had many
>>>>>>> folks asking for repeatedly.
>>>>>> 
>>>>>> Does the community agree that tmpfs has never been a first-tier
>>>>>> filesystem for exporting? That's news to me. I don't recall us
>>>>>> deprecating support for tmpfs.
>>>>> 
>>>>> I'm definitely not telling folks to use it as exported from knfsd.  I'm
>>>>> instead telling them, "Directory listings are going to be unstable, you'll
>>>>> see problems." That includes any filesystem whose file_operations are
>>>>> simple_dir_operations.
>>>> 
>>>>> That said, the whole opendir, getdents, unlink, getdents pattern is maybe
>>>>> fine for a test that can assume it has exclusive rights (writes?) to a
>>>>> directory, but also probably insane for anything else that wants to reliably
>>>>> remove the thing, and we'll find that's why `rm -rf` still works. It does
>>>>> opendir, getdents, getdents, getdents, unlink, unlink, unlink, .. rmdir.
>>>>> This more closely corresponds to POSIX stating:
>>>>> 
>>>>> If a file is removed from or added to the directory after the most recent
>>>>> call to opendir() or rewinddir(), whether a subsequent call to readdir()
>>>>> returns an entry for that file is unspecified.
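>>>>> 
>>>>> To make that ordering concrete, here's a minimal userspace sketch of the
>>>>> rm-rf-style pattern (drain the directory stream completely, then unlink;
>>>>> error handling omitted and the entry count bounded for brevity):
>>>>> 
>>>>>   #include <dirent.h>
>>>>>   #include <stdlib.h>
>>>>>   #include <string.h>
>>>>>   #include <unistd.h>
>>>>> 
>>>>>   static void remove_all(const char *path)
>>>>>   {
>>>>>           DIR *d = opendir(path);
>>>>>           struct dirent *de;
>>>>>           char *names[1024];
>>>>>           int i, n = 0;
>>>>> 
>>>>>           /* read every entry before touching the directory ... */
>>>>>           while ((de = readdir(d)) != NULL) {
>>>>>                   if (strcmp(de->d_name, ".") && strcmp(de->d_name, ".."))
>>>>>                           names[n++] = strdup(de->d_name);
>>>>>           }
>>>>> 
>>>>>           /* ... and only then unlink, so no getdents call ever
>>>>>            * observes our own removals */
>>>>>           for (i = 0; i < n; i++) {
>>>>>                   unlinkat(dirfd(d), names[i], 0);
>>>>>                   free(names[i]);
>>>>>           }
>>>>>           closedir(d);
>>>>>   }
>>>>> 
>>>>> The pattern at issue instead interleaves the unlinks between getdents
>>>>> calls on the same open directory, which is exactly the window POSIX
>>>>> leaves unspecified.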
>>>>> 
>>>>> 
>>>>>> If an optimization broke ext4, xfs, or btrfs, it would be
>>>>>> reverted until the situation was properly addressed. I don't
>>>>>> see why the situation is different for tmpfs just because it
>>>>>> has more bugs.
>>>>> 
>>>>> Maybe it isn't?  We've yet to hear from any authoritative sources on this.
>>>> 
>>>>>>> I hope the maintainers decide not to revert
>>>>>>> it, and that we can also find a way to make readdir work reliably on tmpfs.
>>>>>> 
>>>>>> IMO the guidelines go the other way: readdir on tmpfs needs to
>>>>>> be addressed first. Reverting is a last resort, but I don't see
>>>>>> a fix coming before v6.2. Am I wrong?
>>>>> 
>>>>> Is your opinion wrong?  :)  IMO, yes, because this is just another round of
>>>>> "we don't fix the client for broken server behaviors".
>>>> 
>>>> In general, we don't fix broken servers on the client, true.
>>>> 
>>>> In this case, though, this is a regression. A client change
>>>> broke workloads that were working in v6.1.
>>> 
>>> No. We’ve had this discussion on the NFS mailing list every time someone invents a new filesystem that they want to export and then claims that they don’t need stable cookies because the NFS protocol doesn’t bother to spell that part out. The NFS client cannot and will not claim bug-free support for a filesystem that assigns cookie values in any way that is not 100% repeatable.
>> 
>> Nor should it. However:
>> 
>> A. tmpfs is not a new filesystem.
>> 
>> B. Ben says this is more or less an issue on _all_ filesystems,
>> but others manage to hide it effectively, likely as a side-effect
>> of having to deal with slow durable storage. Before the optimization,
>> tmpfs worked "well enough" as well.
>> 
>> C. If we don't want to fully support tmpfs, that's fine. But let's
>> document that properly, please.
>> 
>> D. If you guys knew beforehand that this change would break tmpfs
>> exports, there should have been a warning in the patch description.
>> 
>> 
>> The guidelines about regressions are clear. I don't agree with
>> leaving the optimization in place while tmpfs is not working well
>> enough. And actually, these issues in tmpfs should have been
>> addressed first. There's loads of precedent for that requirement.
>> 
>> But it appears that as usual I don't have much choice. What I can
>> do is file some bugs and see if we can address the server side
>> pieces.
>> 
>> So far no one has said that the tmpfs cookie problem is irreparable.
>> I'd much prefer to keep it in NFSD's stable of support.
>> 
>> https://bugzilla.linux-nfs.org/show_bug.cgi?id=405
>> 
>> And, if it helps, our server should support directory verifiers.
>> 
>> https://bugzilla.linux-nfs.org/show_bug.cgi?id=404
>> 
>> 
>>> Concerning the claim that it was unknown that this is a problem with tmpfs, see https://marc.info/?l=linux-kernel&m=100369543808129&w=2
>>> In the years since 2001, we’ve “fixed” the behaviour of tmpfs by circumventing the reliance on cookies for the case of a linear read of a tmpfs directory, but the underlying cookie behaviour is still the same, and so the NFS behaviour ends up being the same.
>>> 
>>> The bottom line is that you’ve always been playing the lottery when mounting tmpfs over NFS.
>> 
>> I'm not debating the truth of that. I just don't think we should
>> be making that situation needlessly worse.
>> 
>> And I would be much more comfortable with this if it appeared in
>> a man page or on our wiki, or ... I'm sorry, but "some email in
>> 2001" is not documentation a user should be expected to find.
> 
> I very much agree with you, Chuck.  Making something imperfect
> significantly worse is called "a regression".
> 
> And I would expect the (laudable) optimization which introduced
> that regression to be reverted from 6.2 for now, unless (I imagine
> not, but have no clue) it can be easily conditionalized somehow on
> not-tmpfs or not-simple_dir_operations.  But that's not my call.
> 
> What is the likelihood that simple_dir_operations will be enhanced,
> or a satisfactory complicated_dir_operations added?  I can assure
> you, never by me!  If Al or Amir or some dcache-savvy FS folk have
> time on their hands and an urge to add what's wanted, great: but
> that surely will not come in 6.2, if ever.
> 
> More likely that effort would have to come from the NFS(D) end,
> who will see the benefit.  And if there's some little tweak to be
> made to simple_dir_operations, which will give you the hint you need
> to handle it better, I expect fsdevel would welcome a patch or two.
> 
> Hugh


No! If it was impossible to hit this problem before the patch, then I might agree with you. However, what this patch does is expose a problem that has always existed, but was a lot less likely to happen, timing-wise, when we were allowing glibc to suck in all 50000 or so directory entries in one gulp.

IOW: this patch doesn’t cause the problem, it just makes it easier to hit when you are using a high performance setup like Chuck's. It was always easy to hit when you were using slower networking and/or smaller rsize values against a remote server with multiple clients creating + deleting files in the same NFS exported tmpfs directory.


_________________________________
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx




