On 1/22/25 10:29 AM, Amir Goldstein wrote:
On Wed, Jan 22, 2025 at 4:04 PM Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
On 1/22/25 4:05 AM, Amir Goldstein wrote:
On Tue, Jan 21, 2025 at 11:59 PM NeilBrown <neilb@xxxxxxx> wrote:
On Wed, 22 Jan 2025, Amir Goldstein wrote:
On Tue, Jan 21, 2025 at 8:45 PM Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
Please send patches To: the NFSD reviewers listed in MAINTAINERS and
Cc: linux-nfs and others. Thanks!
On 1/21/25 5:39 AM, Amir Goldstein wrote:
Commit 466e16f0920f3 ("nfsd: check for EBUSY from vfs_rmdir/vfs_unink.")
mapped EBUSY host error from rmdir/unlink operation to avoid unknown
error server warning.
The same reason that casued the reported EBUSY on rmdir() (dir is a
local mount point in some other bind mount) could also cause EBUSY on
rename and some filesystems (e.g. FUSE) can return EBUSY on other
operations like open().
Therefore, to avoid unknown error warning in server, we need to map
EBUSY for all operations.
The original fix mapped EBUSY to NFS4ERR_FILE_OPEN in v4 server and
to NFS4ERR_ACCESS in v2/v3 server.
During the discussion on this issue, Trond claimed that the mapping
made from EBUSY to NFS4ERR_FILE_OPEN was incorrect according to the
protocol spec and specifically, NFS4ERR_FILE_OPEN is not expected
for directories.
NFS4ERR_FILE_OPEN might be incorrect when removing certain types of
file system objects. Here's what I find in RFC 8881 Section 18.25.4:
> If a file has an outstanding OPEN and this prevents the removal of the
> file's directory entry, the error NFS4ERR_FILE_OPEN is returned.
It's not normative, but it does suggest that any object that cannot be
associated with an OPEN state ID should never cause REMOVE to return
NFS4ERR_FILE_OPEN.
To keep things simple and consistent and avoid the server warning,
map EBUSY to NFS4ERR_ACCESS for all operations in all protocol versions.
Generally a "one size fits all" mapping for these status codes is
not going to cut it. That's why we have nfsd3_map_status() and
nfsd_map_status() -- the set of permitted status codes for each
operation is different for each NFS version.
NFSv3 has REMOVE and RMDIR. You can't pass a directory to NFSv3 REMOVE.
NFSv4 has only REMOVE, and it removes the directory entry for the
object no matter its type. The set of failure modes is different for
this operation compared to NFSv3 REMOVE.
Adding a specific mapping for -EBUSY in nfserrno() is going to have
unintended consequences for any VFS call NFSD might make that returns
-EBUSY.
I think I prefer that the NFSv4 cases be dealt with in nfsd4_remove(),
nfsd4_rename(), and nfsd4_link(), and that -EBUSY should continue to
trigger a warning.
Sorry, I didn't understand what you are suggesting.
I'm saying that we need to consider the errno -> NFS status code
mapping on a case-by-case basis first.
FUSE can return EBUSY for open().
What do you suggest to do when nfsd encounters EBUSY on open()?
vfs_rename() can return EBUSY.
What do you suggest to do when nfsd v3 encounters EBUSY on rename()?
I totally agree that we do not want NFSD to leak -EBUSY to NFS clients.
But we do need to examine all the ways -EBUSY can leak through to the
NFS protocol layers (nfs?proc.c). The mapping is not going to be the
same for every NFS operation in every NFS version. (or, at least we
need to examine these cases closely and decide that nfserr_access is
the closest we can get for /every/ case).
This sort of assertion:
WARN_ONCE(1, "nfsd: non-standard errno: %d\n", errno);
Is a code assertion for a situation that should not be possible in the
code and certainly not possible to trigger by userspace.
Both cases above could trigger the warning from userspace.
If you want to leave the warning it should not be a WARN_ONCE()
assertion, but I must say that I did not understand the explanation
for not mapping EBUSY by default to NFS4ERR_ACCESS in nfserrno().
My answer to this last question is that it isn't obvious that EBUSY
should map to NFS4ERR_ACCESS.
I would rather that nfsd explicitly checked the error from unlink/rmdir and
mapped EBUSY to NFS4ERR_ACCESS (if we all agree that is best) with a
comment (like we have now) explaining why it is best.
Can you please suggest the text for this comment because I did not
understand the reasoning for the error.
All I argued for is conformity to NFSv2/3, but you are the one who chose
NFS3ERR_ACCES for v2/3 mapping and I don't know what is the
reasoning for this error code. All I have is:
"For NFSv3, the best we can do is probably NFS3ERR_ACCES,
which isn't true, but is not less true than the other options."
You're proposing to change the behavior of NFSv4 to match NFSv2/3, and
that's where we might need to take a moment. The NFSv4 protocol provides
a richer set of status codes to report this situation.
To be fair, I did not propose that in patch v1:
https://lore.kernel.org/linux-nfs/20250120172016.397916-1-amir73il@xxxxxxxxx/
I proposed to keep the EBUSY -> NFS4ERR_FILE_OPEN mapping for v4
and extend the operations that it applies to.
Trond had reservations about his mapping.
Well, Trond said that FILE_OPEN is wrong to return if the object
being removed is a directory. It is the correct status code if
the target object is rather a regular file.
I have no problem with going back to v1 mapping and reducing the
mapped operations to rmdir/unlink/rename/open or any other mapping
that you prefer.
v1 doesn't fix Trond's issue, IIUC.
And nfsd should explicitly check the error from open() and map EBUSY to
whatever seems appropriate. Maybe that is also NS4ERR_ACCESS but if it
is, the reason is likely different to the reason that it is best for
rmdir.
So again, I would like a comment in the code explaining the choice with
a reference to FUSE.
My specific FUSE filesystem can return EBUSY for open(), but FUSE
filesystem in general can return EBUSY for any operation if that is what
the userspace server returns.
Fair, that suggests that eventually we might need the general nfserrno
mapping in addition to some individual checking in NFS operation- and
version-specific code. I'm not ready to leap to that conclusion yet.
I am fine with handling EBUSY in unlink/rmdir/rename/open
only for now if that is what everyone prefers.
As far as I can tell, NFSv2 and NFSv3 REMOVE/RMDIR are working
correctly. NFSv4 REMOVE needs to return a status code that depends
on whether the target object is a file or not. Probably not much more
than something like this:
status = vfs_unlink( ... );
+ /* RFC 8881 Section 18.25.4 paragraph 5 */
+ if (status == nfserr_file_open && !S_ISREG(...))
+ status = nfserr_access;
added to nfsd4_remove().
Let's visit RENAME once that is addressed.
Then handle OPEN as a third patch, because I bet we are going to meet
some complications there.
Then if some other function that we haven't thought about starts
returning EBUSY, we'll get warning and have a change to think about it.
I have no objection to that, but I think that the WARN_ONCE should be
converted to pr_warn_once() or pr_warn_ratelimited() because userspace
should not be able to trigger a WARN_ON in any case.
It isn't user space that's the concern -- it's that NFS clients can
trigger this warning. If a client accidentally or maliciously triggers
it repeatedly, it can fill the NFS server's system journal.
Our general policy is that we use the _ONCE variants to prevent a remote
attack from overflowing the server's root partition.
This is what Documentation/process/coding-style.rst has to say:
Do not WARN lightly
*******************
WARN*() is intended for unexpected, this-should-never-happen situations.
WARN*() macros are not to be used for anything that is expected to happen
during normal operation. These are not pre- or post-condition asserts, for
example. Again: WARN*() must not be used for a condition that is expected
to trigger easily, for example, by user space actions. pr_warn_once() is a
possible alternative, if you need to notify the user of a problem.
---
But it's not even that - I find that syzbot and other testers treat any WARN_ON
as a bug (as they should according to coding style).
This WARN_ON in nfsd is really easy to trigger from userspace and from
malicious nfs client.
If you do not replace this WARN_ON, I anticipate that the day will come when
protocol fuzzers will start bombing you with bug reports.
If it is "possible" to hit this assertion - it should not be an assertion.
I know some people (eg, Linus) do not approve of this code development
tactic. However, it's not a BUG() and the NFS server will continue to
operate.
We are using WARN_ONCE() here. We'll get exactly one of these warnings
for each boot epoch that encounters this issue.
I realize the great value of the stack trace that WARN_ON provides in
this scenario, but if we include in the warning the operation id and the
filesystem sid I think that would be enough information to understand
where the unmapped error is coming from.
Hm. The stack trace tells us all that without having to add the extra
(rarely used) arguments to nfserrno. I'm OK with a stack trace here
because this is information for developers, who can make sense of it.
It's not a message that a system admin or user needs to understand.
It's your call. If you are not bothered by the bug reports.
I expect to get bug reports for these issues, because these are real
issues in the NFSD implementation that need to be addressed, as you are
doing right now.
--
Chuck Lever