On 12/17/24 5:10 PM, Rick Macklem wrote:
Hi,
The attached pcap file shows that the knfsd server generates
bogus XDR for the reply to a GETATTR that follows a READDIR
operation.
More specifically, if you look at the pcap file in wireshark
and go to packet#22 and then click on the operations and
then "Opcode: GETATTR (9)", the start of
the XDR for the GETATTR will be highlighted in the hexadecimal
window.
Now, if you look at what follows (in the hexadecimal window),
you'll see that the GETATTR reply looks like:
- GETATTR (9)
- NFS4_OK (0)
- Length of bitmap (0) <-- Not (2)
- 2 words of attribute bitmap
- 0x98 (length of attributes, in hex)
- attribute values
Everything looks ok, except the number of bitmap words is
0 and not 2.
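To illustrate why a strict client rejects this, here is a minimal Python
sketch of the reply layout listed above (opcode, status, bitmap4 word count,
bitmap words, attrlist4 length, attribute bytes). It is not the knfsd or
FreeBSD code. The function names, the declared_count parameter and the two
bitmap word values are made up for the example; only the 0x98 attribute
length comes from the trace.

```python
import struct

OP_GETATTR = 9
NFS4_OK = 0


def encode_getattr_reply(bitmap_words, attr_vals, declared_count=None):
    """Encode opcode, status, bitmap4 and attrlist4 as big-endian XDR.

    declared_count (hypothetical knob) overrides the bitmap word count
    so the malformed reply can be reproduced.
    """
    count = len(bitmap_words) if declared_count is None else declared_count
    out = struct.pack(">III", OP_GETATTR, NFS4_OK, count)
    for word in bitmap_words:
        out += struct.pack(">I", word)
    pad = (-len(attr_vals)) % 4  # XDR opaque data is padded to 4 bytes
    out += struct.pack(">I", len(attr_vals)) + attr_vals + b"\x00" * pad
    return out


def decode_getattr_reply(buf):
    """Decode a successful GETATTR result, trusting the declared word count."""
    opcode, status, count = struct.unpack_from(">III", buf, 0)
    if opcode != OP_GETATTR or status != NFS4_OK:
        raise ValueError("not a successful GETATTR result")
    off = 12
    if off + 4 * count + 4 > len(buf):
        raise ValueError("bitmap4 overruns reply")
    bitmap = list(struct.unpack_from(">%dI" % count, buf, off))
    off += 4 * count
    (attr_len,) = struct.unpack_from(">I", buf, off)
    off += 4
    if off + attr_len > len(buf):
        raise ValueError("attrlist4 length %#x overruns reply" % attr_len)
    return status, bitmap, buf[off:off + attr_len]


# Placeholder bitmap words (not taken from the pcap) and 0x98 bytes of values.
SAMPLE_BITMAP = [0x0010011A, 0x00B0A23A]
SAMPLE_ATTRS = bytes(0x98)

good_reply = encode_getattr_reply(SAMPLE_BITMAP, SAMPLE_ATTRS)
bad_reply = encode_getattr_reply(SAMPLE_BITMAP, SAMPLE_ATTRS, declared_count=0)
```

Decoding good_reply returns the two bitmap words and the 0x98 bytes of
attributes. Decoding bad_reply takes the first bitmap word as the attribute
length, which runs far past the end of the reply, so the decoder raises
ValueError.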
Since the knfsd does not normally do this, my guess is that some
sort of runaway pointer or use-after-free bug causes it.
So far, it only appears to happen when the GETATTR follows a
READDIR operation.
This was reported to me for a FreeBSD client mounting a server
running the following:
- Debian 12 w/kernel:
$ uname -r
6.1.0-25-amd64
- File system exported: ZFS
$ dpkg -l | fgrep libzfs4linux
ii libzfs4linux 2.1.11-1 amd64
I suspect that ZFS exports are not common for the Linux knfsd?
Anyhow, I am not sure if you have seen such a problem before,
but I thought I would at least report it.
(I have cc'd the reporter, in case you have questions for him.)
rick
ps: If the pcap file does not make it through the mailing list,
email me and I'll send you a copy.
Hi Rick,
ZFS is an "out of tree" filesystem, so the upstream Linux community does
not support it. I'm guessing libzfs4linux has its own upstream
community. Even so, it is not an unpopular choice among Linux NAS
enthusiasts.
Also, 6.1.0-25-amd64 is a Debian-built kernel, and I'm not sure how it
relates to the upstream kernel (distros add their own special sauce, and
this code base looks a couple of years old to begin with).
It might seem unfriendly, but we usually ask, in such cases, for the
reporter to work with the Linux distributor first. If they can reproduce
this issue with an "in tree" file system contained in a recent upstream
Linux kernel, then we can take a look. (Or you and J. David can give it
a try).
If it can be reproduced, we have to fix the tip of the upstream branch
first, and then take that patch, as cleanly as possible, back to the LTS
6.1 kernel code base that 6.1.0-25-amd64 is based on. Debian would then
be responsible for taking the LTS fix into their kernel.
If it cannot be reproduced on a recent upstream kernel, then it's likely
there is already a fix somewhere between 6.1 and the tip of upstream.
Generally a "git bisect" can identify the commit, and it can then be
backported as described above.
HTH
--
Chuck Lever