On 8/8/24 10:07, Christian Ebner wrote:
Hi,
recently, some of our (Proxmox VE) users have reported file corruption
when accessing contents located on CephFS via the in-kernel ceph client
[0,1]. According to these reports, our kernels based on (Ubuntu) v6.8
show the corruption, while the FUSE client, or the in-kernel ceph
client with kernels based on v6.5, does not. Unfortunately, the
corruption is hard to reproduce.
After a further report of file corruption [2] involving a completely
unrelated code path, we managed, by sheer chance, to reproduce the
corruption for one file on one of our ceph test clusters. We were able
to narrow it down to what appears to be an issue with reading the
contents via the in-kernel ceph client. Note that we can rule out the
file contents themselves being corrupt, as any unaffected kernel
version, as well as the FUSE client, returns the correct contents.
The issue is also present in the current mainline kernel 6.11-rc2.
Bisection with the reproducer points to this commit:
"92b6cc5d: netfs: Add iov_iters to (sub)requests to describe various
buffers"
Description of the issue:
* File copied from the local filesystem to CephFS via:
`cp /tmp/proxmox-backup-server_3.2-1.iso
/mnt/pve/cephfs/proxmox-backup-server_3.2-1.iso`
* sha256sum on local filesystem:
`1d19698e8f7e769cf0a0dcc7ba0018ef5416c5ec495d5e61313f9c84a4237607
/tmp/proxmox-backup-server_3.2-1.iso`
* sha256sum on cephfs with kernel up to above commit:
`1d19698e8f7e769cf0a0dcc7ba0018ef5416c5ec495d5e61313f9c84a4237607
/mnt/pve/cephfs/proxmox-backup-server_3.2-1.iso`
* sha256sum on cephfs with kernel after above commit:
`89ad3620bf7b1e0913b534516cfbe48580efbaec944b79951e2c14e5e551f736
/mnt/pve/cephfs/proxmox-backup-server_3.2-1.iso`
* Removing and/or re-copying the file does not change the outcome; the
corrupt checksum remains the same.
* Only this one particular file has been observed to show the issue;
for others, the checksums match.
* Accessing the same file from different clients results in the same
behavior: clients with the above patch applied show the incorrect
checksum, while clients without it show the correct one.
* The issue persists even across reboots of the ceph cluster and/or the
clients.
* The file is indeed corrupt after reading, as verified with `cmp -b`.
A condensed reproduction sketch follows below.
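For convenience, here are the steps above condensed into a small shell
sketch; the paths and the expected checksum match the description
above, while the cache drop is an assumption on our side to make sure
the read is served by the ceph client rather than the page cache:

    #!/bin/sh
    src=/tmp/proxmox-backup-server_3.2-1.iso
    dst=/mnt/pve/cephfs/proxmox-backup-server_3.2-1.iso
    expected=1d19698e8f7e769cf0a0dcc7ba0018ef5416c5ec495d5e61313f9c84a4237607

    cp "$src" "$dst"
    sync
    echo 3 > /proc/sys/vm/drop_caches   # force the next read to go to cephfs

    actual=$(sha256sum "$dst" | cut -d' ' -f1)
    if [ "$actual" != "$expected" ]; then
        echo "corrupt read: $actual"
        cmp -b "$src" "$dst"            # report the first differing byte
    fi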
Does anyone have an idea what could be causing this issue, or how to
debug it further? I am happy to provide more information or dynamic
debug output if needed.
Best regards,
Chris
[0] https://forum.proxmox.com/threads/78340/post-676129
[1] https://forum.proxmox.com/threads/149249/
[2] https://forum.proxmox.com/threads/151291/
Hi,
please allow me to send a follow-up regarding this:
thanks to a suggestion by my colleague Friedrich Weber, we have some
further interesting findings.
The issue is related to the readahead size (`rasize`) passed to
`mount.ceph` when mounting the filesystem [0].
Passing an `rasize` in the range [0..1k] leads to the correct checksum,
independent of whether the bisected patch is applied. Values in the
range (1k..1M] lead to corrupt, but different, checksums for different
`rasize` values, and `rasize` values above 1M lead to a corrupt but
constant checksum. Again, without the bisected patch, the issue is not
present. A sketch of such a sweep over `rasize` values is included
below.
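To illustrate, a sketch of the sweep; the monitor address and
credentials (MON, NAME, SECRETFILE) are placeholders, only the `rasize`
values and the paths reflect our setup:

    # MON, NAME and SECRETFILE are placeholders for the respective cluster
    for rasize in 0 1024 4096 65536 1048576 4194304; do
        mount -t ceph "$MON:/" /mnt/pve/cephfs \
            -o name="$NAME",secretfile="$SECRETFILE",rasize="$rasize"
        echo "rasize=$rasize: $(sha256sum /mnt/pve/cephfs/proxmox-backup-server_3.2-1.iso)"
        umount /mnt/pve/cephfs
    done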
Please let me know if I can provide further information or debug outputs
in order to narrow down the issue.
Best regards,
Chris
[0] https://docs.ceph.com/en/reef/man/8/mount.ceph/#advanced