On 8/8/24 10:07, Christian Ebner wrote:
Hi,
recently, some of our (Proxmox VE) users have reported file corruption
when accessing contents located on CephFS via the in-kernel ceph client
[0,1]. According to these reports, our kernels based on (Ubuntu) v6.8
show the corruption, while the FUSE client, or the in-kernel ceph
client with kernels based on v6.5, does not. Unfortunately, the
corruption is hard to reproduce.
After a further report of file corruption [2] involving a completely
unrelated code path, we managed, by sheer chance, to reproduce the
corruption for one file on one of our ceph test clusters. We were able
to narrow it down to what appears to be an issue with reading the
contents via the in-kernel ceph client. Note that we can rule out the
file contents themselves being corrupt, as any unaffected kernel
version, as well as the FUSE client, returns the correct contents.
The issue is also present in the current mainline kernel 6.11-rc2.
Bisection with the reproducer points to this commit:
"92b6cc5d: netfs: Add iov_iters to (sub)requests to describe various
buffers"
Description of the issue:
* File copied from the local filesystem to CephFS via:
`cp /tmp/proxmox-backup-server_3.2-1.iso
/mnt/pve/cephfs/proxmox-backup-server_3.2-1.iso`
* sha256sum on local filesystem:
`1d19698e8f7e769cf0a0dcc7ba0018ef5416c5ec495d5e61313f9c84a4237607
/tmp/proxmox-backup-server_3.2-1.iso`
* sha256sum on cephfs with kernel up to above commit:
`1d19698e8f7e769cf0a0dcc7ba0018ef5416c5ec495d5e61313f9c84a4237607
/mnt/pve/cephfs/proxmox-backup-server_3.2-1.iso`
* sha256sum on cephfs with kernel after above commit:
`89ad3620bf7b1e0913b534516cfbe48580efbaec944b79951e2c14e5e551f736
/mnt/pve/cephfs/proxmox-backup-server_3.2-1.iso`
* Removing and/or re-copying the file does not change the outcome; the
corrupt checksum remains the same.
* Only this one particular file has been observed to show the issue;
for others, the checksums match.
* Accessing the same file from different clients results in the same
behavior: clients with the above patch applied show the incorrect
checksum, while clients without it show the correct one.
* The issue persists even across reboots of the ceph cluster and/or the
clients.
* The file is indeed corrupt after reading, as verified with `cmp -b`.
A condensed reproduction sketch follows below.
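For convenience, here are the steps above condensed into a small shell
sketch; the paths and the expected checksum match the description
above, while the cache drop is an assumption on our side to make sure
the read is served by the ceph client rather than the page cache:

    #!/bin/sh
    src=/tmp/proxmox-backup-server_3.2-1.iso
    dst=/mnt/pve/cephfs/proxmox-backup-server_3.2-1.iso
    expected=1d19698e8f7e769cf0a0dcc7ba0018ef5416c5ec495d5e61313f9c84a4237607

    cp "$src" "$dst"
    sync
    echo 3 > /proc/sys/vm/drop_caches   # force the next read to go to cephfs

    actual=$(sha256sum "$dst" | cut -d' ' -f1)
    if [ "$actual" != "$expected" ]; then
        echo "corrupt read: $actual"
        cmp -b "$src" "$dst"            # report the first differing byte
    fi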
Does anyone have an idea what could be causing this issue, or how to
debug it further? I am happy to provide more information or dynamic
debug output if needed.
Best regards,
Chris
[0] https://forum.proxmox.com/threads/78340/post-676129
[1] https://forum.proxmox.com/threads/149249/
[2] https://forum.proxmox.com/threads/151291/
Hi,
please allow me to send a follow-up regarding this:
thanks to a suggestion by my colleague Friedrich Weber, we have some
further interesting findings.
The issue is related to the readahead size (`rasize`) passed to
`mount.ceph` when mounting the filesystem [0].
Passing an `rasize` in the range [0..1k] leads to the correct checksum,
independent of whether the bisected patch is applied. Values in the
range (1k..1M] lead to corrupt, but different, checksums for different
`rasize` values, and `rasize` values above 1M lead to a corrupt but
constant checksum. Again, without the bisected patch, the issue is not
present. A sketch of such a sweep over `rasize` values is included
below.
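To illustrate, a sketch of the sweep; the monitor address and
credentials (MON, NAME, SECRETFILE) are placeholders, only the `rasize`
values and the paths reflect our setup:

    # MON, NAME and SECRETFILE are placeholders for the respective cluster
    for rasize in 0 1024 4096 65536 1048576 4194304; do
        mount -t ceph "$MON:/" /mnt/pve/cephfs \
            -o name="$NAME",secretfile="$SECRETFILE",rasize="$rasize"
        echo "rasize=$rasize: $(sha256sum /mnt/pve/cephfs/proxmox-backup-server_3.2-1.iso)"
        umount /mnt/pve/cephfs
    done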
Please let me know if I can provide further information or debug outputs
in order to narrow down the issue.
Best regards,
Chris
[0] https://docs.ceph.com/en/reef/man/8/mount.ceph/#advanced