Hi Christian,
Thanks for reporting this.
Let me have a look and see how to fix this.
- Xiubo
On 9/4/24 23:49, Christian Ebner wrote:
Hi,
some of our customers (Proxmox VE) are seeing file corruption when
accessing contents located on CephFS via the in-kernel Ceph client
[0,1,2]. We managed to reproduce this regression on kernels up to the
latest 6.11-rc6.
Accessing the same content on the CephFS using the FUSE client, or the
in-kernel Ceph client with older kernels (the Ubuntu kernel based on
v6.5), does not show the corruption.
Unfortunately, the corruption is hard to reproduce from scratch;
seemingly only a small subset of files is affected. However, once a
file is affected, the issue is persistent and easy to reproduce.
Bisection with the reproducer points to this commit:
"92b6cc5d: netfs: Add iov_iters to (sub)requests to describe various
buffers"
Description of the issue:
A file was copied from local filesystem to cephfs via:
```
cp /tmp/proxmox-backup-server_3.2-1.iso \
   /mnt/pve/cephfs/proxmox-backup-server_3.2-1.iso
```
* sha256sum on the local filesystem:
`1d19698e8f7e769cf0a0dcc7ba0018ef5416c5ec495d5e61313f9c84a4237607
/tmp/proxmox-backup-server_3.2-1.iso`
* sha256sum on CephFS with a kernel up to the above commit:
`1d19698e8f7e769cf0a0dcc7ba0018ef5416c5ec495d5e61313f9c84a4237607
/mnt/pve/cephfs/proxmox-backup-server_3.2-1.iso`
* sha256sum on CephFS with a kernel after the above commit:
`89ad3620bf7b1e0913b534516cfbe48580efbaec944b79951e2c14e5e551f736
/mnt/pve/cephfs/proxmox-backup-server_3.2-1.iso`
* removing and/or recopying the file does not change the outcome; the
corrupt checksum remains the same.
* accessing the same file from different clients gives consistent
results: clients with the above patch applied show the incorrect
checksum, while clients without the patch show the correct one.
* the issue persists even across reboots of the Ceph cluster and/or
the clients.
* the file is indeed corrupt after reading, as verified by `cmp -b`
(see the sketch after this list). Interestingly, the first 4 MiB
contain the correct data, while the following 4 MiB are read back as
all zeros, differing from the original data.
* the issue is related to the readahead size: mounting the CephFS with
`rasize=0` makes the issue disappear, and the same is true for sizes
up to 128k (please note that the ranges as initially reported on the
mailing list [3] are not correct; for rasize in [0..128k] the file is
not corrupted). A mount sketch is included below.
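To make these checks concrete, here is a sketch of the byte-wise
comparison and of pinning the readahead size at mount time (the
monitor address and authentication options are placeholders for our
setup):
```
# byte-wise comparison; with the default rasize the first
# difference shows up at the 4 MiB boundary
cmp -b /tmp/proxmox-backup-server_3.2-1.iso \
    /mnt/pve/cephfs/proxmox-backup-server_3.2-1.iso

# mounting with readahead disabled makes the corruption disappear
mount -t ceph <mon-address>:/ /mnt/pve/cephfs \
    -o name=admin,secretfile=/etc/ceph/admin.secret,rasize=0
```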
In the bugtracker issue [4] I attached an ftrace with `*ceph*` as
filter, captured on the latest kernel 6.11-rc6 while performing
```
dd if=/mnt/pve/cephfs/proxmox-backup-server_3.2-1.iso \
   of=/tmp/test.out bs=8M count=1
```
The relevant part is shown by task `dd-26192`.
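For reference, a sketch of how such a trace can be captured via the
tracefs function tracer (the tracefs path and output file name are
assumptions and may differ per setup):
```
cd /sys/kernel/tracing
echo '*ceph*' > set_ftrace_filter
echo function > current_tracer
echo 1 > tracing_on
dd if=/mnt/pve/cephfs/proxmox-backup-server_3.2-1.iso \
   of=/tmp/test.out bs=8M count=1
echo 0 > tracing_on
cat trace > /tmp/ceph-ftrace.txt
```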
Please let me know if I can provide further information or debug
output to help narrow down the issue.
[0] https://forum.proxmox.com/threads/78340/post-676129
[1] https://forum.proxmox.com/threads/149249/
[2] https://forum.proxmox.com/threads/151291/
[3]
https://lore.kernel.org/lkml/db686d0c-2f27-47c8-8c14-26969433b13b@xxxxxxxxxxx/
[4] https://bugzilla.kernel.org/show_bug.cgi?id=219237
#regzbot introduced: 92b6cc5d
Regards,
Christian Ebner