Hi,
I too am still suffering from the same issue (snapshots lead to 100%
ceph-msgr CPU usage on the client during metadata-intensive operations
such as backups and rsync) and had previously reported it to this list.
The issue is also tracked at https://tracker.ceph.com/issues/44100
My current observations:
- approx. 20 total snapshots in the filesystem are sufficient to
reliably cause the issue
- in my observation the slowdown does not scale linearly with the
number of snapshots. Once you cross a critical number of snapshots
(which might actually be 1; I have not tested this extensively) and
perform the operations needed to trigger the error (for me, Bareos
backups are a reliable reproducer; see the sketch after this list),
metadata operations on that client grind to a near halt
- MDS memory is neither a limiting nor a causal factor: I now have a
dedicated MDS server with 160 GB of memory and adjusted
mds_cache_memory_limit accordingly, yet still saw the issue occur at
30 GB of MDS memory usage
- FUSE mounts don't show the issue, but their metadata operations are
much slower overall, which makes them unsuitable for daily backups:
they slow the backup down too much
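
For reference, the access pattern that triggers it for me boils down to
a full-tree metadata walk, roughly like the following minimal sketch
(not the actual Bareos job; /mnt/cephfs is a placeholder for the
kernel-mounted file system):

    #!/usr/bin/env python3
    # Minimal stand-in for the metadata-heavy phase of a backup run:
    # walk the whole tree and lstat() every entry, much like rsync or
    # a backup client would. With ~20 snapshots present, this drives
    # ceph-msgr to 100% CPU on the client for me.
    import os
    import sys
    import time

    root = sys.argv[1] if len(sys.argv) > 1 else "/mnt/cephfs"  # placeholder

    start = time.monotonic()
    count = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            try:
                os.lstat(os.path.join(dirpath, name))
                count += 1
            except OSError:
                pass  # entry vanished mid-walk; irrelevant for the test

    elapsed = time.monotonic() - start
    print(f"stat'ed {count} entries in {elapsed:.1f}s "
          f"({count / elapsed:.0f} ops/s)")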
I'm running Ceph Octopus 15.2.13 on CentOS 8. The client is CentOS 8
with an elrepo 5.12 kernel. My workaround is to avoid CephFS snapshots
entirely, although I would really like to use them.
Cheers
Sebastian
On 07.09.21 14:12, Frank Schilder wrote:
Hi Marc,
did you ever get a proper solution for this problem? We are having exactly the same issue: having snapshots on a file system leads to incredible performance degradation. I'm reporting some observations here (latest reply):
https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/HKEBXXRMX5WA5Y6JFM34WFPMWTCMPFCG/#6S5GTKGGBI2Y3QE4E5XJJY2KSSLLX64H
The problem is almost certainly that the ceph kernel client executes ceph_update_snap_trace over and over again on the exact same data. I see the execution time of ceph fs IO increase roughly with the number of snapshots present: N snapshots means ~N times slower.
I'm testing this on kernel version 5.9.9-1.el7.elrepo.x86_64. It is even worse on older kernels.
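
For illustration, a rough sketch of how one can measure this (it
assumes a kernel mount at /mnt/cephfs/testdir with many files in it,
permission to create snapshots, and the default .snap directory name;
all of these are placeholders, not our production setup):

    #!/usr/bin/env python3
    # Rough sketch: time the same lstat() walk after each added
    # snapshot. CephFS snapshots are created by mkdir in the special
    # .snap directory. On the affected kernels the per-walk time grows
    # roughly with the number of snapshots present (~N times slower).
    import os
    import time

    ROOT = "/mnt/cephfs/testdir"  # placeholder: directory with many files
    SNAPDIR = os.path.join(ROOT, ".snap")

    def walk_time(root):
        t0 = time.monotonic()
        for dirpath, dirnames, filenames in os.walk(root):
            for name in dirnames + filenames:
                try:
                    os.lstat(os.path.join(dirpath, name))
                except OSError:
                    pass
        return time.monotonic() - t0

    for n in range(0, 21):
        if n:  # n == 0 is the snapshot-free baseline
            os.mkdir(os.path.join(SNAPDIR, f"snaptest-{n}"))
        print(f"{n:2d} snapshots: walk took {walk_time(ROOT):6.1f}s")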
Best regards,
=================
Frank Schilder