Re: Kworker 100% with ceph-msgr (after upgrade to 14.2.6?)

Hi,

I too am still suffering the same issue (snapshots lead to 100% ceph-msgr usage on client during metadata-intensive operations like backup and rsync) and had previously reported it to this list. This issue is also tracked at https://tracker.ceph.com/issues/44100

My current observations:
- approx. 20 total snapshots in the filesystem are sufficient to reliably cause the issue
- in my observation there is no linear relationship between slowdown and number of snapshots. Once you reach a critical snapshot number (which might actually be 1, I have not tested this extensively) and perform the necessary operations to induce the error (for me, Bareos backups are a reliable reproducer; see the sketch after this list), metadata operations on that client grind to a near halt
- memory on the MDS is not a limiting/causing factor: I now have a dedicated MDS server with 160 GB of memory and adjusted mds_cache_memory_limit accordingly, and still saw the issue occurring at 30 GB of MDS memory usage
- FUSE mounts don't show the issue, but they are much slower on metadata operations overall and therefore not a solution for daily backups, as they slow down the backup too much
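For anyone who wants to try to reproduce this without Bareos, a metadata-heavy walk along the lines of the rough Python sketch below should be a comparable workload: it simply stats every entry under the mount, roughly what a backup or rsync pass does. /mnt/cephfs is just a placeholder, not my actual mount point; watch the ceph-msgr kworker in top while it runs.

    #!/usr/bin/env python3
    # Metadata-heavy workload: recursively lstat every entry under a CephFS
    # mount, roughly what a backup or rsync pass does.
    # /mnt/cephfs is a placeholder; adjust it to your own mount point.
    import os
    import time

    MOUNT = "/mnt/cephfs"

    start = time.monotonic()
    count = 0
    for root, dirs, files in os.walk(MOUNT):
        for name in dirs + files:
            try:
                os.lstat(os.path.join(root, name))
                count += 1
            except OSError:
                pass  # entry vanished or is unreadable; skip it

    print(f"stat'ed {count} entries in {time.monotonic() - start:.1f} s")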

I'm running Ceph Octopus 15.2.13 on CentOS 8. The client is CentOS 8 with an elrepo 5.12 kernel. My workaround is to not use CephFS snapshots at all, although I really would like to use them.
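If anyone wants to check the roughly linear scaling that Frank describes below, a sketch like the following could be used: it times a fixed stat pass over one directory tree, creates another snapshot, and repeats. The paths and snapshot names are placeholders, and dropping the client's caches between passes (so that every pass has to go back to the MDS) needs root.

    #!/usr/bin/env python3
    # Rough check of the "N snapshots => ~N times slower" observation:
    # time a fixed stat pass, create another snapshot, drop the client's
    # dentry/inode caches, and repeat. Run as root because of drop_caches.
    import os
    import subprocess
    import time

    TREE = "/mnt/cephfs/testdir"   # placeholder: a directory tree with many files
    SNAPDIR = "/mnt/cephfs/.snap"  # CephFS snapshots are created by mkdir in .snap

    def stat_pass():
        t0 = time.monotonic()
        for root, dirs, files in os.walk(TREE):
            for name in dirs + files:
                os.lstat(os.path.join(root, name))
        return time.monotonic() - t0

    for n in range(8):
        subprocess.run(["sync"], check=True)
        with open("/proc/sys/vm/drop_caches", "w") as f:
            f.write("2\n")  # drop dentries and inodes so the MDS is hit again
        print(f"{n} snapshots: {stat_pass():.1f} s")
        os.mkdir(os.path.join(SNAPDIR, f"perftest-{n}"))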

Cheers
Sebastian

On 07.09.21 14:12, Frank Schilder wrote:
Hi Marc,

did you ever get a proper solution for this problem? We are having exactly the same issue: having snapshots on a file system leads to incredible performance degradation. I'm reporting some observations here (latest reply):

https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/HKEBXXRMX5WA5Y6JFM34WFPMWTCMPFCG/#6S5GTKGGBI2Y3QE4E5XJJY2KSSLLX64H

The problem is almost certainly that the ceph kernel client executes ceph_update_snap_trace over and over again on the exact same data. I see that the execution time of CephFS IO increases roughly with the number of snapshots present: N snapshots means ~N times slower.

I'm testing this on kernel version 5.9.9-1.el7.elrepo.x86_64. It is even worse on older kernels.

Best regards,
=================
Frank Schilder
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


