Hi all! I’m new to CephFS. My test file system uses a replicated pool on NVMe SSDs for metadata and an erasure-coded pool on HDDs for data. All OSDs use BlueStore. All daemons were created with and are running Ceph version 16.2.6. The Linux kernel I use for mounting CephFS is from Debian:

Linux file2 5.10.0-9-amd64 #1 SMP Debian 5.10.70-1 (2021-09-30) x86_64 GNU/Linux

Until I created the first snapshot (e.g. `mkdir /mnt/shares/users/.snap/test-01`), the performance of the CephFS (mount point: `/mnt/shares`) seemed fine to me. I first noticed the performance problem while re-syncing a directory with `rsync`, because the re-sync / update took longer than the initial `rsync` run. After multiple days of investigation, I’m fairly sure that the performance problem is directly related to snapshots. I first experienced the problem with `rsync`, but it can also be observed with a simple run of `du`. Therefore, I guess that some sort of "stat" call in combination with snapshots is responsible for the bad performance.

My test folder `/mnt/shares/backup-remote/` contains lots of small files and many hard links spread over many subfolders. After a restart of the whole cluster and the client, and without a single snapshot in the whole file system, a run of `du` takes 4m 17s. When all the OSD, MON and client caches are warm, the same `du` takes only 12s. After unmounting and mounting the CephFS again, which should empty all the client caches but keep the caches on the OSD and MON side warm, the run of `du` takes 1m 56s. These runtimes are all perfectly fine for me.

However, if I take a single snapshot in another folder (e.g. `mkdir /mnt/shares/users/.snap/test-01`) that is not even related to the `/mnt/shares/backup-remote/` test folder, the runtime of `du` with cold client caches jumps to 19m 42s. An immediate second run of `du` takes only 12s, but after unmounting and mounting the CephFS it again takes nearly 20 minutes. That is 10 times longer than without a single snapshot. I need to do a bit more testing, but at the moment it looks like every further snapshot adds around one minute of additional runtime.
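For reference, the rough sequence of commands I use to reproduce the numbers above is sketched below. The paths are from my setup, the exact `du` flags should not matter, and I have simplified the remount step (I assume a CephFS entry in /etc/fstab; the exact mount options should not be relevant for the effect):

------------------------------------------
# baseline: no snapshot anywhere in the file system
time du -sh /mnt/shares/backup-remote/    # ~4m 17s cold, ~12s warm

# create a single snapshot in an unrelated directory
mkdir /mnt/shares/users/.snap/test-01

# drop the client-side caches by remounting, then measure again
umount /mnt/shares
mount /mnt/shares                         # assumes an fstab entry for the CephFS
time du -sh /mnt/shares/backup-remote/    # now ~19m 42s with cold client caches

# removing the snapshot again does not restore the original runtime (see below)
rmdir /mnt/shares/users/.snap/test-01
------------------------------------------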
During such a run of `du` with a snapshot anywhere in the file system, all the Ceph daemons seem to be bored and the OSDs do hardly any IO. The only thing in the system that I can find that looks busy is a kernel worker of the client that mounts the FS and runs `du`: a process named "kworker/0:1+ceph-msgr" is constantly near 100% CPU usage. The fact that the kernel seems to spend all its time in a function called "ceph_update_snap_trace" makes me even more confident that the problem is a result of snapshots.

Kernel stack trace examples (`echo l > /proc/sysrq-trigger` and `dmesg`):
------------------------------------------
[11316.757494] Call Trace:
[11316.757494]  ceph_queue_cap_snap+0x37/0x4e0 [ceph]
[11316.757496]  ? ceph_put_snap_realm+0x28/0xd0 [ceph]
[11316.757497]  ceph_update_snap_trace+0x3f0/0x4f0 [ceph]
[11316.757498]  dispatch+0x79d/0x1520 [ceph]
[11316.757499]  ceph_con_workfn+0x1a5f/0x2850 [libceph]
[11316.757500]  ? finish_task_switch+0x72/0x250
[11316.757502]  process_one_work+0x1b6/0x350
[11316.757503]  worker_thread+0x53/0x3e0
[11316.757504]  ? process_one_work+0x350/0x350
[11316.757505]  kthread+0x11b/0x140
[11316.757506]  ? __kthread_bind_mask+0x60/0x60
[11316.757507]  ret_from_fork+0x22/0x30
------------------------------------------
[36120.030685] Call Trace:
[36120.030686]  sort_r+0x173/0x210
[36120.030687]  build_snap_context+0x115/0x260 [ceph]
[36120.030688]  rebuild_snap_realms+0x23/0x70 [ceph]
[36120.030689]  rebuild_snap_realms+0x3d/0x70 [ceph]
[36120.030690]  ceph_update_snap_trace+0x2eb/0x4f0 [ceph]
[36120.030691]  dispatch+0x79d/0x1520 [ceph]
[36120.030692]  ceph_con_workfn+0x1a5f/0x2850 [libceph]
[36120.030693]  ? finish_task_switch+0x72/0x250
[36120.030694]  process_one_work+0x1b6/0x350
[36120.030695]  worker_thread+0x53/0x3e0
[36120.030695]  ? process_one_work+0x350/0x350
[36120.030696]  kthread+0x11b/0x140
[36120.030697]  ? __kthread_bind_mask+0x60/0x60
[36120.030698]  ret_from_fork+0x22/0x30
[36120.030960] NMI backtrace for cpu 3 skipped: idling at native_safe_halt+0xe/0x10
------------------------------------------

Deleting all snapshots does not restore the original performance. Only after recursively copying (with rsync) the whole `backup-remote` folder to a new location and running `du` on this new folder is the performance as it was before taking the first snapshot.

Related issue reports I have found:
* https://tracker.ceph.com/issues/44100?next_issue_id=44099
* https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/message/IDMLNQMFGTJRR5QXFZ2YAYPN67UZH4Q4/

I would be very interested in an explanation for this behaviour, and of course I would be very thankful for a solution to the problem or any advice that could help.

Thanks in advance.

Best wishes,
Sebastian

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx