Re: One cephFS snapshot kills performance

Hi,

On 11/5/21 01:36, Sebastian Mazza wrote:


However, if I take a single snapshot in another folder (e.g. `mkdir /mnt/shares/users/.snap/test-01`) that is not even related to the `/mnt/shares/backup-remote/` test folder, the runtime of `du` with cold client caches jumps to 19m 42s. An immediate second run of `du` takes only 12s, but after unmounting and remounting the cephFS it again takes nearly 20 minutes. That is 10 times longer than without a single snapshot. I need to do a bit more testing, but at the moment it looks like every further snapshot adds around 1 minute of additional runtime.
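A minimal sketch of the test, assuming the kernel client is mounted at /mnt/shares (the monitor address, mount options and exact `du` invocation below are illustrative placeholders, not necessarily what was actually used):

    # create a single snapshot in an unrelated directory
    mkdir /mnt/shares/users/.snap/test-01

    # cold-cache run: remount to drop the kernel client caches, then time du
    umount /mnt/shares
    mount -t ceph <mon-host>:/ /mnt/shares -o name=admin
    time du -sh /mnt/shares/backup-remote/

    # warm-cache run immediately afterwards
    time du -sh /mnt/shares/backup-remote/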

During such a run of `du` with a snapshot anywhere in the file system, all the Ceph daemons seem to be bored, and the OSDs do hardly any IO. The only thing in the system that I can find that looks busy is a kernel worker of the client that mounts the FS and runs `du`. A process named "kworker/0:1+ceph-msgr" is constantly near 100% CPU usage. The fact that the kernel seems to spend all of its time in a function called "ceph_update_snap_trace" makes me even more confident that the problem is a result of snapshots.
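One possible way to check where such a kworker spends its time (a sketch only, not necessarily the method used above; <PID> is a placeholder, and reading /proc/<PID>/stack needs root):

    # find the busy kworker thread and its PID
    ps -eo pid,%cpu,comm --sort=-%cpu | head

    # sample its kernel stack a few times; ceph_update_snap_trace shows up there
    cat /proc/<PID>/stack

    # or profile it for a few seconds with perf, if available
    perf top -p <PID>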

Your report looks very much like behavior we have on a backup system (also rsync) on a Nautilus cluster (upgraded from Luminous), with many small files in the fs. We could not reproduce the issue on a separate cluster with identical data. However, we see this behavior without any snapshots: no snapshots have ever been made on this CephFS. Even though there are no snapshots, it still spends a lot of time on "snap" tasks, weirdly enough. Although the problem might get worse with more snapshots, having (a) snapshot(s) or not does not seem to be a requirement per se. It might point in the right direction ...

<snip>


I would be very interested in an explanation for this behaviour. Of course, I would also be very thankful for a solution to the problem or any advice that could help.

No solution, but good to know there are more workloads out there that hit this issue. If there are any CephFS devs interested in investigating this issue we are more than happy to provide more info.

Gr. Stefan
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



