CephFS Octopus snapshots / kworker at 100% / kernel vs. fuse client

Hi,

I am running a Ceph Octopus (15.2.8) cluster primarily for CephFS. Metadata is stored on SSD, data is stored in three different pools on HDD. Currently, I use 22 subvolumes.

I am rotating snapshots on 16 subvolumes, all in the same pool, which is the primary data pool for CephFS. Currently I have 41 snapshots per subvolume. The goal is 50 snapshots (see bottom of mail for details). Snapshots are only placed in the root directory of each subvolume, i.e. /volumes/_nogroup/subvolname/hex-id/.snap

I create the snapshots on one of the nodes: the complete CephFS is mounted, a mkdir and an rmdir are performed in each relevant subvolume, then CephFS is unmounted again (roughly as in the sketch below). All PGs are active+clean most of the time; only a few are in snaptrim for 1-2 minutes after a snapshot deletion. I therefore assume that snaptrim is not the limiting factor.
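
In pseudo-shell, the rotation step looks roughly like this (mount options, credentials and the snapshot naming scheme are simplified placeholders, not my exact script):

  mount -t ceph mon1:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret
  for sv in /mnt/cephfs/volumes/_nogroup/*/*; do   # each subvolume's hex-id directory
      mkdir "$sv/.snap/auto-$(date +%Y%m%d-%H%M)"  # create the new snapshot
      # prune: keep only the newest 50 (names sort chronologically)
      ls -1 "$sv/.snap" | sort | head -n -50 | while read s; do rmdir "$sv/.snap/$s"; done
  done
  umount /mnt/cephfs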

Obviously, the total number of snapshots is more than the limits of 400 and 100 I have seen mentioned in some documentation. I am unsure whether that is an issue here, as the snapshots are all in disjoint subvolumes.



When mounting the subvolumes with the kernel client (ranging from the CentOS 7 supplied 3.10 up to 5.4.93), after some time and for some subvolumes a kworker process starts hogging 100% CPU and stat operations become very slow (even slower than with the fuse client). I can mostly reproduce this by starting specific rsync operations (with many small files, e.g. CTAN, CentOS, Debian mirrors) and by running a bareos backup. The kworker process seems to remain stuck even after terminating the operation that caused it, i.e. rsync or bareos-fd.
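
If a kernel-side stack trace of the stuck kworker would help with the diagnosis, something like this should capture it (untested sketch; requires root and a kernel with CONFIG_STACKTRACE):

  # pick the kworker with the highest CPU usage and dump its kernel stack
  pid=$(ps -eo pid,pcpu,comm --sort=-pcpu | awk '$3 ~ /^kworker/ {print $1; exit}')
  cat /proc/$pid/stack
  # or sample it live
  perf top -p $pid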

Interestingly, I can even trigger these issues on a host that has mounted only a single CephFS subvolume without any snapshots, as long as that subvolume is in the same pool as other subvolumes with snapshots.

I don't see any abnormal behaviour on the cluster nodes or on other clients during these kworker hanging phases.



With the fuse client, stat calls in normal operation are about 10-20x slower than with the kernel client. However, I do not encounter the extreme slowdown behaviour. I am therefore currently mounting known-problematic subvolumes with fuse and non-problematic subvolumes with the kernel client.
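
Concretely, the two mount variants look like this (monitor address, client name and paths are placeholders):

  # kernel client: fast stat, but prone to the kworker lockup
  mount -t ceph mon1:6789:/volumes/_nogroup/subvol/hex-id /mnt/subvol \
      -o name=backup,secretfile=/etc/ceph/backup.secret
  # fuse client: stat roughly 10-20x slower, but no lockup so far
  ceph-fuse -n client.backup -r /volumes/_nogroup/subvol/hex-id /mnt/subvol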



My questions are:
- Is this known or expected behaviour?
- I could move the subvolumes with snapshots into a subvolumegroup and snapshot the whole group instead of each subvolume (see the sketch below). Is this likely to solve the issue?
- What is the current recommendation regarding CephFS and the maximum number of snapshots?
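
As I understand it, the group-level variant would reduce the rotation to a single mkdir/rmdir pair on the group directory instead of one per subvolume, roughly (group name made up):

  # one snapshot covering all subvolumes in the group
  mkdir /mnt/cephfs/volumes/backupgroup/.snap/auto-$(date +%Y%m%d-%H%M)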



Cluster setup:
5 nodes with a total of 56 OSDs
Each node has a Xeon Silver 4208 and 128 GB RAM
Each node has two 480 GB Samsung PM883 SSDs used for the CephFS metadata pool
HDDs range from 8 TB to 14 TB; the majority are 14 TB
10 GbE internal network and 10 GbE client network, no jumbo frames

$ ceph df
--- RAW STORAGE ---
CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
hdd    520 TiB  141 TiB  378 TiB   379 TiB      72.88
ssd    3.9 TiB  3.8 TiB  1.7 GiB    97 GiB       2.46
TOTAL  524 TiB  145 TiB  378 TiB   379 TiB      72.36

--- POOLS ---
POOL                   ID  PGS   STORED   OBJECTS  USED     %USED  MAX AVAIL
device_health_metrics   1     1   66 MiB       57  198 MiB      0     23 TiB
cephfs.cephfs.meta      2  1024   26 GiB    2.29M   77 GiB   2.06    1.2 TiB
cephfs.cephfs.data      3  1024   70 TiB   54.95M  213 TiB  75.19     23 TiB
lofar                   4   512   77 TiB   21.41M  154 TiB  68.68     35 TiB
proxmox                 6    64  526 GiB  158.60k  1.6 TiB   2.16     23 TiB
archive                 7    32  7.3 TiB    5.42M   10 TiB  12.57     56 TiB

Snapshots exist only on the cephfs.cephfs.data pool.


Intended snapshot rotation (scheduling sketch below):
4 quarter-hourly snapshots
24 hourly snapshots
14 daily snapshots
8 weekly snapshots
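
Scheduling-wise that would be something like the following crontab (snaprotate.sh is a hypothetical wrapper around the mount/mkdir/rmdir sequence sketched above, taking a name prefix and a retention count):

  */15 *  * * *  /usr/local/sbin/snaprotate.sh quarterhourly  4
  0    *  * * *  /usr/local/sbin/snaprotate.sh hourly        24
  0    3  * * *  /usr/local/sbin/snaprotate.sh daily         14
  0    4  * * 0  /usr/local/sbin/snaprotate.sh weekly         8

That adds up to the 50 snapshots per subvolume mentioned above.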


Cheers
Sebastian
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


