Hello all,
We use Ceph (v18.2.2) and Rook (v1.14.3) as the CSI driver for a Kubernetes
environment. Last week, we had a problem with the MDS falling behind on
trimming every 4-5 days (GitHub issue link
<https://github.com/rook/rook/issues/14220>). We resolved the issue
using the steps outlined in the GitHub issue.
We have 3 hosts (I know, I need to increase this as soon as possible,
and I will!) and 6 OSDs. We ran the following commands:
  ceph config set mds mds_dir_max_commit_size 80
  ceph fs fail <fs_name>
  ceph fs set <fs_name> joinable true
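In case it is useful, this is roughly how the filesystem state can be
checked after making it joinable again (just a sketch, with <fs_name>
being the same placeholder as above):

  ceph fs status <fs_name>
  ceph mds stat
  ceph config get mds mds_dir_max_commit_size   # confirm the new value is active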
Since running those commands, the snaptrim queue of our PGs has stopped
decreasing. All PGs of our CephFS are in either active+clean+snaptrim_wait
or active+clean+snaptrim state. For example, PG 3.12 is in the
active+clean+snaptrim state, and its snap_trimq_len was 4077 yesterday
but has increased to 4538 today.
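For reference, the affected PGs and their queue lengths can be listed
with something like this (a sketch; 3.12 is just the example PG from
above):

  # PGs currently trimming or waiting to trim, with their SNAPTRIMQ_LEN
  ceph pg ls snaptrim snaptrim_wait
  # detailed state of a single PG
  ceph pg 3.12 query | grep -i snap_trimq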
I increased osd_snap_trim_priority to 10 (ceph config set osd
osd_snap_trim_priority 10), but it didn't help. Only the PGs of our
CephFS have this problem.
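For completeness, this is how the other snaptrim-related OSD throttles
can be inspected (a sketch; whether osd_snap_trim_sleep is honoured may
depend on the scheduler in use):

  ceph config get osd osd_snap_trim_priority
  ceph config get osd osd_snap_trim_sleep
  ceph config get osd osd_pg_max_concurrent_snap_trims
  ceph config get osd osd_max_trimming_pgs
  # also make sure the cluster-wide nosnaptrim flag is not set
  ceph osd dump | grep flags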
Do you have any ideas on how we can resolve this issue?
Thanks in advance,
Giovanna