Re: The snaptrim queue of PGs has not decreased for several days.

Giovanna Ratini <giovanna.ratini@xxxxxxxxxxxxxxx> · Mon, 19 Aug 2024 09:41:44 +0200

Hello Eugen,

root@kube-master02:~# k ceph -s

Info: running 'ceph' command with args: [-s]
  cluster:
    id:     3a35629a-6129-4daf-9db6-36e0eda637c7
    health: HEALTH_WARN
            32 pgs not deep-scrubbed in time
            32 pgs not scrubbed in time

  services:
    mon: 3 daemons, quorum bx,bz,ca (age 13h)
    mgr: a(active, since 13h), standbys: b
    mds: 1/1 daemons up, 1 hot standby
    osd: 6 osds: 6 up (since 5h), 6 in (since 5d)

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 97 pgs
    objects: 4.20M objects, 2.5 TiB
    usage:   7.7 TiB used, 76 TiB / 84 TiB avail
    pgs:     65 active+clean
             20 active+clean+snaptrim_wait
             12 active+clean+snaptrim

  io:
    client:   3.5 MiB/s rd, 3.6 MiB/s wr, 6 op/s rd, 12 op/s wr

If I understand the documentation correctly, I will never have a scrub 
unless the PGs (Placement Groups) are active and clean.

All 32 PGs of the CephFS pool have been in this status for several days:

 * 20 active+clean+snaptrim_wait
 * 12 active+clean+snaptrim
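
As a quick cross-check of the status output above, the PG state counts can be tallied. This is only an illustrative sketch: the counts are copied by hand from `ceph -s`, and the variable names are made up for the example.

```python
# PG state counts copied by hand from the `ceph -s` output above.
pg_states = {
    "active+clean": 65,
    "active+clean+snaptrim_wait": 20,
    "active+clean+snaptrim": 12,
}

total_pgs = sum(pg_states.values())
# Matching on "snaptrim" covers both snaptrim and snaptrim_wait.
trimming_pgs = sum(n for state, n in pg_states.items() if "snaptrim" in state)

print(f"total PGs: {total_pgs}")
print(f"PGs trimming or waiting to trim: {trimming_pgs}")
```

The 32 PGs in a snaptrim state equal the 32 PGs reported as not scrubbed in time, and the total equals the 97 PGs listed under pools.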

Today I restarted the MON, MGR, and MDS daemons, but the snaptrim queue is still growing.

On 18.08.2024 at 18:39, Eugen Block wrote:
Can you share the current ceph status? Are the OSDs reporting anything 
suspicious? How is the disk utilization?

Quoting Giovanna Ratini <giovanna.ratini@xxxxxxxxxxxxxxx>:

More information:

Snaptrim takes a long time, but objects_trimmed stays at 0:

"objects_trimmed": 0,
"snaptrim_duration": 500.58076017500002,

That could explain why the queue keeps growing.

On 17.08.2024 at 14:37, Giovanna Ratini wrote:
Hello again,

I checked the PG dump. The snap_trimq keeps growing:

Query for PG: 3.12
{
    "snap_trimq": 
"[5b974~3b,5cc3a~1,5cc3c~1,5cc3e~1,5cc40~1,5cd83~1,5cd85~1,5cd87~1,5cd89~1,5cecc~1,5cece~4,5ced3~2,5cf72~1,5cf74~4,5cf79~a2,5d0b8~1,5d0bb~1,5d0bd~a5,5d1f9~2,5d204~a5,5d349~a7,5d48e~3,5d493~a4,5d5d7~a7,5d71e~a3,5d7c2~3,5d860~1,5d865~4,5d86a~a2,5d9aa~1,5d9ac~1,5d9ae~a5,5daf3~a5,5db9a~2,5dc3a~a5,5dce1~1,5dce3~1,5dd81~a7,5dec8~a7,5e00f~a7,5e156~a8,5e29d~1,5e29f~a7,5e3e6~a8,5e52e~a6,5e5d6~2,5e676~a6,5e71e~2,5e7be~a9,5e907~a5,5e9ad~3,5ea50~a7,5eaf9~1,5eafb~1,5eb99~a7,5ec42~2,5ece2~a7,5ed8a~2,5ee2b~a9,5ef74~a7,5f01c~1,5f0bd~a1,5f15f~1,5f161~1,5f163~1,5f167~1,5f206~a1,5f2a8~1,5f2aa~1,5f2ac~1,5f2ae~1,5f34f~a1,5f3f1~1,5f3f3~1,5f3f5~1,5f3f7~1,5f499~a1,5f53b~1,5f53d~1,5f53f~1,5f541~1,5f5e3~a1,5f685~1,5f687~1,5f689~1,5f68d~1,5f72d~a1,5f7cf~1,5f7d1~1,5f7d3~1]",
    "snap_trimq_len": 5421,
    "state": "active+clean+snaptrim",
    "epoch": 734130,

Query for PG: 3.12
{
    "snap_trimq": 
"[5b976~39,5ba53~1,5ba56~a0,5cc3a~1,5cc3c~1,5cc3e~1,5cc40~1,5cd83~1,5cd85~1,5cd87~1,5cd89~1,5cecc~1,5cece~4,5ced3~2,5cf72~1,5cf74~4,5cf79~a2,5d0b8~1,5d0bb~1,5d0bd~a5,5d1f9~2,5d204~a5,5d349~a7,5d48e~3,5d493~a4,5d5d7~a7,5d71e~a3,5d7c2~3,5d860~1,5d865~4,5d86a~a2,5d9aa~1,5d9ac~1,5d9ae~a5,5daf3~a5,5db9a~2,5dc3a~a5,5dce1~1,5dce3~1,5dd81~a7,5dec8~a7,5e00f~a7,5e156~a8,5e29d~1,5e29f~a7,5e3e6~a8,5e52e~a6,5e5d6~2,5e676~a6,5e71e~2,5e7be~a9,5e907~a5,5e9ad~3,5ea50~a7,5eaf9~1,5eafb~1,5eb99~a7,5ec42~2,5ece2~a7,5ed8a~2,5ee2b~a9,5ef74~a7,5f01c~1,5f0bd~a1,5f15f~1,5f161~1,5f163~1,5f167~1,5f206~a1,5f2a8~1,5f2aa~1,5f2ac~1,5f2ae~1,5f34f~a1,5f3f1~1,5f3f3~1,5f3f5~1,5f3f7~1,5f499~a1,5f53b~1,5f53d~1,5f53f~1,5f541~1,5f5e3~a1,5f685~1,5f687~1,5f689~1,5f68d~1,5f72d~a1,5f7cf~1,5f7d1~1,5f7d3~1,5f875~a1]",
    "snap_trimq_len": 5741,
    "state": "active+clean+snaptrim",
    "epoch": 734240,
    "up": [

Do you know a way to check whether the snaptrim process is actually making progress?
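
One rough way to judge progress is to compare two `snap_trimq` samples of the same PG. The sketch below assumes each entry is a `start~length` pair in hexadecimal and that `snap_trimq_len` is the sum of the lengths. That reading is inferred from the two dumps above (the entries that differ account for exactly 5741 - 5421 = 320), not taken from the documentation. The function names are hypothetical, and the sample strings are shortened to the entries that appear to differ plus one common entry.

```python
def parse_snap_trimq(text):
    """Parse '[5b974~3b,5cc3a~1]' into (start, length) integer pairs (hex input)."""
    body = text.strip().strip("[]")
    intervals = []
    for item in body.split(","):
        if not item:
            continue
        start, length = item.split("~")
        intervals.append((int(start, 16), int(length, 16)))
    return intervals


def snap_ids(intervals):
    """Expand intervals into the set of individual snapshot ids they cover."""
    ids = set()
    for start, length in intervals:
        ids.update(range(start, start + length))
    return ids


def compare_trimq(before, after):
    """Count ids that left the queue, ids that joined it, and the net change."""
    b = snap_ids(parse_snap_trimq(before))
    a = snap_ids(parse_snap_trimq(after))
    return {"trimmed": len(b - a), "added": len(a - b), "net": len(a) - len(b)}


# Shortened excerpts of the two queries above (assumption: the omitted
# entries are identical in both samples).
before = "[5b974~3b,5cc3a~1]"
after = "[5b976~39,5ba53~1,5ba56~a0,5cc3a~1,5f875~a1]"
print(compare_trimq(before, after))
```

On these excerpts the result is 2 ids trimmed and 322 added, a net growth of 320. If the assumptions hold, trimming is not completely stalled but is far slower than the rate at which new entries arrive.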

Best regards,

Gio

On 17.08.2024 at 12:59, Giovanna Ratini wrote:
Hello Eugen,

thank you for your answer.

I restarted all the kube-ceph nodes one after the other. Nothing 
has changed.

OK, I deactivated the snapshot schedule: ceph fs snap-schedule deactivate /

Is there a way to see how many snapshots will be deleted per hour?

Regards,

Gio

On 17.08.2024 at 10:12, Eugen Block wrote:
Hi,

have you tried to fail the mgr? Sometimes the PG stats are not 
correct. You could also temporarily disable snapshots to see if 
things settle down.

Quoting Giovanna Ratini <giovanna.ratini@xxxxxxxxxxxxxxx>:

Hello all,

We use Ceph (v18.2.2) and Rook (1.14.3) as the CSI for a 
Kubernetes environment. Last week, we had a problem with the MDS 
falling behind on trimming every 4-5 days (GitHub issue link). We 
resolved the issue using the steps outlined in the GitHub issue.

We have 3 hosts (I know, I need to increase this as soon as 
possible, and I will!) and 6 OSDs. We ran the following commands:

ceph config set mds mds_dir_max_commit_size 80

ceph fs fail <fs_name>

ceph fs set <fs_name> joinable true

After that, the snaptrim queue for our PGs stopped 
decreasing. All PGs of our CephFS are in either 
active+clean+snaptrim_wait or active+clean+snaptrim states. For 
example, the PG 3.12 is in the active+clean+snaptrim state, and 
its snap_trimq_len was 4077 yesterday but has increased to 4538 
today.
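
To put a number on the growth, the two `snap_trimq_len` readings can be turned into an hourly rate. This is a minimal sketch: the helper name is hypothetical, and the 24-hour gap between "yesterday" and "today" is an assumption, since the exact sample times are not given.

```python
def queue_growth_per_hour(len_before, len_after, hours_between):
    """Average change in snap_trimq_len per hour between two samples."""
    if hours_between <= 0:
        raise ValueError("hours_between must be positive")
    return (len_after - len_before) / hours_between


# Readings for PG 3.12 from the message; the 24 h interval is assumed.
rate = queue_growth_per_hour(4077, 4538, 24.0)
print(f"{rate:+.1f} entries/hour")
```

A positive value means entries are added faster than they are trimmed. A negative value would mean the queue is draining.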

I increased the osd_snap_trim_priority to 10 (ceph config set osd 
osd_snap_trim_priority 10), but it didn't help. Only the PGs of 
our CephFS have this problem.

Do you have any ideas on how we can resolve this issue?

Thanks in advance,
Giovanna
p.s. I'm not a ceph expert :-).
Faulkener asked me for more information, so here it is:
MDS Memory: 11GB
mds_cache_memory_limit: 11,811,160,064 bytes

root@kube-master02:~# ceph fs snap-schedule status /
{
    "fs": "rook-cephfs",
    "subvol": null,
    "path": "/",
    "rel_path": "/",
    "schedule": "3h",
    "retention": {"h": 24, "w": 4},
    "start": "2024-05-05T00:00:00",
    "created": "2024-05-05T17:28:18",
    "first": "2024-05-05T18:00:00",
    "last": "2024-08-15T18:00:00",
    "last_pruned": "2024-08-15T18:00:00",
    "created_count": 817,
    "pruned_count": 817,
    "active": true
}
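
The status output above can be sanity-checked against its own schedule. The sketch below counts how many runs a fixed 3-hour schedule would produce between the reported `first` and `last` timestamps, inclusive. The helper name is hypothetical, and the check assumes the schedule never changed during that period.

```python
from datetime import datetime, timedelta


def expected_runs(first, last, interval):
    """Number of scheduled runs from first to last (inclusive) at a fixed interval."""
    return (last - first) // interval + 1


# Timestamps and interval taken from the snap-schedule status output.
first = datetime.fromisoformat("2024-05-05T18:00:00")
last = datetime.fromisoformat("2024-08-15T18:00:00")
runs = expected_runs(first, last, timedelta(hours=3))
print(runs)
```

The result, 817, matches `created_count`, which suggests the scheduler did not skip a run between the first and last snapshot.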
I do not understand if the snapshots in the PGs are correlated 
with the snapshots on CephFS. Until we encountered the issue with 
the "MDS falling behind on trimming every 4-5 days," we didn't 
have any problems with snapshots.

Could someone explain this to me or point me to the documentation?
Thank you
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx

--
Giovanna Ratini
Mail:ratini@xxxxxxxxxxxxxxxxxxxxxxxxx
Phone: +49 (0) 7531 88 - 4550

Technical Support
Data Analysis and Visualization Group
Department of Computer and Information Science
University of Konstanz (Box 78)
Universitätsstr. 10
78457 Konstanz, Germany