Dear Ceph Community,

I am having an issue with my Ceph cluster. Several OSDs crashed; they are active again and recovery has finished, but the CephFS filesystem can no longer be accessed read-write by the clients (K8s workloads): one MDS is in read-only mode and two MDSs are behind on trimming. The CephFS volume itself reports as healthy. The trimming process does not seem to make any progress, maybe it is stuck?

We are running 3 hosts with Ceph Pacific, version 16.2.1.

Here are some logs on the situation:

ceph versions
{
    "mon": {
        "ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10) pacific (stable)": 3
    },
    "mgr": {
        "ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10) pacific (stable)": 3
    },
    "osd": {
        "ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10) pacific (stable)": 18
    },
    "mds": {
        "ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10) pacific (stable)": 3
    },
    "rgw": {
        "ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10) pacific (stable)": 6
    },
    "overall": {
        "ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10) pacific (stable)": 33
    }
}

ceph orch ps
NAME                          HOST       STATUS         REFRESHED  AGE  PORTS          VERSION  IMAGE ID      CONTAINER ID
crash.rke-sh1-1               rke-sh1-1  running (21h)  36s ago    21h  -              16.2.1   c757e4a3636b  e8652edb2b49
crash.rke-sh1-2               rke-sh1-2  running (21h)  3m ago     20M  -              16.2.1   c757e4a3636b  a1249a605ee0
crash.rke-sh1-3               rke-sh1-3  running (17h)  36s ago    17h  -              16.2.1   c757e4a3636b  026667bc1776
mds.cephfs.rke-sh1-1.ojmpnk   rke-sh1-1  running (18h)  36s ago    4M   -              16.2.1   c757e4a3636b  9b4c2b08b759
mds.cephfs.rke-sh1-2.isqjza   rke-sh1-2  running (18h)  3m ago     23M  -              16.2.1   c757e4a3636b  71681a5f34d3
mds.cephfs.rke-sh1-3.vdicdn   rke-sh1-3  running (17h)  36s ago    3M   -              16.2.1   c757e4a3636b  e89946ad6b7e
mgr.rke-sh1-1.qskoyj          rke-sh1-1  running (21h)  36s ago    2y   *:8082 *:9283  16.2.1   c757e4a3636b  7ce7cfbb3e55
mgr.rke-sh1-2.lxmguj          rke-sh1-2  running (21h)  3m ago     22M  *:8082 *:9283  16.2.1   c757e4a3636b  5a0025adfd46
mgr.rke-sh1-3.ckunvo          rke-sh1-3  running (17h)  36s ago    6M   *:8082 *:9283  16.2.1   c757e4a3636b  2fcaf18f3218
mon.rke-sh1-1                 rke-sh1-1  running (20h)  36s ago    20h  -              16.2.1   c757e4a3636b  c0a90103cabc
mon.rke-sh1-2                 rke-sh1-2  running (21h)  3m ago     3M   -              16.2.1   c757e4a3636b  f4b32ba4466b
mon.rke-sh1-3                 rke-sh1-3  running (17h)  36s ago    17h  -              16.2.1   c757e4a3636b  d5e44c245998
osd.0                         rke-sh1-2  running (20h)  3m ago     2y   -              16.2.1   c757e4a3636b  7b0e69942c15
osd.1                         rke-sh1-3  running (17h)  36s ago    2y   -              16.2.1   c757e4a3636b  4451654d9a2d
osd.10                        rke-sh1-3  running (17h)  36s ago    2y   -              16.2.1   c757e4a3636b  3f9d5f95e284
osd.11                        rke-sh1-1  running (21h)  36s ago    2y   -              16.2.1   c757e4a3636b  db1cc6d2e37f
osd.12                        rke-sh1-2  running (21h)  3m ago     2y   -              16.2.1   c757e4a3636b  de416c1ef766
osd.13                        rke-sh1-3  running (17h)  36s ago    2y   -              16.2.1   c757e4a3636b  25a281cc5a9b
osd.14                        rke-sh1-1  running (21h)  36s ago    2y   -              16.2.1   c757e4a3636b  62f25ba61667
osd.15                        rke-sh1-2  running (21h)  3m ago     2y   -              16.2.1   c757e4a3636b  d3514d823c45
osd.16                        rke-sh1-3  running (17h)  36s ago    2y   -              16.2.1   c757e4a3636b  bba857759bfe
osd.17                        rke-sh1-1  running (21h)  36s ago    2y   -              16.2.1   c757e4a3636b  59281d4bb3d0
osd.2                         rke-sh1-1  running (21h)  36s ago    2y   -              16.2.1   c757e4a3636b  418041b5e60d
osd.3                         rke-sh1-2  running (21h)  3m ago     2y   -              16.2.1   c757e4a3636b  04a0e29d5623
osd.4                         rke-sh1-1  running (20h)  36s ago    2y   -              16.2.1   c757e4a3636b  1cc78a5153d3
osd.5                         rke-sh1-3  running (17h)  36s ago    2y   -              16.2.1   c757e4a3636b  39a4b11e31fb
osd.6                         rke-sh1-2  running (21h)  3m ago     2y   -              16.2.1   c757e4a3636b  2f218ffb566e
osd.7                         rke-sh1-1  running (20h)  36s ago    2y   -              16.2.1   c757e4a3636b  cf761fbe4d5f
osd.8                         rke-sh1-3  running (17h)  36s ago    2y   -              16.2.1   c757e4a3636b  f9f85480e800
osd.9                         rke-sh1-2  running (21h)  3m ago     2y   -              16.2.1   c757e4a3636b  664c54ff46d2
rgw.default.rke-sh1-1.dgucwl  rke-sh1-1  running (21h)  36s ago    22M  *:8000         16.2.1   c757e4a3636b  f03212b955a7
rgw.default.rke-sh1-1.vylchc  rke-sh1-1  running (21h)  36s ago    22M  *:8001         16.2.1   c757e4a3636b  da486ce43fe5
rgw.default.rke-sh1-2.dfhhfw  rke-sh1-2  running (21h)  3m ago     2y   *:8000         16.2.1   c757e4a3636b  ef4089d0aef2
rgw.default.rke-sh1-2.efkbum  rke-sh1-2  running (21h)  3m ago     2y   *:8001         16.2.1   c757e4a3636b  9e053d5a2f7b
rgw.default.rke-sh1-3.krfgey  rke-sh1-3  running (17h)  36s ago    9M   *:8001         16.2.1   c757e4a3636b  45cd3d75edd3
rgw.default.rke-sh1-3.pwdbmp  rke-sh1-3  running (17h)  36s ago    9M   *:8000         16.2.1   c757e4a3636b  e2710265a7f4

ceph health detail
HEALTH_WARN 1 MDSs are read only; 2 MDSs behind on trimming
[WRN] MDS_READ_ONLY: 1 MDSs are read only
    mds.cephfs.rke-sh1-2.isqjza(mds.0): MDS in read-only mode
[WRN] MDS_TRIM: 2 MDSs behind on trimming
    mds.cephfs.rke-sh1-2.isqjza(mds.0): Behind on trimming (2149/128) max_segments: 128, num_segments: 2149
    mds.cephfs.rke-sh1-1.ojmpnk(mds.0): Behind on trimming (2149/128) max_segments: 128, num_segments: 2149
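As I understand the MDS_TRIM warning, the journal of rank 0 currently holds 2149 segments while the trim threshold mds_log_max_segments is at its default of 128. I was thinking of checking and temporarily raising that option to give the MDS some headroom while it catches up, along these lines (not applied yet, and the value 256 is just an example):

    ceph config get mds mds_log_max_segments
    ceph config set mds mds_log_max_segments 256   # example value, temporary
    ceph config rm mds mds_log_max_segments        # revert once trimming has caught up

Would that be safe here, or is it pointless because an MDS in read-only mode cannot trim its journal anyway?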
root@rke-sh1-1:~# ceph fs status
cephfs - 27 clients
======
RANK  STATE           MDS                      ACTIVITY     DNS    INOS   DIRS   CAPS
 0    active          cephfs.rke-sh1-2.isqjza  Reqs: 8 /s   85.2k  53.2k  1742    101
0-s   standby-replay  cephfs.rke-sh1-1.ojmpnk  Evts: 0 /s   52.2k  20.2k  1737      0
      POOL         TYPE      USED   AVAIL
cephfs_metadata    metadata  1109G  6082G
  cephfs_data      data      8419G  6082G
STANDBY MDS
cephfs.rke-sh1-3.vdicdn
MDS version: ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10) pacific (stable)

ceph status
  cluster:
    id:     fcb373ce-7aaa-11eb-984f-e7c6e0038e87
    health: HEALTH_WARN
            1 MDSs are read only
            2 MDSs behind on trimming

  services:
    mon: 3 daemons, quorum rke-sh1-2,rke-sh1-1,rke-sh1-3 (age 17h)
    mgr: rke-sh1-1.qskoyj(active, since 17h), standbys: rke-sh1-2.lxmguj, rke-sh1-3.ckunvo
    mds: 1/1 daemons up, 1 standby, 1 hot standby
    osd: 18 osds: 18 up (since 17h), 18 in (since 20h)
    rgw: 6 daemons active (3 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   11 pools, 849 pgs
    objects: 10.10M objects, 5.3 TiB
    usage:   11 TiB used, 15 TiB / 26 TiB avail
    pgs:     849 active+clean

  io:
    client: 35 KiB/s rd, 1.0 MiB/s wr, 302 op/s rd, 165 op/s wr

# ceph mds stat
cephfs:1 {0=cephfs.rke-sh1-2.isqjza=up:active} 1 up:standby-replay 1 up:standby

Do you have an idea of what my next steps could be to bring the cluster back to a healthy state? Any help will be very much appreciated.

Thanks a lot for your feedback.

Best Regards,

Edouard FAZENDA
Technical Support

Chemin du Curé-Desclouds 2, CH-1226 THONEX
+41 (0)22 869 04 40
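P.S. From what I could find in the docs and list archives, an MDS usually switches itself to read-only after hitting a write error on the metadata pool, and a failover or restart of the rank-0 daemon may be needed once the OSDs are healthy again. If that is the right track, I would try the following, in this order (daemon names taken from the outputs above; nothing executed yet):

    # look for OSD requests stuck inside the rank-0 MDS, and ops in flight
    ceph tell mds.cephfs.rke-sh1-2.isqjza objecter_requests
    ceph tell mds.cephfs.rke-sh1-2.isqjza ops

    # fail rank 0 so the standby-replay daemon takes over
    ceph mds fail cephfs.rke-sh1-2.isqjza

    # or restart the daemon through cephadm
    ceph orch daemon restart mds.cephfs.rke-sh1-2.isqjza

Does that look reasonable, or is there a risk of making things worse while the MDSs are behind on trimming?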