Dear Eugen,

We have followed the workaround here: https://tracker.ceph.com/issues/58082#note-11

The cluster is healthy again and the K8S workloads are back.

# ceph status
  cluster:
    id:     fcb373ce-7aaa-11eb-984f-e7c6e0038e87
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum rke-sh1-2,rke-sh1-1,rke-sh1-3 (age 87m)
    mgr: rke-sh1-2.lxmguj(active, since 2h), standbys: rke-sh1-3.ckunvo, rke-sh1-1.qskoyj
    mds: 1/1 daemons up, 1 standby, 1 hot standby
    osd: 18 osds: 18 up (since 88m), 18 in (since 24h)
    rgw: 6 daemons active (3 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   11 pools, 737 pgs
    objects: 9.98M objects, 4.8 TiB
    usage:   10 TiB used, 16 TiB / 26 TiB avail
    pgs:     737 active+clean

  io:
    client:   226 MiB/s rd, 208 MiB/s wr, 109 op/s rd, 272 op/s wr

  progress:
    Global Recovery Event (17m)
      [========================....] (remaining: 2m)

# ceph fs status
cephfs - 33 clients
======
RANK      STATE                 MDS                ACTIVITY     DNS    INOS   DIRS   CAPS
 0        active      cephfs.rke-sh1-1.ojmpnk  Reqs:   51 /s   103k  97.7k  11.2k  20.7k
0-s   standby-replay  cephfs.rke-sh1-2.isqjza  Evts:   30 /s      0      0      0      0
      POOL         TYPE     USED  AVAIL
cephfs_metadata  metadata  49.9G  6685G
  cephfs_data      data    8423G  6685G
       STANDBY MDS
cephfs.rke-sh1-3.vdicdn
MDS version: ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10) pacific (stable)

# ceph osd df
ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS
 2    ssd  1.45549   1.00000  1.5 TiB  562 GiB  557 GiB   13 MiB  4.6 GiB  929 GiB  37.68  0.98   86      up
 4    ssd  1.45549   1.00000  1.5 TiB  566 GiB  561 GiB  9.1 MiB  4.9 GiB  925 GiB  37.95  0.98   94      up
 7    ssd  1.45549   1.00000  1.5 TiB  590 GiB  584 GiB   16 MiB  5.5 GiB  901 GiB  39.57  1.03   93      up
11    ssd  1.45549   1.00000  1.5 TiB  563 GiB  558 GiB   15 MiB  4.3 GiB  928 GiB  37.75  0.98   93      up
14    ssd  1.45549   1.00000  1.5 TiB  575 GiB  570 GiB   11 MiB  4.8 GiB  916 GiB  38.56  1.00   97      up
17    ssd  1.45549   1.00000  1.5 TiB  651 GiB  646 GiB   30 MiB  4.6 GiB  840 GiB  43.67  1.13   95      up
 0    ssd  1.45549   1.00000  1.5 TiB  614 GiB  608 GiB   15 MiB  5.3 GiB  877 GiB  41.18  1.07   98      up
 3    ssd  1.45549   1.00000  1.5 TiB  673 GiB  668 GiB   20 MiB  4.7 GiB  817 GiB  45.16  1.17  105      up
 6    ssd  1.45549   1.00000  1.5 TiB  527 GiB  523 GiB   11 MiB  4.9 GiB  963 GiB  35.39  0.92   86      up
 9    ssd  1.45549   1.00000  1.5 TiB  549 GiB  545 GiB   16 MiB  4.2 GiB  942 GiB  36.83  0.95   88      up
12    ssd  1.45549   1.00000  1.5 TiB  551 GiB  546 GiB   11 MiB  4.4 GiB  940 GiB  36.95  0.96   96      up
15    ssd  1.45549   1.00000  1.5 TiB  594 GiB  589 GiB   16 MiB  4.4 GiB  897 GiB  39.83  1.03   84      up
 1    ssd  1.45549   1.00000  1.5 TiB  520 GiB  516 GiB   10 MiB  3.6 GiB  970 GiB  34.89  0.90   87      up
 5    ssd  1.45549   1.00000  1.5 TiB  427 GiB  423 GiB  7.9 MiB  4.0 GiB  1.0 TiB  28.64  0.74   74      up
 8    ssd  1.45549   1.00000  1.5 TiB  625 GiB  620 GiB   27 MiB  4.7 GiB  866 GiB  41.92  1.09   97      up
10    ssd  1.45549   1.00000  1.5 TiB  562 GiB  557 GiB   12 MiB  5.1 GiB  929 GiB  37.69  0.98   92      up
13    ssd  1.45549   1.00000  1.5 TiB  673 GiB  668 GiB  7.2 MiB  5.0 GiB  817 GiB  45.15  1.17  101      up
16    ssd  1.45549   1.00000  1.5 TiB  534 GiB  530 GiB  5.7 MiB  3.5 GiB  957 GiB  35.81  0.93   85      up
                       TOTAL   26 TiB   10 TiB   10 TiB  254 MiB   82 GiB   16 TiB  38.59
MIN/MAX VAR: 0.74/1.17  STDDEV: 3.88

Thanks for the help!
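In case it is useful to anyone hitting the same issue, the remaining recovery and the MDS trim state can be watched from an admin node with the standard ceph CLI; a minimal sketch (read-only monitoring commands, nothing specific to this cluster):

# watch -n 30 "ceph health detail"
# ceph fs status cephfs
# ceph mds stat

The first re-checks overall health every 30 seconds (the MDS_TRIM warning clears once the journal is trimmed back below the warning threshold), the other two confirm that rank 0 stays active and that no MDS is reported read-only anymore.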
Best Regards,

Edouard FAZENDA
Technical Support

Chemin du Curé-Desclouds 2, CH-1226 THONEX
+41 (0)22 869 04 40
www.csti.ch

-----Original Message-----
From: Eugen Block <eblock@xxxxxx>
Sent: Friday, 23 February 2024 15:05
To: Edouard FAZENDA <e.fazenda@xxxxxxx>
Cc: ceph-users@xxxxxxx
Subject: Re: Re: MDS in ReadOnly and 2 MDS behind on trimming

> 2024-02-23T08:15:13.155+0000 7fbc145d2700 -1 log_channel(cluster) log
> [ERR] : failed to commit dir 0x1 object, errno -22
> 2024-02-23T08:15:13.155+0000 7fbc145d2700 -1 mds.0.12487 unhandled
> write error (22) Invalid argument, force readonly...

Was your cephfs metadata pool full? This tracker
(https://tracker.ceph.com/issues/52260) sounds very similar but I don't
see a solution for it.

Quoting Edouard FAZENDA <e.fazenda@xxxxxxx>:

> Hi Eugen,
>
> Thanks for the reply, really appreciated.
>
> The first command just hangs with no output:
>
> # cephfs-journal-tool --rank=cephfs:0 --journal=mdlog journal inspect
>
> The second command:
>
> # cephfs-journal-tool --rank=cephfs:0 --journal=purge_queue journal inspect
> Overall journal integrity: OK
>
> root@rke-sh1-2:~# cephadm logs --fsid fcb373ce-7aaa-11eb-984f-e7c6e0038e87 --name mds.cephfs.rke-sh1-2.isqjza
> -- Logs begin at Fri 2024-02-23 04:49:32 UTC, end at Fri 2024-02-23 13:08:22 UTC. --
> Feb 23 07:46:46 rke-sh1-2 bash[1058012]: ignoring --setuser ceph since I am not root
> Feb 23 07:46:46 rke-sh1-2 bash[1058012]: ignoring --setgroup ceph since I am not root
> Feb 23 07:46:46 rke-sh1-2 bash[1058012]: starting mds.cephfs.rke-sh1-2.isqjza at
> Feb 23 08:15:06 rke-sh1-2 bash[1058012]: debug 2024-02-23T08:15:06.371+0000 7fbc17dd9700 -1 mds.pinger is_rank_lagging: rank=0 was never sent ping request.
> Feb 23 08:15:13 rke-sh1-2 bash[1058012]: debug 2024-02-23T08:15:13.155+0000 7fbc145d2700 -1 log_channel(cluster) log [ERR] : failed to commit dir 0x1 object, errno -22
> Feb 23 08:15:13 rke-sh1-2 bash[1058012]: debug 2024-02-23T08:15:13.155+0000 7fbc145d2700 -1 mds.0.12487 unhandled write error (22) Invalid argument, force readonly...
> Feb 23 10:20:36 rke-sh1-2 bash[1058012]: debug 2024-02-23T10:20:36.309+0000 7fbc17dd9700 -1 mds.pinger is_rank_lagging: rank=1 was never sent ping request.
>
> root@rke-sh1-3:~# cephadm logs --fsid fcb373ce-7aaa-11eb-984f-e7c6e0038e87 --name mds.cephfs.rke-sh1-3.vdicdn
> -- Logs begin at Fri 2024-02-23 06:59:48 UTC, end at Fri 2024-02-23 13:09:18 UTC. --
> Feb 23 07:46:46 rke-sh1-3 bash[2901]: ignoring --setuser ceph since I am not root
> Feb 23 07:46:46 rke-sh1-3 bash[2901]: ignoring --setgroup ceph since I am not root
> Feb 23 07:46:46 rke-sh1-3 bash[2901]: starting mds.cephfs.rke-sh1-3.vdicdn at
> Feb 23 10:25:51 rke-sh1-3 bash[2901]: ignoring --setuser ceph since I am not root
> Feb 23 10:25:51 rke-sh1-3 bash[2901]: ignoring --setgroup ceph since I am not root
> Feb 23 10:25:51 rke-sh1-3 bash[2901]: starting mds.cephfs.rke-sh1-3.vdicdn at
>
> debug2: channel 0: request window-change confirm 0
> debug3: send packet: type 98
> -- Logs begin at Fri 2024-02-23 00:24:42 UTC, end at Fri 2024-02-23 13:09:55 UTC. --
> Feb 23 09:29:10 rke-sh1-1 bash[786820]: tcmalloc: large alloc 1073750016 bytes == 0x5598512de000 @ 0x7fb426636760 0x7fb426657c64 0x5597c1ccaaba 0x7fb41bc04218 0x7fb41bc0ed5b 0x7fb41bbfeda4 0x7fb41da6>
> Feb 23 09:29:19 rke-sh1-1 bash[786820]: tcmalloc: large alloc 2147491840 bytes == 0x559891ae0000 @ 0x7fb426636760 0x7fb426657c64 0x5597c1ccaaba 0x7fb41bc04218 0x7fb41bc0ed5b 0x7fb41bbfeda4 0x7fb41db3>
> Feb 23 09:29:26 rke-sh1-1 bash[786820]: tcmalloc: large alloc 2147491840 bytes == 0x559951ae4000 @ 0x7fb426636760 0x7fb426657c64 0x5597c1ccaaba 0x7fb41bc04218 0x7fb41bc0ed5b 0x7fb41bbfeda4 0x7fb41da6>
> Feb 23 09:29:27 rke-sh1-1 bash[786820]: debug 2024-02-23T09:29:27.928+0000 7fb416d63700 -1 asok(0x5597c3904000) AdminSocket: error writing response length (32) Broken pipe
> Feb 23 12:35:53 rke-sh1-1 bash[786820]: ignoring --setuser ceph since I am not root
> Feb 23 12:35:53 rke-sh1-1 bash[786820]: ignoring --setgroup ceph since I am not root
> Feb 23 12:35:53 rke-sh1-1 bash[786820]: starting mds.cephfs.rke-sh1-1.ojmpnk at
>
> The logs of the MDS are at verbosity 20; do you want me to provide them as an archive?
>
> Is there a way to compact all the logs?
>
> Best Regards,
>
> Edouard FAZENDA
> Technical Support
>
> Chemin du Curé-Desclouds 2, CH-1226 THONEX
> +41 (0)22 869 04 40
> www.csti.ch
>
> -----Original Message-----
> From: Eugen Block <eblock@xxxxxx>
> Sent: Friday, 23 February 2024 12:50
> To: ceph-users@xxxxxxx
> Subject: Re: MDS in ReadOnly and 2 MDS behind on trimming
>
> Hi,
>
> the mds log should contain information why it goes into read-only
> mode. Just a few weeks ago I helped a user with a broken CephFS (MDS
> went into read-only mode because of missing objects in the journal).
> Can you check the journal status:
>
> # cephfs-journal-tool --rank=cephfs:0 --journal=mdlog journal inspect
>
> # cephfs-journal-tool --rank=cephfs:0 --journal=purge_queue journal inspect
>
> and also share the logs.
>
> Thanks,
> Eugen
>
> Quoting Edouard FAZENDA <e.fazenda@xxxxxxx>:
>
>> Dear Ceph Community,
>>
>> I am having an issue with my Ceph cluster: several OSDs crashed, but they
>> are now up again and recovery has finished. However, the CephFS filesystem
>> cannot be accessed by clients in RW (K8S workload), because one MDS is in
>> read-only mode and two MDSs are behind on trimming.
>>
>> The cephfs volume itself seems to be OK.
>>
>> The trimming process does not seem to make progress; maybe it is stuck?
>>
>> We are running 3 hosts with Ceph Pacific version 16.2.1.
>>
>> Here are some logs on the situation:
>>
>> ceph versions
>> {
>>     "mon": {
>>         "ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10) pacific (stable)": 3
>>     },
>>     "mgr": {
>>         "ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10) pacific (stable)": 3
>>     },
>>     "osd": {
>>         "ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10) pacific (stable)": 18
>>     },
>>     "mds": {
>>         "ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10) pacific (stable)": 3
>>     },
>>     "rgw": {
>>         "ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10) pacific (stable)": 6
>>     },
>>     "overall": {
>>         "ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10) pacific (stable)": 33
>>     }
>> }
>>
>> ceph orch ps
>> NAME                          HOST       STATUS         REFRESHED  AGE  PORTS          VERSION  IMAGE ID      CONTAINER ID
>> crash.rke-sh1-1               rke-sh1-1  running (21h)  36s ago    21h  -              16.2.1   c757e4a3636b  e8652edb2b49
>> crash.rke-sh1-2               rke-sh1-2  running (21h)  3m ago     20M  -              16.2.1   c757e4a3636b  a1249a605ee0
>> crash.rke-sh1-3               rke-sh1-3  running (17h)  36s ago    17h  -              16.2.1   c757e4a3636b  026667bc1776
>> mds.cephfs.rke-sh1-1.ojmpnk   rke-sh1-1  running (18h)  36s ago    4M   -              16.2.1   c757e4a3636b  9b4c2b08b759
>> mds.cephfs.rke-sh1-2.isqjza   rke-sh1-2  running (18h)  3m ago     23M  -              16.2.1   c757e4a3636b  71681a5f34d3
>> mds.cephfs.rke-sh1-3.vdicdn   rke-sh1-3  running (17h)  36s ago    3M   -              16.2.1   c757e4a3636b  e89946ad6b7e
>> mgr.rke-sh1-1.qskoyj          rke-sh1-1  running (21h)  36s ago    2y   *:8082 *:9283  16.2.1   c757e4a3636b  7ce7cfbb3e55
>> mgr.rke-sh1-2.lxmguj          rke-sh1-2  running (21h)  3m ago     22M  *:8082 *:9283  16.2.1   c757e4a3636b  5a0025adfd46
>> mgr.rke-sh1-3.ckunvo          rke-sh1-3  running (17h)  36s ago    6M   *:8082 *:9283  16.2.1   c757e4a3636b  2fcaf18f3218
>> mon.rke-sh1-1                 rke-sh1-1  running (20h)  36s ago    20h  -              16.2.1   c757e4a3636b  c0a90103cabc
>> mon.rke-sh1-2                 rke-sh1-2  running (21h)  3m ago     3M   -              16.2.1   c757e4a3636b  f4b32ba4466b
>> mon.rke-sh1-3                 rke-sh1-3  running (17h)  36s ago    17h  -              16.2.1   c757e4a3636b  d5e44c245998
>> osd.0                         rke-sh1-2  running (20h)  3m ago     2y   -              16.2.1   c757e4a3636b  7b0e69942c15
>> osd.1                         rke-sh1-3  running (17h)  36s ago    2y   -              16.2.1   c757e4a3636b  4451654d9a2d
>> osd.10                        rke-sh1-3  running (17h)  36s ago    2y   -              16.2.1   c757e4a3636b  3f9d5f95e284
>> osd.11                        rke-sh1-1  running (21h)  36s ago    2y   -              16.2.1   c757e4a3636b  db1cc6d2e37f
>> osd.12                        rke-sh1-2  running (21h)  3m ago     2y   -              16.2.1   c757e4a3636b  de416c1ef766
>> osd.13                        rke-sh1-3  running (17h)  36s ago    2y   -              16.2.1   c757e4a3636b  25a281cc5a9b
>> osd.14                        rke-sh1-1  running (21h)  36s ago    2y   -              16.2.1   c757e4a3636b  62f25ba61667
>> osd.15                        rke-sh1-2  running (21h)  3m ago     2y   -              16.2.1   c757e4a3636b  d3514d823c45
>> osd.16                        rke-sh1-3  running (17h)  36s ago    2y   -              16.2.1   c757e4a3636b  bba857759bfe
>> osd.17                        rke-sh1-1  running (21h)  36s ago    2y   -              16.2.1   c757e4a3636b  59281d4bb3d0
>> osd.2                         rke-sh1-1  running (21h)  36s ago    2y   -              16.2.1   c757e4a3636b  418041b5e60d
>> osd.3                         rke-sh1-2  running (21h)  3m ago     2y   -              16.2.1   c757e4a3636b  04a0e29d5623
>> osd.4                         rke-sh1-1  running (20h)  36s ago    2y   -              16.2.1   c757e4a3636b  1cc78a5153d3
>> osd.5                         rke-sh1-3  running (17h)  36s ago    2y   -              16.2.1   c757e4a3636b  39a4b11e31fb
>> osd.6                         rke-sh1-2  running (21h)  3m ago     2y   -              16.2.1   c757e4a3636b  2f218ffb566e
>> osd.7                         rke-sh1-1  running (20h)  36s ago    2y   -              16.2.1   c757e4a3636b  cf761fbe4d5f
>> osd.8                         rke-sh1-3  running (17h)  36s ago    2y   -              16.2.1   c757e4a3636b  f9f85480e800
>> osd.9                         rke-sh1-2  running (21h)  3m ago     2y   -              16.2.1   c757e4a3636b  664c54ff46d2
>> rgw.default.rke-sh1-1.dgucwl  rke-sh1-1  running (21h)  36s ago    22M  *:8000         16.2.1   c757e4a3636b  f03212b955a7
>> rgw.default.rke-sh1-1.vylchc  rke-sh1-1  running (21h)  36s ago    22M  *:8001         16.2.1   c757e4a3636b  da486ce43fe5
>> rgw.default.rke-sh1-2.dfhhfw  rke-sh1-2  running (21h)  3m ago     2y   *:8000         16.2.1   c757e4a3636b  ef4089d0aef2
>> rgw.default.rke-sh1-2.efkbum  rke-sh1-2  running (21h)  3m ago     2y   *:8001         16.2.1   c757e4a3636b  9e053d5a2f7b
>> rgw.default.rke-sh1-3.krfgey  rke-sh1-3  running (17h)  36s ago    9M   *:8001         16.2.1   c757e4a3636b  45cd3d75edd3
>> rgw.default.rke-sh1-3.pwdbmp  rke-sh1-3  running (17h)  36s ago    9M   *:8000         16.2.1   c757e4a3636b  e2710265a7f4
>>
>> ceph health detail
>> HEALTH_WARN 1 MDSs are read only; 2 MDSs behind on trimming
>> [WRN] MDS_READ_ONLY: 1 MDSs are read only
>>     mds.cephfs.rke-sh1-2.isqjza(mds.0): MDS in read-only mode
>> [WRN] MDS_TRIM: 2 MDSs behind on trimming
>>     mds.cephfs.rke-sh1-2.isqjza(mds.0): Behind on trimming (2149/128) max_segments: 128, num_segments: 2149
>>     mds.cephfs.rke-sh1-1.ojmpnk(mds.0): Behind on trimming (2149/128) max_segments: 128, num_segments: 2149
>>
>> root@rke-sh1-1:~# ceph fs status
>> cephfs - 27 clients
>> ======
>> RANK      STATE                 MDS                ACTIVITY     DNS    INOS   DIRS   CAPS
>>  0        active      cephfs.rke-sh1-2.isqjza  Reqs:    8 /s  85.2k  53.2k  1742    101
>> 0-s   standby-replay  cephfs.rke-sh1-1.ojmpnk  Evts:    0 /s  52.2k  20.2k  1737      0
>>       POOL         TYPE     USED  AVAIL
>> cephfs_metadata  metadata  1109G  6082G
>>   cephfs_data      data    8419G  6082G
>>        STANDBY MDS
>> cephfs.rke-sh1-3.vdicdn
>> MDS version: ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10) pacific (stable)
>>
>> ceph status
>>   cluster:
>>     id:     fcb373ce-7aaa-11eb-984f-e7c6e0038e87
>>     health: HEALTH_WARN
>>             1 MDSs are read only
>>             2 MDSs behind on trimming
>>
>>   services:
>>     mon: 3 daemons, quorum rke-sh1-2,rke-sh1-1,rke-sh1-3 (age 17h)
>>     mgr: rke-sh1-1.qskoyj(active, since 17h), standbys: rke-sh1-2.lxmguj, rke-sh1-3.ckunvo
>>     mds: 1/1 daemons up, 1 standby, 1 hot standby
>>     osd: 18 osds: 18 up (since 17h), 18 in (since 20h)
>>     rgw: 6 daemons active (3 hosts, 1 zones)
>>
>>   data:
>>     volumes: 1/1 healthy
>>     pools:   11 pools, 849 pgs
>>     objects: 10.10M objects, 5.3 TiB
>>     usage:   11 TiB used, 15 TiB / 26 TiB avail
>>     pgs:     849 active+clean
>>
>>   io:
>>     client:   35 KiB/s rd, 1.0 MiB/s wr, 302 op/s rd, 165 op/s wr
>>
>> # ceph mds stat
>> cephfs:1 {0=cephfs.rke-sh1-2.isqjza=up:active} 1 up:standby-replay 1 up:standby
>>
>> Have you got an idea of what my next steps could be to bring the cluster back to healthy?
>>
>> Help would be very much appreciated.
>>
>> Thanks a lot for your feedback.
>>
>> Best Regards,
>>
>> Edouard FAZENDA
>> Technical Support
>>
>> Chemin du Curé-Desclouds 2, CH-1226 THONEX
>> +41 (0)22 869 04 40
>>
>> <https://www.csti.ch/> www.csti.ch
>
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
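For context on the "Behind on trimming (2149/128)" warning quoted above: 128 is the default value of mds_log_max_segments, the target number of journal segments the MDS tries to keep, and MDS_TRIM fires when the journal grows far past that target. On a cephadm-managed cluster the setting can be inspected, and if needed temporarily raised, through the config subsystem; a minimal sketch, with 256 as a purely illustrative value:

# ceph config get mds mds_log_max_segments
# ceph config set mds mds_log_max_segments 256

Raising the limit only quiets the warning; it does not help while trimming is blocked by the read-only rank, so the read-only condition has to be resolved first.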
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx