Hi,
the MDS log should contain information about why it went into read-only
mode. Just a few weeks ago I helped a user with a broken CephFS (the MDS
went into read-only mode because of missing objects in the journal).
Can you check the journal status:
# cephfs-journal-tool --rank=cephfs:0 --journal=mdlog journal inspect
# cephfs-journal-tool --rank=cephfs:0 --journal=purge_queue journal inspect
and also share the MDS logs (see the sketch below for one way to collect them).
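Judging by the ceph orch ps output further down, your cluster is cephadm-managed, so a minimal sketch for grabbing the log of the read-only rank could look like this (the daemon name is taken from your own output; raising the debug level is only an optional suggestion on my part):

On rke-sh1-2, the host running the active MDS:

# cephadm logs --name mds.cephfs.rke-sh1-2.isqjza > mds.cephfs.rke-sh1-2.isqjza.log

If the default log level is not verbose enough, you can temporarily raise it and remove the override again afterwards:

# ceph config set mds debug_mds 10
# ceph config rm mds debug_mds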
Thanks,
Eugen
Quoting Edouard FAZENDA <e.fazenda@xxxxxxx>:
Dear Ceph Community,
I am having an issue with my Ceph cluster. Several OSDs crashed, but they
are now active again and recovery has finished. However, the CephFS
filesystem can no longer be accessed read-write by clients (a K8s
workload): the single active MDS is in read-only mode and 2 MDSs are
behind on trimming.
The CephFS volume itself is reported as healthy.
The trimming process does not seem to be making any progress; maybe it is stuck?
We are running 3 hosts on Ceph Pacific version 16.2.1.
Here are some logs describing the situation:
ceph versions
{
    "mon": {
        "ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10) pacific (stable)": 3
    },
    "mgr": {
        "ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10) pacific (stable)": 3
    },
    "osd": {
        "ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10) pacific (stable)": 18
    },
    "mds": {
        "ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10) pacific (stable)": 3
    },
    "rgw": {
        "ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10) pacific (stable)": 6
    },
    "overall": {
        "ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10) pacific (stable)": 33
    }
}
ceph orch ps
NAME                          HOST       STATUS         REFRESHED  AGE  PORTS          VERSION  IMAGE ID      CONTAINER ID
crash.rke-sh1-1               rke-sh1-1  running (21h)  36s ago    21h  -              16.2.1   c757e4a3636b  e8652edb2b49
crash.rke-sh1-2               rke-sh1-2  running (21h)  3m ago     20M  -              16.2.1   c757e4a3636b  a1249a605ee0
crash.rke-sh1-3               rke-sh1-3  running (17h)  36s ago    17h  -              16.2.1   c757e4a3636b  026667bc1776
mds.cephfs.rke-sh1-1.ojmpnk   rke-sh1-1  running (18h)  36s ago    4M   -              16.2.1   c757e4a3636b  9b4c2b08b759
mds.cephfs.rke-sh1-2.isqjza   rke-sh1-2  running (18h)  3m ago     23M  -              16.2.1   c757e4a3636b  71681a5f34d3
mds.cephfs.rke-sh1-3.vdicdn   rke-sh1-3  running (17h)  36s ago    3M   -              16.2.1   c757e4a3636b  e89946ad6b7e
mgr.rke-sh1-1.qskoyj          rke-sh1-1  running (21h)  36s ago    2y   *:8082 *:9283  16.2.1   c757e4a3636b  7ce7cfbb3e55
mgr.rke-sh1-2.lxmguj          rke-sh1-2  running (21h)  3m ago     22M  *:8082 *:9283  16.2.1   c757e4a3636b  5a0025adfd46
mgr.rke-sh1-3.ckunvo          rke-sh1-3  running (17h)  36s ago    6M   *:8082 *:9283  16.2.1   c757e4a3636b  2fcaf18f3218
mon.rke-sh1-1                 rke-sh1-1  running (20h)  36s ago    20h  -              16.2.1   c757e4a3636b  c0a90103cabc
mon.rke-sh1-2                 rke-sh1-2  running (21h)  3m ago     3M   -              16.2.1   c757e4a3636b  f4b32ba4466b
mon.rke-sh1-3                 rke-sh1-3  running (17h)  36s ago    17h  -              16.2.1   c757e4a3636b  d5e44c245998
osd.0                         rke-sh1-2  running (20h)  3m ago     2y   -              16.2.1   c757e4a3636b  7b0e69942c15
osd.1                         rke-sh1-3  running (17h)  36s ago    2y   -              16.2.1   c757e4a3636b  4451654d9a2d
osd.10                        rke-sh1-3  running (17h)  36s ago    2y   -              16.2.1   c757e4a3636b  3f9d5f95e284
osd.11                        rke-sh1-1  running (21h)  36s ago    2y   -              16.2.1   c757e4a3636b  db1cc6d2e37f
osd.12                        rke-sh1-2  running (21h)  3m ago     2y   -              16.2.1   c757e4a3636b  de416c1ef766
osd.13                        rke-sh1-3  running (17h)  36s ago    2y   -              16.2.1   c757e4a3636b  25a281cc5a9b
osd.14                        rke-sh1-1  running (21h)  36s ago    2y   -              16.2.1   c757e4a3636b  62f25ba61667
osd.15                        rke-sh1-2  running (21h)  3m ago     2y   -              16.2.1   c757e4a3636b  d3514d823c45
osd.16                        rke-sh1-3  running (17h)  36s ago    2y   -              16.2.1   c757e4a3636b  bba857759bfe
osd.17                        rke-sh1-1  running (21h)  36s ago    2y   -              16.2.1   c757e4a3636b  59281d4bb3d0
osd.2                         rke-sh1-1  running (21h)  36s ago    2y   -              16.2.1   c757e4a3636b  418041b5e60d
osd.3                         rke-sh1-2  running (21h)  3m ago     2y   -              16.2.1   c757e4a3636b  04a0e29d5623
osd.4                         rke-sh1-1  running (20h)  36s ago    2y   -              16.2.1   c757e4a3636b  1cc78a5153d3
osd.5                         rke-sh1-3  running (17h)  36s ago    2y   -              16.2.1   c757e4a3636b  39a4b11e31fb
osd.6                         rke-sh1-2  running (21h)  3m ago     2y   -              16.2.1   c757e4a3636b  2f218ffb566e
osd.7                         rke-sh1-1  running (20h)  36s ago    2y   -              16.2.1   c757e4a3636b  cf761fbe4d5f
osd.8                         rke-sh1-3  running (17h)  36s ago    2y   -              16.2.1   c757e4a3636b  f9f85480e800
osd.9                         rke-sh1-2  running (21h)  3m ago     2y   -              16.2.1   c757e4a3636b  664c54ff46d2
rgw.default.rke-sh1-1.dgucwl  rke-sh1-1  running (21h)  36s ago    22M  *:8000         16.2.1   c757e4a3636b  f03212b955a7
rgw.default.rke-sh1-1.vylchc  rke-sh1-1  running (21h)  36s ago    22M  *:8001         16.2.1   c757e4a3636b  da486ce43fe5
rgw.default.rke-sh1-2.dfhhfw  rke-sh1-2  running (21h)  3m ago     2y   *:8000         16.2.1   c757e4a3636b  ef4089d0aef2
rgw.default.rke-sh1-2.efkbum  rke-sh1-2  running (21h)  3m ago     2y   *:8001         16.2.1   c757e4a3636b  9e053d5a2f7b
rgw.default.rke-sh1-3.krfgey  rke-sh1-3  running (17h)  36s ago    9M   *:8001         16.2.1   c757e4a3636b  45cd3d75edd3
rgw.default.rke-sh1-3.pwdbmp  rke-sh1-3  running (17h)  36s ago    9M   *:8000         16.2.1   c757e4a3636b  e2710265a7f4
ceph health detail
HEALTH_WARN 1 MDSs are read only; 2 MDSs behind on trimming
[WRN] MDS_READ_ONLY: 1 MDSs are read only
    mds.cephfs.rke-sh1-2.isqjza(mds.0): MDS in read-only mode
[WRN] MDS_TRIM: 2 MDSs behind on trimming
    mds.cephfs.rke-sh1-2.isqjza(mds.0): Behind on trimming (2149/128) max_segments: 128, num_segments: 2149
    mds.cephfs.rke-sh1-1.ojmpnk(mds.0): Behind on trimming (2149/128) max_segments: 128, num_segments: 2149
root@rke-sh1-1:~# ceph fs status
cephfs - 27 clients
======
RANK  STATE           MDS                      ACTIVITY    DNS    INOS   DIRS  CAPS
 0    active          cephfs.rke-sh1-2.isqjza  Reqs: 8 /s  85.2k  53.2k  1742  101
0-s   standby-replay  cephfs.rke-sh1-1.ojmpnk  Evts: 0 /s  52.2k  20.2k  1737  0
      POOL         TYPE      USED   AVAIL
cephfs_metadata  metadata  1109G  6082G
  cephfs_data      data    8419G  6082G
STANDBY MDS
cephfs.rke-sh1-3.vdicdn
MDS version: ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10) pacific (stable)
ceph status
  cluster:
    id:     fcb373ce-7aaa-11eb-984f-e7c6e0038e87
    health: HEALTH_WARN
            1 MDSs are read only
            2 MDSs behind on trimming

  services:
    mon: 3 daemons, quorum rke-sh1-2,rke-sh1-1,rke-sh1-3 (age 17h)
    mgr: rke-sh1-1.qskoyj(active, since 17h), standbys: rke-sh1-2.lxmguj, rke-sh1-3.ckunvo
    mds: 1/1 daemons up, 1 standby, 1 hot standby
    osd: 18 osds: 18 up (since 17h), 18 in (since 20h)
    rgw: 6 daemons active (3 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   11 pools, 849 pgs
    objects: 10.10M objects, 5.3 TiB
    usage:   11 TiB used, 15 TiB / 26 TiB avail
    pgs:     849 active+clean

  io:
    client:  35 KiB/s rd, 1.0 MiB/s wr, 302 op/s rd, 165 op/s wr
# ceph mds stat
cephfs:1 {0=cephfs.rke-sh1-2.isqjza=up:active} 1 up:standby-replay 1 up:standby
Do you have an idea of what my next steps could be to bring the cluster
back to a healthy state?
Any help would be greatly appreciated.
Thanks a lot for your feedback.
Best Regards,
Edouard FAZENDA
Technical Support
Chemin du Curé-Desclouds 2, CH-1226 THONEX +41 (0)22 869 04 40
www.csti.ch
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx