2024-02-23T08:15:13.155+0000 7fbc145d2700 -1 log_channel(cluster) log [ERR] : failed to commit dir 0x1 object, errno -22
2024-02-23T08:15:13.155+0000 7fbc145d2700 -1 mds.0.12487 unhandled write error (22) Invalid argument, force readonly...
Was your cephfs metadata pool full? This tracker
(https://tracker.ceph.com/issues/52260) sounds very similar but I
don't see a solution for it.
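If you haven't checked yet, the standard capacity commands would show whether the metadata pool or its OSDs ran full (nothing below is specific to your cluster):
# ceph df detail                      # per-pool STORED/USED and MAX AVAIL
# ceph osd df                         # per-OSD utilisation (%USE)
# ceph health detail | grep -i full   # any nearfull/full warnings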
Quoting Edouard FAZENDA <e.fazenda@xxxxxxx>:
Hi Eugen,
Thanks for the reply, really appreciated.
The first command just hangs with no output:
# cephfs-journal-tool --rank=cephfs:0 --journal=mdlog journal inspect
The second command returns:
# cephfs-journal-tool --rank=cephfs:0 --journal=purge_queue journal inspect
Overall journal integrity: OK
root@rke-sh1-2:~# cephadm logs --fsid fcb373ce-7aaa-11eb-984f-e7c6e0038e87 --name mds.cephfs.rke-sh1-2.isqjza
-- Logs begin at Fri 2024-02-23 04:49:32 UTC, end at Fri 2024-02-23 13:08:22 UTC. --
Feb 23 07:46:46 rke-sh1-2 bash[1058012]: ignoring --setuser ceph since I am not root
Feb 23 07:46:46 rke-sh1-2 bash[1058012]: ignoring --setgroup ceph since I am not root
Feb 23 07:46:46 rke-sh1-2 bash[1058012]: starting mds.cephfs.rke-sh1-2.isqjza at
Feb 23 08:15:06 rke-sh1-2 bash[1058012]: debug 2024-02-23T08:15:06.371+0000 7fbc17dd9700 -1 mds.pinger is_rank_lagging: rank=0 was never sent ping request.
Feb 23 08:15:13 rke-sh1-2 bash[1058012]: debug 2024-02-23T08:15:13.155+0000 7fbc145d2700 -1 log_channel(cluster) log [ERR] : failed to commit dir 0x1 object, errno -22
Feb 23 08:15:13 rke-sh1-2 bash[1058012]: debug 2024-02-23T08:15:13.155+0000 7fbc145d2700 -1 mds.0.12487 unhandled write error (22) Invalid argument, force readonly...
Feb 23 10:20:36 rke-sh1-2 bash[1058012]: debug 2024-02-23T10:20:36.309+0000 7fbc17dd9700 -1 mds.pinger is_rank_lagging: rank=1 was never sent ping request.
root@rke-sh1-3:~# cephadm logs --fsid fcb373ce-7aaa-11eb-984f-e7c6e0038e87 --name mds.cephfs.rke-sh1-3.vdicdn
-- Logs begin at Fri 2024-02-23 06:59:48 UTC, end at Fri 2024-02-23 13:09:18 UTC. --
Feb 23 07:46:46 rke-sh1-3 bash[2901]: ignoring --setuser ceph since I am not root
Feb 23 07:46:46 rke-sh1-3 bash[2901]: ignoring --setgroup ceph since I am not root
Feb 23 07:46:46 rke-sh1-3 bash[2901]: starting mds.cephfs.rke-sh1-3.vdicdn at
Feb 23 10:25:51 rke-sh1-3 bash[2901]: ignoring --setuser ceph since I am not root
Feb 23 10:25:51 rke-sh1-3 bash[2901]: ignoring --setgroup ceph since I am not root
Feb 23 10:25:51 rke-sh1-3 bash[2901]: starting mds.cephfs.rke-sh1-3.vdicdn at
-- Logs begin at Fri 2024-02-23 00:24:42 UTC, end at Fri 2024-02-23 13:09:55 UTC. --
Feb 23 09:29:10 rke-sh1-1 bash[786820]: tcmalloc: large alloc 1073750016 bytes == 0x5598512de000 @ 0x7fb426636760 0x7fb426657c64 0x5597c1ccaaba 0x7fb41bc04218 0x7fb41bc0ed5b 0x7fb41bbfeda4 0x7fb41da6>
Feb 23 09:29:19 rke-sh1-1 bash[786820]: tcmalloc: large alloc 2147491840 bytes == 0x559891ae0000 @ 0x7fb426636760 0x7fb426657c64 0x5597c1ccaaba 0x7fb41bc04218 0x7fb41bc0ed5b 0x7fb41bbfeda4 0x7fb41db3>
Feb 23 09:29:26 rke-sh1-1 bash[786820]: tcmalloc: large alloc 2147491840 bytes == 0x559951ae4000 @ 0x7fb426636760 0x7fb426657c64 0x5597c1ccaaba 0x7fb41bc04218 0x7fb41bc0ed5b 0x7fb41bbfeda4 0x7fb41da6>
Feb 23 09:29:27 rke-sh1-1 bash[786820]: debug 2024-02-23T09:29:27.928+0000 7fb416d63700 -1 asok(0x5597c3904000) AdminSocket: error writing response length (32) Broken pipe
Feb 23 12:35:53 rke-sh1-1 bash[786820]: ignoring --setuser ceph since I am not root
Feb 23 12:35:53 rke-sh1-1 bash[786820]: ignoring --setgroup ceph since I am not root
Feb 23 12:35:53 rke-sh1-1 bash[786820]: starting mds.cephfs.rke-sh1-1.ojmpnk at
The MDS logs are at verbosity 20; do you want me to provide them as an archive?
Is there a way to compact all the logs?
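In case it helps, one simple way to bundle them is to reuse the cephadm command from above and compress the result (just a sketch; the file names are only examples):
# cephadm logs --fsid fcb373ce-7aaa-11eb-984f-e7c6e0038e87 --name mds.cephfs.rke-sh1-2.isqjza > mds.rke-sh1-2.log
# xz -9 mds.rke-sh1-2.log     # repeat per MDS daemon, then e.g.: tar cf mds-logs.tar *.log.xz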
Best Regards,
Edouard FAZENDA
Technical Support
Chemin du Curé-Desclouds 2, CH-1226 THONEX +41 (0)22 869 04 40
www.csti.ch
-----Original Message-----
From: Eugen Block <eblock@xxxxxx>
Sent: Friday, 23 February 2024 12:50
To: ceph-users@xxxxxxx
Subject: Re: MDS in ReadOnly and 2 MDS behind on trimming
Hi,
the MDS log should contain information about why it goes into read-only mode. Just a few weeks ago I helped a user with a broken CephFS (the MDS went into read-only mode because of missing objects in the journal).
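For example, grepping the daemon log for the error usually shows what triggered it (just a sketch; <fsid> and <mds-daemon> are placeholders for your values):
# cephadm logs --fsid <fsid> --name <mds-daemon> | grep -iE 'readonly|err'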
Can you check the journal status:
# cephfs-journal-tool --rank=cephfs:0 --journal=mdlog journal inspect
# cephfs-journal-tool --rank=cephfs:0 --journal=purge_queue journal inspect
and also share the logs.
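If the journal itself turns out to be damaged, it is usually worth exporting a backup of it before attempting any repair, for example (sketch only; the file name is just an example):
# cephfs-journal-tool --rank=cephfs:0 journal export backup.bin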
Thanks,
Eugen
Quoting Edouard FAZENDA <e.fazenda@xxxxxxx>:
Dear Ceph Community,
I am having an issue with my Ceph cluster: several OSDs crashed, but they are now active again and recovery has finished. However, the CephFS filesystem can no longer be accessed by clients in read-write mode (K8s workloads), because one MDS is in read-only mode and two are behind on trimming.
The CephFS volume itself seems OK.
The trimming process does not seem to make any progress; maybe it is stuck?
We are running 3 hosts on Ceph Pacific, version 16.2.1.
Here are some logs describing the situation:
ceph versions
{
    "mon": {
        "ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10) pacific (stable)": 3
    },
    "mgr": {
        "ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10) pacific (stable)": 3
    },
    "osd": {
        "ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10) pacific (stable)": 18
    },
    "mds": {
        "ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10) pacific (stable)": 3
    },
    "rgw": {
        "ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10) pacific (stable)": 6
    },
    "overall": {
        "ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10) pacific (stable)": 33
    }
}
ceph orch ps
NAME                           HOST        STATUS         REFRESHED  AGE  PORTS          VERSION  IMAGE ID      CONTAINER ID
crash.rke-sh1-1                rke-sh1-1   running (21h)  36s ago    21h  -              16.2.1   c757e4a3636b  e8652edb2b49
crash.rke-sh1-2                rke-sh1-2   running (21h)  3m ago     20M  -              16.2.1   c757e4a3636b  a1249a605ee0
crash.rke-sh1-3                rke-sh1-3   running (17h)  36s ago    17h  -              16.2.1   c757e4a3636b  026667bc1776
mds.cephfs.rke-sh1-1.ojmpnk    rke-sh1-1   running (18h)  36s ago    4M   -              16.2.1   c757e4a3636b  9b4c2b08b759
mds.cephfs.rke-sh1-2.isqjza    rke-sh1-2   running (18h)  3m ago     23M  -              16.2.1   c757e4a3636b  71681a5f34d3
mds.cephfs.rke-sh1-3.vdicdn    rke-sh1-3   running (17h)  36s ago    3M   -              16.2.1   c757e4a3636b  e89946ad6b7e
mgr.rke-sh1-1.qskoyj           rke-sh1-1   running (21h)  36s ago    2y   *:8082 *:9283  16.2.1   c757e4a3636b  7ce7cfbb3e55
mgr.rke-sh1-2.lxmguj           rke-sh1-2   running (21h)  3m ago     22M  *:8082 *:9283  16.2.1   c757e4a3636b  5a0025adfd46
mgr.rke-sh1-3.ckunvo           rke-sh1-3   running (17h)  36s ago    6M   *:8082 *:9283  16.2.1   c757e4a3636b  2fcaf18f3218
mon.rke-sh1-1                  rke-sh1-1   running (20h)  36s ago    20h  -              16.2.1   c757e4a3636b  c0a90103cabc
mon.rke-sh1-2                  rke-sh1-2   running (21h)  3m ago     3M   -              16.2.1   c757e4a3636b  f4b32ba4466b
mon.rke-sh1-3                  rke-sh1-3   running (17h)  36s ago    17h  -              16.2.1   c757e4a3636b  d5e44c245998
osd.0                          rke-sh1-2   running (20h)  3m ago     2y   -              16.2.1   c757e4a3636b  7b0e69942c15
osd.1                          rke-sh1-3   running (17h)  36s ago    2y   -              16.2.1   c757e4a3636b  4451654d9a2d
osd.10                         rke-sh1-3   running (17h)  36s ago    2y   -              16.2.1   c757e4a3636b  3f9d5f95e284
osd.11                         rke-sh1-1   running (21h)  36s ago    2y   -              16.2.1   c757e4a3636b  db1cc6d2e37f
osd.12                         rke-sh1-2   running (21h)  3m ago     2y   -              16.2.1   c757e4a3636b  de416c1ef766
osd.13                         rke-sh1-3   running (17h)  36s ago    2y   -              16.2.1   c757e4a3636b  25a281cc5a9b
osd.14                         rke-sh1-1   running (21h)  36s ago    2y   -              16.2.1   c757e4a3636b  62f25ba61667
osd.15                         rke-sh1-2   running (21h)  3m ago     2y   -              16.2.1   c757e4a3636b  d3514d823c45
osd.16                         rke-sh1-3   running (17h)  36s ago    2y   -              16.2.1   c757e4a3636b  bba857759bfe
osd.17                         rke-sh1-1   running (21h)  36s ago    2y   -              16.2.1   c757e4a3636b  59281d4bb3d0
osd.2                          rke-sh1-1   running (21h)  36s ago    2y   -              16.2.1   c757e4a3636b  418041b5e60d
osd.3                          rke-sh1-2   running (21h)  3m ago     2y   -              16.2.1   c757e4a3636b  04a0e29d5623
osd.4                          rke-sh1-1   running (20h)  36s ago    2y   -              16.2.1   c757e4a3636b  1cc78a5153d3
osd.5                          rke-sh1-3   running (17h)  36s ago    2y   -              16.2.1   c757e4a3636b  39a4b11e31fb
osd.6                          rke-sh1-2   running (21h)  3m ago     2y   -              16.2.1   c757e4a3636b  2f218ffb566e
osd.7                          rke-sh1-1   running (20h)  36s ago    2y   -              16.2.1   c757e4a3636b  cf761fbe4d5f
osd.8                          rke-sh1-3   running (17h)  36s ago    2y   -              16.2.1   c757e4a3636b  f9f85480e800
osd.9                          rke-sh1-2   running (21h)  3m ago     2y   -              16.2.1   c757e4a3636b  664c54ff46d2
rgw.default.rke-sh1-1.dgucwl   rke-sh1-1   running (21h)  36s ago    22M  *:8000         16.2.1   c757e4a3636b  f03212b955a7
rgw.default.rke-sh1-1.vylchc   rke-sh1-1   running (21h)  36s ago    22M  *:8001         16.2.1   c757e4a3636b  da486ce43fe5
rgw.default.rke-sh1-2.dfhhfw   rke-sh1-2   running (21h)  3m ago     2y   *:8000         16.2.1   c757e4a3636b  ef4089d0aef2
rgw.default.rke-sh1-2.efkbum   rke-sh1-2   running (21h)  3m ago     2y   *:8001         16.2.1   c757e4a3636b  9e053d5a2f7b
rgw.default.rke-sh1-3.krfgey   rke-sh1-3   running (17h)  36s ago    9M   *:8001         16.2.1   c757e4a3636b  45cd3d75edd3
rgw.default.rke-sh1-3.pwdbmp   rke-sh1-3   running (17h)  36s ago    9M   *:8000         16.2.1   c757e4a3636b  e2710265a7f4
ceph health detail
HEALTH_WARN 1 MDSs are read only; 2 MDSs behind on trimming
[WRN] MDS_READ_ONLY: 1 MDSs are read only
mds.cephfs.rke-sh1-2.isqjza(mds.0): MDS in read-only mode
[WRN] MDS_TRIM: 2 MDSs behind on trimming
mds.cephfs.rke-sh1-2.isqjza(mds.0): Behind on trimming (2149/128) max_segments: 128, num_segments: 2149
mds.cephfs.rke-sh1-1.ojmpnk(mds.0): Behind on trimming (2149/128) max_segments: 128, num_segments: 2149
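For what it's worth, num_segments should drop back toward max_segments once the MDS can write (and therefore trim) again; one way to keep an eye on it (plain sketch, nothing cluster-specific):
# ceph config get mds mds_log_max_segments            # 128 here, per the output above
# watch -n 30 "ceph health detail | grep -A 3 MDS_TRIM"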
root@rke-sh1-1:~# ceph fs status
cephfs - 27 clients
======
RANK  STATE            MDS                      ACTIVITY     DNS    INOS   DIRS   CAPS
0     active           cephfs.rke-sh1-2.isqjza  Reqs: 8 /s   85.2k  53.2k  1742   101
0-s   standby-replay   cephfs.rke-sh1-1.ojmpnk  Evts: 0 /s   52.2k  20.2k  1737   0
POOL TYPE USED AVAIL
cephfs_metadata metadata 1109G 6082G
cephfs_data data 8419G 6082G
STANDBY MDS
cephfs.rke-sh1-3.vdicdn
MDS version: ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10) pacific (stable)
ceph status
cluster:
id: fcb373ce-7aaa-11eb-984f-e7c6e0038e87
health: HEALTH_WARN
1 MDSs are read only
2 MDSs behind on trimming
services:
mon: 3 daemons, quorum rke-sh1-2,rke-sh1-1,rke-sh1-3 (age 17h)
mgr: rke-sh1-1.qskoyj(active, since 17h), standbys: rke-sh1-2.lxmguj, rke-sh1-3.ckunvo
mds: 1/1 daemons up, 1 standby, 1 hot standby
osd: 18 osds: 18 up (since 17h), 18 in (since 20h)
rgw: 6 daemons active (3 hosts, 1 zones)
data:
volumes: 1/1 healthy
pools: 11 pools, 849 pgs
objects: 10.10M objects, 5.3 TiB
usage: 11 TiB used, 15 TiB / 26 TiB avail
pgs: 849 active+clean
io:
client: 35 KiB/s rd, 1.0 MiB/s wr, 302 op/s rd, 165 op/s wr
# ceph mds stat
cephfs:1 {0=cephfs.rke-sh1-2.isqjza=up:active} 1 up:standby-replay 1 up:standby
Do you have an idea of what my next steps could be to bring the cluster back to a healthy state?
Any help would be very much appreciated.
Thanks a lot for your feedback.
Best Regards,
Edouard FAZENDA
Technical Support
Chemin du Curé-Desclouds 2, CH-1226 THONEX +41 (0)22 869 04 40
www.csti.ch
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx