Update: apparently, we did it!
We walked through the disaster recovery steps, one of which was to
reset the journal. I was under the impression that the documented
command 'cephfs-journal-tool [--rank=N] journal reset' would reset
all the journals (mdlog and purge_queue), but apparently it doesn't.
After Mykola (once again, thank you so much for your input) pointed
towards running the command for the purge_queue specifically, the
filesystem came out of read-only mode and was mountable again. The
exact command was:
cephfs-journal-tool --rank=cephfs:0 --journal=purge_queue journal reset
We didn't have to walk through the recovery with an empty metadata
pool, which is nice. I'd suggest adding the "journal inspect" command
to the docs for both mdlog and purge_queue, to make it clear that
both journals might need a reset.
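For example, roughly like this (using the rank from our case, adjust
the file system name as needed):
cephfs-journal-tool --rank=cephfs:0 --journal=mdlog journal inspect
cephfs-journal-tool --rank=cephfs:0 --journal=purge_queue journal inspect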
Thanks again, Mykola!
Eugen
Quoting Eugen Block <eblock@xxxxxx>:
So we did walk through the advanced recovery page but didn't really
succeed. The CephFS still goes read-only because of the purge_queue
error. Is there any chance to recover from that, or should we try
the recovery with an empty metadata pool next?
I'd still appreciate any comments. ;-)
Quoting Eugen Block <eblock@xxxxxx>:
Some more information on the damaged CephFS; apparently, the journal
is damaged:
---snip---
# cephfs-journal-tool --rank=storage:0 --journal=mdlog journal inspect
2023-12-08T15:35:22.922+0200 7f834d0320c0 -1 Missing object 200.000527c4
2023-12-08T15:35:22.938+0200 7f834d0320c0 -1 Bad entry start ptr
(0x149f140067f) at 0x149f1174595
2023-12-08T15:35:22.942+0200 7f834d0320c0 -1 Bad entry start ptr
(0x149f1400e66) at 0x149f1174d7c
2023-12-08T15:35:22.954+0200 7f834d0320c0 -1 Bad entry start ptr
(0x149f1401642) at 0x149f1175558
2023-12-08T15:35:22.970+0200 7f834d0320c0 -1 Bad entry start ptr
(0x149f1401e29) at 0x149f1175d3f
2023-12-08T15:35:22.974+0200 7f834d0320c0 -1 Bad entry start ptr
(0x149f1402610) at 0x149f1176526
2023-12-08T15:35:22.978+0200 7f834d0320c0 -1 Missing object 200.000527ca
2023-12-08T15:35:22.978+0200 7f834d0320c0 -1 Missing object 200.000527cb
2023-12-08T15:35:22.994+0200 7f834d0320c0 -1 Bad entry start ptr
(0x149f30008f4) at 0x149f2d7480a
2023-12-08T15:35:22.998+0200 7f834d0320c0 -1 Bad entry start ptr
(0x149f3000ced) at 0x149f2d74c03
Overall journal integrity: DAMAGED
Objects missing:
0x527c4
0x527ca
0x527cb
Corrupt regions:
0x149f0d73f16-149f1174595
0x149f1174595-149f1174d7c
0x149f1174d7c-149f1175558
0x149f1175558-149f1175d3f
0x149f1175d3f-149f1176526
0x149f1176526-149f2d7480a
0x149f2d7480a-149f2d74c03
0x149f2d74c03-ffffffffffffffff
# cephfs-journal-tool --rank=storage:0 --journal=purge_queue journal inspect
2023-12-08T15:35:57.691+0200 7f331621e0c0 -1 Missing object 500.00000dc6
Overall journal integrity: DAMAGED
Objects missing:
0xdc6
Corrupt regions:
0x3718522e9-ffffffffffffffff
---snip---
A backup isn't possible:
---snip---
# cephfs-journal-tool --rank=storage:0 journal export backup.bin
2023-12-08T15:42:07.643+0200 7fde6a24f0c0 -1 Missing object 200.000527c4
2023-12-08T15:42:07.659+0200 7fde6a24f0c0 -1 Bad entry start ptr
(0x149f140067f) at 0x149f1174595
2023-12-08T15:42:07.667+0200 7fde6a24f0c0 -1 Bad entry start ptr
(0x149f1400e66) at 0x149f1174d7c
2023-12-08T15:42:07.675+0200 7fde6a24f0c0 -1 Bad entry start ptr
(0x149f1401642) at 0x149f1175558
2023-12-08T15:42:07.687+0200 7fde6a24f0c0 -1 Bad entry start ptr
(0x149f1401e29) at 0x149f1175d3f
2023-12-08T15:42:07.699+0200 7fde6a24f0c0 -1 Bad entry start ptr
(0x149f1402610) at 0x149f1176526
2023-12-08T15:42:07.699+0200 7fde6a24f0c0 -1 Missing object 200.000527ca
2023-12-08T15:42:07.699+0200 7fde6a24f0c0 -1 Missing object 200.000527cb
2023-12-08T15:42:07.707+0200 7fde6a24f0c0 -1 Bad entry start ptr
(0x149f30008f4) at 0x149f2d7480a
2023-12-08T15:42:07.707+0200 7fde6a24f0c0 -1 Bad entry start ptr
(0x149f3000ced) at 0x149f2d74c03
2023-12-08T15:42:07.707+0200 7fde6a24f0c0 -1 journal_export:
Journal not readable, attempt object-by-object dump with `rados`
Error ((5) Input/output error)
---snip---
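If we end up needing the object-by-object dump that the tool
suggests, I'd assume something roughly like this (the metadata pool
name is a placeholder; 200.* are the mdlog objects of rank 0 and
500.* the purge_queue objects, as seen in the inspect output above):
# replace <metadata_pool> with the actual CephFS metadata pool name
mkdir -p journal_backup
rados -p <metadata_pool> ls | grep -E '^(200|500)\.' > journal_objects.txt
while read -r obj; do
  rados -p <metadata_pool> get "$obj" "journal_backup/$obj"
done < journal_objects.txt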
Does it make sense to continue with the advanced disaster recovery
[3] by running (all of) these steps (see my note on the journal
reset below the list):
cephfs-journal-tool event recover_dentries summary
cephfs-journal-tool [--rank=N] journal reset
cephfs-table-tool all reset session
ceph fs reset <fs name> --yes-i-really-mean-it
cephfs-table-tool 0 reset session
cephfs-table-tool 0 reset snap
cephfs-table-tool 0 reset inode
cephfs-journal-tool --rank=0 journal reset
cephfs-data-scan init
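Regarding the 'journal reset' step: I assume it might have to be run
once per journal, i.e. roughly (rank/fs name filled in for our
cluster):
cephfs-journal-tool --rank=storage:0 --journal=mdlog journal reset
cephfs-journal-tool --rank=storage:0 --journal=purge_queue journal reset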
Fortunately, I haven't had to run through this procedure very often,
so I'd appreciate any comments on what the best approach would be here.
Thanks!
Eugen
[3]
https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/#disaster-recovery-experts
Quoting Eugen Block <eblock@xxxxxx>:
I was able to (almost) reproduce the issue in a (Pacific) test
cluster. I rebuilt the monmap from the OSDs, brought everything
back up and started the MDS recovery as described in [1]:
ceph fs new <fs_name> <metadata_pool> <data_pool> --force --recover
Then I added two MDS daemons, which went into standby:
---snip---
Started Ceph mds.cephfs.pacific.uexvvq for
1b0afda4-2221-11ee-87be-fa163eed040c.
Dez 08 12:51:53 pacific conmon[100493]: debug
2023-12-08T11:51:53.086+0000 7ff5f589b900 0 set uid:gid to
167:167 (ceph:ceph)
Dez 08 12:51:53 pacific conmon[100493]: debug
2023-12-08T11:51:53.086+0000 7ff5f589b900 0 ceph version 16.2.14
(238ba602515df21ea7ffc75c88db29f9e5ef12c9) pacific (stable),
process ceph-md>
Dez 08 12:51:53 pacific conmon[100493]: debug
2023-12-08T11:51:53.086+0000 7ff5f589b900 1 main not setting numa
affinity
Dez 08 12:51:53 pacific conmon[100493]: debug
2023-12-08T11:51:53.086+0000 7ff5f589b900 0 pidfile_write: ignore
empty --pid-file
Dez 08 12:51:53 pacific conmon[100493]: starting
mds.cephfs.pacific.uexvvq at
Dez 08 12:51:53 pacific conmon[100493]: debug
2023-12-08T11:51:53.102+0000 7ff5e37be700 1
mds.cephfs.pacific.uexvvq Updating MDS map to version 2 from mon.0
Dez 08 12:51:53 pacific conmon[100493]: debug
2023-12-08T11:51:53.802+0000 7ff5e37be700 1
mds.cephfs.pacific.uexvvq Updating MDS map to version 3 from mon.0
Dez 08 12:51:53 pacific conmon[100493]: debug
2023-12-08T11:51:53.802+0000 7ff5e37be700 1
mds.cephfs.pacific.uexvvq Monitors have assigned me to become a
standby.
---snip---
But as soon as I ran
pacific:~ # ceph fs set cephfs joinable true
cephfs marked joinable; MDS may join as newly active.
one MDS daemon became active and the FS is available now. So
apparently the "Advanced" steps from [2] aren't usually necessary,
but are they in this case? I'm still trying to find an explanation
for the purge_queue errors.
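For reference, I checked the state in the test cluster with the
usual status commands (the test fs is just called 'cephfs'), e.g.:
ceph fs status cephfs
ceph mds stat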
Quoting Eugen Block <eblock@xxxxxx>:
Hi,
following up on the previous thread (After hardware failure tried
to recover ceph and followed instructions for recovery using
OSDS), we were able to get Ceph back into a healthy state
(including the unfound object). Now the CephFS needs to be
recovered, and I'm having trouble fully understanding from the docs
[1] what the next steps would be. We ran the following, which
according to [1] sets the state to "existing but failed":
ceph fs new <fs_name> <metadata_pool> <data_pool> --force --recover
But how do we continue from here? Should we expect an active MDS at
this point or not? The "ceph fs status" output still shows rank 0
as failed. We then tried:
ceph fs set <fs_name> joinable true
But apparently it was already joinable; nothing changed. Before
doing anything (destructive) from the advanced options [2] I wanted
to ask the community how to proceed from here. I pasted the MDS
logs at the bottom; I'm not really sure whether the current state
is expected or not. Apparently, the mdlog journal recovers but the
purge_queue does not:
mds.0.41 Booting: 2: waiting for purge queue recovered
mds.0.journaler.pq(ro) _finish_probe_end write_pos = 14797504512
(header had 14789452521). recovered.
mds.0.purge_queue operator(): open complete
mds.0.purge_queue operator(): recovering write_pos
monclient: get_auth_request con 0x55c280bc5c00 auth_method 0
monclient: get_auth_request con 0x55c280ee0c00 auth_method 0
mds.0.journaler.pq(ro) _finish_read got error -2
mds.0.purge_queue _recover: Error -2 recovering write_pos
mds.0.purge_queue _go_readonly: going readonly because internal
IO failed: No such file or directory
mds.0.journaler.pq(ro) set_readonly
mds.0.41 unhandled write error (2) No such file or directory,
force readonly...
mds.0.cache force file system read-only
force file system read-only
Is this expected because the "--recover" flag prevents an active
MDS, or not? Before running "ceph mds rmfailed ..." and/or "ceph
fs reset <file system name>" with the --yes-i-really-mean-it flag,
I'd like to ask for your input. In which cases should we run those
commands? The docs are not really clear to me. Any input is
highly appreciated!
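In case it helps, this is roughly how I've been checking the
current state before touching anything (our fs is called 'storage'):
ceph fs status storage
ceph fs get storage
ceph health detail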
Thanks!
Eugen
[1]
https://docs.ceph.com/en/latest/cephfs/recover-fs-after-mon-store-loss/
[2]
https://docs.ceph.com/en/latest/cephfs/administration/#advanced-cephfs-admin-settings
---snip---
Dec 07 15:35:48 node02 bash[692598]: debug -90>
2023-12-07T13:35:47.730+0000 7f4cd855f700 1
mds.storage.node02.hemalk Updating MDS map to version 41 from mon.0
Dec 07 15:35:48 node02 bash[692598]: debug -89>
2023-12-07T13:35:47.730+0000 7f4cd855f700 4 mds.0.purge_queue
operator(): data pool 3 not found in OSDMap
Dec 07 15:35:48 node02 bash[692598]: debug -88>
2023-12-07T13:35:47.730+0000 7f4cd855f700 5 asok(0x55c27fe86000)
register_command objecter_requests hook 0x55c27fe16310
Dec 07 15:35:48 node02 bash[692598]: debug -87>
2023-12-07T13:35:47.730+0000 7f4cd855f700 10 monclient: _renew_subs
Dec 07 15:35:48 node02 bash[692598]: debug -86>
2023-12-07T13:35:47.730+0000 7f4cd855f700 10 monclient:
_send_mon_message to mon.node02 at v2:10.40.99.12:3300/0
Dec 07 15:35:48 node02 bash[692598]: debug -85>
2023-12-07T13:35:47.730+0000 7f4cd855f700 10 log_channel(cluster)
update_config to_monitors: true to_syslog: false syslog_facility:
prio: info to_graylog: false graylog_host: 127.0.0.1
graylog_port: 12201)
Dec 07 15:35:48 node02 bash[692598]: debug -84>
2023-12-07T13:35:47.730+0000 7f4cd855f700 4 mds.0.purge_queue
operator(): data pool 3 not found in OSDMap
Dec 07 15:35:48 node02 bash[692598]: debug -83>
2023-12-07T13:35:47.730+0000 7f4cd855f700 4 mds.0.0
apply_blocklist: killed 0, blocklisted sessions (0 blocklist
entries, 0)
Dec 07 15:35:48 node02 bash[692598]: debug -82>
2023-12-07T13:35:47.730+0000 7f4cd855f700 1 mds.0.41
handle_mds_map i am now mds.0.41
Dec 07 15:35:48 node02 bash[692598]: debug -81>
2023-12-07T13:35:47.734+0000 7f4cd855f700 1 mds.0.41
handle_mds_map state change up:standby --> up:replay
Dec 07 15:35:48 node02 bash[692598]: debug -80>
2023-12-07T13:35:47.734+0000 7f4cd855f700 5
mds.beacon.storage.node02.hemalk set_want_state: up:standby ->
up:replay
Dec 07 15:35:48 node02 bash[692598]: debug -79>
2023-12-07T13:35:47.734+0000 7f4cd855f700 1 mds.0.41 replay_start
Dec 07 15:35:48 node02 bash[692598]: debug -78>
2023-12-07T13:35:47.734+0000 7f4cd855f700 2 mds.0.41 Booting: 0:
opening inotable
Dec 07 15:35:48 node02 bash[692598]: debug -77>
2023-12-07T13:35:47.734+0000 7f4cd855f700 10 monclient:
_send_mon_message to mon.node02 at v2:10.40.99.12:3300/0
Dec 07 15:35:48 node02 bash[692598]: debug -76>
2023-12-07T13:35:47.734+0000 7f4cd855f700 2 mds.0.41 Booting: 0:
opening sessionmap
Dec 07 15:35:48 node02 bash[692598]: debug -75>
2023-12-07T13:35:47.734+0000 7f4cd855f700 10 monclient:
_send_mon_message to mon.node02 at v2:10.40.99.12:3300/0
Dec 07 15:35:48 node02 bash[692598]: debug -74>
2023-12-07T13:35:47.734+0000 7f4cd855f700 2 mds.0.41 Booting: 0:
opening mds log
Dec 07 15:35:48 node02 bash[692598]: debug -73>
2023-12-07T13:35:47.734+0000 7f4cd855f700 5 mds.0.log open
discovering log bounds
Dec 07 15:35:48 node02 bash[692598]: debug -72>
2023-12-07T13:35:47.734+0000 7f4cd855f700 2 mds.0.41 Booting: 0:
opening purge queue (async)
Dec 07 15:35:48 node02 bash[692598]: debug -71>
2023-12-07T13:35:47.734+0000 7f4cd855f700 4 mds.0.purge_queue
open: opening
Dec 07 15:35:48 node02 bash[692598]: debug -70>
2023-12-07T13:35:47.734+0000 7f4cd855f700 1
mds.0.journaler.pq(ro) recover start
Dec 07 15:35:48 node02 bash[692598]: debug -69>
2023-12-07T13:35:47.734+0000 7f4cd855f700 1
mds.0.journaler.pq(ro) read_head
Dec 07 15:35:48 node02 bash[692598]: debug -68>
2023-12-07T13:35:47.734+0000 7f4cd855f700 10 monclient:
_send_mon_message to mon.node02 at v2:10.40.99.12:3300/0
Dec 07 15:35:48 node02 bash[692598]: debug -67>
2023-12-07T13:35:47.734+0000 7f4cd855f700 2 mds.0.41 Booting: 0:
loading open file table (async)
Dec 07 15:35:48 node02 bash[692598]: debug -66>
2023-12-07T13:35:47.734+0000 7f4cd855f700 10 monclient:
_send_mon_message to mon.node02 at v2:10.40.99.12:3300/0
Dec 07 15:35:48 node02 bash[692598]: debug -65>
2023-12-07T13:35:47.734+0000 7f4cd855f700 2 mds.0.41 Booting: 0:
opening snap table
Dec 07 15:35:48 node02 bash[692598]: debug -64>
2023-12-07T13:35:47.734+0000 7f4cd855f700 10 monclient:
_send_mon_message to mon.node02 at v2:10.40.99.12:3300/0
Dec 07 15:35:48 node02 bash[692598]: debug -63>
2023-12-07T13:35:47.734+0000 7f4cd1d52700 4 mds.0.journalpointer
Reading journal pointer '400.00000000'
Dec 07 15:35:48 node02 bash[692598]: debug -62>
2023-12-07T13:35:47.734+0000 7f4cd1d52700 10 monclient:
_send_mon_message to mon.node02 at v2:10.40.99.12:3300/0
Dec 07 15:35:48 node02 bash[692598]: debug -61>
2023-12-07T13:35:47.734+0000 7f4cd4557700 2 mds.0.cache Memory
usage: total 316452, rss 43088, heap 198940, baseline 198940, 0
/ 0 inodes have caps, 0 caps, 0 caps per inode
Dec 07 15:35:48 node02 bash[692598]: debug -60>
2023-12-07T13:35:47.734+0000 7f4cd855f700 10 monclient: _renew_subs
Dec 07 15:35:48 node02 bash[692598]: debug -59>
2023-12-07T13:35:47.734+0000 7f4cd855f700 10 monclient:
_send_mon_message to mon.node02 at v2:10.40.99.12:3300/0
Dec 07 15:35:48 node02 bash[692598]: debug -58>
2023-12-07T13:35:47.734+0000 7f4cd855f700 10 monclient:
handle_get_version_reply finishing 1 version 10835
Dec 07 15:35:48 node02 bash[692598]: debug -57>
2023-12-07T13:35:47.734+0000 7f4cd855f700 10 monclient:
handle_get_version_reply finishing 2 version 10835
Dec 07 15:35:48 node02 bash[692598]: debug -56>
2023-12-07T13:35:47.734+0000 7f4cd855f700 10 monclient:
handle_get_version_reply finishing 3 version 10835
Dec 07 15:35:48 node02 bash[692598]: debug -55>
2023-12-07T13:35:47.734+0000 7f4cd855f700 10 monclient:
handle_get_version_reply finishing 4 version 10835
Dec 07 15:35:48 node02 bash[692598]: debug -54>
2023-12-07T13:35:47.734+0000 7f4cd855f700 10 monclient:
handle_get_version_reply finishing 5 version 10835
Dec 07 15:35:48 node02 bash[692598]: debug -53>
2023-12-07T13:35:47.734+0000 7f4cd855f700 10 monclient:
handle_get_version_reply finishing 6 version 10835
Dec 07 15:35:48 node02 bash[692598]: debug -52>
2023-12-07T13:35:47.734+0000 7f4cdb565700 10 monclient:
get_auth_request con 0x55c280bc5800 auth_method 0
Dec 07 15:35:48 node02 bash[692598]: debug -51>
2023-12-07T13:35:47.734+0000 7f4cdbd66700 10 monclient:
get_auth_request con 0x55c280dc6800 auth_method 0
Dec 07 15:35:48 node02 bash[692598]: debug -50>
2023-12-07T13:35:47.734+0000 7f4cdad64700 10 monclient:
get_auth_request con 0x55c280dc7800 auth_method 0
Dec 07 15:35:48 node02 bash[692598]: debug -49>
2023-12-07T13:35:47.734+0000 7f4cd3555700 1
mds.0.journaler.pq(ro) _finish_read_head loghead(trim
14789115904, expire 14789452521, write 14789452521, stream_format
1). probing for end of log (from 14789452521)...
Dec 07 15:35:48 node02 bash[692598]: debug -48>
2023-12-07T13:35:47.734+0000 7f4cd3555700 1
mds.0.journaler.pq(ro) probing for end of the log
Dec 07 15:35:48 node02 bash[692598]: debug -47>
2023-12-07T13:35:47.738+0000 7f4cd1d52700 1
mds.0.journaler.mdlog(ro) recover start
Dec 07 15:35:48 node02 bash[692598]: debug -46>
2023-12-07T13:35:47.738+0000 7f4cd1d52700 1
mds.0.journaler.mdlog(ro) read_head
Dec 07 15:35:48 node02 bash[692598]: debug -45>
2023-12-07T13:35:47.738+0000 7f4cd1d52700 4 mds.0.log Waiting
for journal 0x200 to recover...
Dec 07 15:35:48 node02 bash[692598]: debug -44>
2023-12-07T13:35:47.738+0000 7f4cdbd66700 10 monclient:
get_auth_request con 0x55c280dc7c00 auth_method 0
Dec 07 15:35:48 node02 bash[692598]: debug -43>
2023-12-07T13:35:47.738+0000 7f4cd2553700 1
mds.0.journaler.mdlog(ro) _finish_read_head loghead(trim
1416940748800, expire 1416947000701, write 1417125359769,
stream_format 1). probing for end of log (from 1417125359769)...
Dec 07 15:35:48 node02 bash[692598]: debug -42>
2023-12-07T13:35:47.738+0000 7f4cd2553700 1
mds.0.journaler.mdlog(ro) probing for end of the log
Dec 07 15:35:48 node02 bash[692598]: debug -41>
2023-12-07T13:35:47.738+0000 7f4cdb565700 10 monclient:
get_auth_request con 0x55c280e2fc00 auth_method 0
Dec 07 15:35:48 node02 bash[692598]: debug -40>
2023-12-07T13:35:47.738+0000 7f4cdad64700 10 monclient:
get_auth_request con 0x55c280ee0400 auth_method 0
Dec 07 15:35:48 node02 bash[692598]: debug -39>
2023-12-07T13:35:47.738+0000 7f4cd2553700 1
mds.0.journaler.mdlog(ro) _finish_probe_end write_pos =
1417129492480 (header had 1417125359769). recovered.
Dec 07 15:35:48 node02 bash[692598]: debug -38>
2023-12-07T13:35:47.738+0000 7f4cd1d52700 4 mds.0.log Journal
0x200 recovered.
Dec 07 15:35:48 node02 bash[692598]: debug -37>
2023-12-07T13:35:47.738+0000 7f4cd1d52700 4 mds.0.log Recovered
journal 0x200 in format 1
Dec 07 15:35:48 node02 bash[692598]: debug -36>
2023-12-07T13:35:47.738+0000 7f4cd1d52700 2 mds.0.41 Booting: 1:
loading/discovering base inodes
Dec 07 15:35:48 node02 bash[692598]: debug -35>
2023-12-07T13:35:47.738+0000 7f4cd1d52700 0 mds.0.cache creating
system inode with ino:0x100
Dec 07 15:35:48 node02 bash[692598]: debug -34>
2023-12-07T13:35:47.738+0000 7f4cd1d52700 0 mds.0.cache creating
system inode with ino:0x1
Dec 07 15:35:48 node02 bash[692598]: debug -33>
2023-12-07T13:35:47.742+0000 7f4cdbd66700 10 monclient:
get_auth_request con 0x55c280dc7400 auth_method 0
Dec 07 15:35:48 node02 bash[692598]: debug -32>
2023-12-07T13:35:47.742+0000 7f4cd2553700 2 mds.0.41 Booting: 2:
replaying mds log
Dec 07 15:35:48 node02 bash[692598]: debug -31>
2023-12-07T13:35:47.742+0000 7f4cd2553700 2 mds.0.41 Booting: 2:
waiting for purge queue recovered
Dec 07 15:35:48 node02 bash[692598]: debug -30>
2023-12-07T13:35:47.742+0000 7f4cd3555700 1
mds.0.journaler.pq(ro) _finish_probe_end write_pos = 14797504512
(header had 14789452521). recovered.
Dec 07 15:35:48 node02 bash[692598]: debug -29>
2023-12-07T13:35:47.742+0000 7f4cd3555700 4 mds.0.purge_queue
operator(): open complete
Dec 07 15:35:48 node02 bash[692598]: debug -28>
2023-12-07T13:35:47.742+0000 7f4cd3555700 4 mds.0.purge_queue
operator(): recovering write_pos
Dec 07 15:35:48 node02 bash[692598]: debug -27>
2023-12-07T13:35:47.742+0000 7f4cdb565700 10 monclient:
get_auth_request con 0x55c280bc5c00 auth_method 0
Dec 07 15:35:48 node02 bash[692598]: debug -26>
2023-12-07T13:35:47.742+0000 7f4cdad64700 10 monclient:
get_auth_request con 0x55c280ee0c00 auth_method 0
Dec 07 15:35:48 node02 bash[692598]: debug -25>
2023-12-07T13:35:47.746+0000 7f4cd3555700 0
mds.0.journaler.pq(ro) _finish_read got error -2
Dec 07 15:35:48 node02 bash[692598]: debug -24>
2023-12-07T13:35:47.746+0000 7f4cd3555700 -1 mds.0.purge_queue
_recover: Error -2 recovering write_pos
Dec 07 15:35:48 node02 bash[692598]: debug -23>
2023-12-07T13:35:47.746+0000 7f4cd3555700 1 mds.0.purge_queue
_go_readonly: going readonly because internal IO failed: No such
file or directory
Dec 07 15:35:48 node02 bash[692598]: debug -22>
2023-12-07T13:35:47.746+0000 7f4cd3555700 1
mds.0.journaler.pq(ro) set_readonly
Dec 07 15:35:48 node02 bash[692598]: debug -21>
2023-12-07T13:35:47.746+0000 7f4cd3555700 -1 mds.0.41 unhandled
write error (2) No such file or directory, force readonly...
Dec 07 15:35:48 node02 bash[692598]: debug -20>
2023-12-07T13:35:47.746+0000 7f4cd3555700 1 mds.0.cache force
file system read-only
Dec 07 15:35:48 node02 bash[692598]: debug -19>
2023-12-07T13:35:47.746+0000 7f4cd3555700 0 log_channel(cluster)
log [WRN] : force file system read-only
---snip---