Just for posterity: we made the CephFS available again.
We walked through the disaster recovery steps, one of which was to
reset the journal. I was under the impression that the specified
command 'cephfs-journal-tool [--rank=N] journal reset' would simply
reset all the journals (mdlog and purge_queue), but it seems it
doesn't. The cephfs-journal-tool help page mentions mdlog as the
default:
--journal=<mdlog|purge_queue> Journal type (purge_queue means
this journal is used to queue for
purge operation,
default is mdlog, and only mdlog
support event mode)
After Mykola (once again, thank you so much for your input) pointed
towards running the command for the purge_queue specifically, the
filesystem came out of read-only mode and was mountable again. The
exact command was:
cephfs-journal-tool --rank=cephfs:0 --journal=purge_queue journal reset
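For anyone hitting the same situation, the full reset sequence we ended up with can be sketched as below. The filesystem name `cephfs` and rank 0 match our cluster and may differ on others; run this only as part of the documented disaster recovery procedure, with the MDS stopped or the fs marked down:

```shell
# Reset the metadata log. --journal=mdlog is the default, so this is
# what the plain "journal reset" from the recovery docs actually resets:
cephfs-journal-tool --rank=cephfs:0 --journal=mdlog journal reset

# Reset the purge queue as well -- this was the missing step that kept
# the MDS forcing the filesystem read-only in our case:
cephfs-journal-tool --rank=cephfs:0 --journal=purge_queue journal reset
```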
Quoting Eugen Block <eblock@xxxxxx>:
Hi,
I'm trying to help someone with a broken CephFS. We managed to
recover basic Ceph functionality, but the CephFS is still
inaccessible (currently read-only). We went through the disaster
recovery steps, but to no avail. Here's a snippet from the startup
logs:
---snip---
mds.0.41 Booting: 2: waiting for purge queue recovered
mds.0.journaler.pq(ro) _finish_probe_end write_pos = 14797504512
(header had 14789452521). recovered.
mds.0.purge_queue operator(): open complete
mds.0.purge_queue operator(): recovering write_pos
monclient: get_auth_request con 0x55c280bc5c00 auth_method 0
monclient: get_auth_request con 0x55c280ee0c00 auth_method 0
mds.0.journaler.pq(ro) _finish_read got error -2
mds.0.purge_queue _recover: Error -2 recovering write_pos
mds.0.purge_queue _go_readonly: going readonly because internal IO
failed: No such file or directory
mds.0.journaler.pq(ro) set_readonly
mds.0.41 unhandled write error (2) No such file or directory, force
readonly...
mds.0.cache force file system read-only
force file system read-only
---snip---
I've added the dev mailing list; maybe someone can give some advice
on how to continue from here (we could try to recover with an empty
metadata pool). Or is this FS lost?
Thanks!
Eugen
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx