Just for posterity: we made the CephFS available again.
We walked through the disaster recovery steps, one of which was to
reset the journal. I was under the impression that the specified
command 'cephfs-journal-tool [--rank=N] journal reset' would simply
reset all the journals (mdlog and purge_queue), but it seems it
doesn't. The cephfs-journal-tool help page mentions mdlog as the
default:
--journal=<mdlog|purge_queue> Journal type (purge_queue means
this journal is used to queue for
purge operation,
default is mdlog, and only mdlog
support event mode)
After Mykola (once again, thank you so much for your input) pointed
towards running the command for the purge_queue specifically, the
filesystem came out of read-only mode and was mountable again. The
exact command was:
cephfs-journal-tool --rank=cephfs:0 --journal=purge_queue journal reset
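For anyone hitting the same situation, the full reset sequence we ended up with can be sketched as below. The filesystem name `cephfs` and rank 0 match our cluster and may differ on others; run this only as part of the documented disaster recovery procedure, with the MDS stopped or the fs marked down:

```shell
# Reset the metadata log. --journal=mdlog is the default, so this is
# what the plain "journal reset" from the recovery docs actually resets:
cephfs-journal-tool --rank=cephfs:0 --journal=mdlog journal reset

# Reset the purge queue as well -- this was the missing step that kept
# the MDS forcing the filesystem read-only in our case:
cephfs-journal-tool --rank=cephfs:0 --journal=purge_queue journal reset
```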
Quoting Eugen Block <eblock@xxxxxx>:
Hi,
I'm trying to help someone with a broken CephFS. We managed to
recover basic Ceph functionality, but the CephFS is still
inaccessible (currently read-only). We went through the disaster
recovery steps, but to no avail. Here's a snippet from the startup
logs:
---snip---
mds.0.41 Booting: 2: waiting for purge queue recovered
mds.0.journaler.pq(ro) _finish_probe_end write_pos = 14797504512
(header had 14789452521). recovered.
mds.0.purge_queue operator(): open complete
mds.0.purge_queue operator(): recovering write_pos
monclient: get_auth_request con 0x55c280bc5c00 auth_method 0
monclient: get_auth_request con 0x55c280ee0c00 auth_method 0
mds.0.journaler.pq(ro) _finish_read got error -2
mds.0.purge_queue _recover: Error -2 recovering write_pos
mds.0.purge_queue _go_readonly: going readonly because internal IO
failed: No such file or directory
mds.0.journaler.pq(ro) set_readonly
mds.0.41 unhandled write error (2) No such file or directory, force
readonly...
mds.0.cache force file system read-only
force file system read-only
---snip---
I've added the dev mailing list; maybe someone can give some advice
on how to continue from here (we could try to recover with an empty
metadata pool). Or is this FS lost?
Thanks!
Eugen
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx