Re: Feedback on docs after MDS damage/journal corruption

Henrik Korkuc <lists@xxxxxxxxx> · Tue, 11 Oct 2016 16:30:06 +0300

On 16-10-11 14:30, John Spray wrote:
On Tue, Oct 11, 2016 at 12:00 PM, Henrik Korkuc <lists@xxxxxxxxx> wrote:
Hey,

After having the bright idea to pause a 10.2.2 Ceph cluster for a minute to
see whether that would speed up backfill, I managed to corrupt my MDS journal
(is that expected after a cluster pause/unpause, or is it some sort of bug?).
I had "Overall journal integrity: DAMAGED", etc.
Uh, pausing/unpausing your RADOS cluster should never do anything apart
from pausing IO.  That's DEFINITELY a severe bug if it corrupted
objects!
I am digging into logs now, I'll try to collect what I can and create a 
bug report.
I was following http://docs.ceph.com/docs/jewel/cephfs/disaster-recovery/
and have some questions/feedback:
Caveat: This is a difficult area to document, because the repair tools
interfere with internal on-disk structures.  If I can use a bad
metaphor: it's like being in an auto garage, and asking for
documentation about the tools -- the manual for the wrench doesn't
tell you anything about how to fix the car engine.  Similarly it's
hard to write useful documentation about the repair tools without also
writing a detailed manual for how all the cephfs internals work.

Some notes/links would still be useful for newcomers. It's like someone 
standing at the side of the road with a broken car and a wrench. I could 
try fixing it with what I had, or just nuke it and get myself a new car 
:) (the data was kind of expendable there)
* It would be great to have some info when ‘snap’ or ‘inode’ should be reset
You would reset these tables if you knew that for some reason they no
longer matched the reality elsewhere in the metadata.

* It is not clear when MDS start should be attempted
You would start the MDS when you believed that you had done all you
could with offline repair.  Everything on the "disaster recovery" page
is about offline tools.
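
For readers following along, the offline journal steps on the Jewel
disaster-recovery page linked above look roughly like the sketch below, as
far as I recall them. The run wrapper and the APPLY variable are hypothetical
additions so that the script only prints the commands by default, and
backup.bin is a placeholder file name. Which steps are appropriate depends on
the damage, so treat this as an outline rather than a recipe.

```shell
#!/bin/sh
# Dry-run helper (hypothetical, not part of Ceph): prints each command
# unless APPLY=1 is set in the environment.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

offline_journal_repair() {
    # Everything here assumes all MDS daemons are stopped.
    run cephfs-journal-tool journal inspect                 # report journal integrity
    run cephfs-journal-tool journal export backup.bin       # keep a copy before changing anything
    run cephfs-journal-tool event recover_dentries summary  # salvage what the journal still holds
    run cephfs-journal-tool journal reset                   # discard the damaged journal
    run cephfs-table-tool all reset session                 # drop stale client sessions
}

offline_journal_repair
```

Only once the offline steps are done would you start the MDS again.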

* Can scan_extents/scan_inodes be run after MDS is running?
These are meant only for offline use.  You could in principle run
scan_extents while an MDS was running as long as you had no data
writes going on.  scan_inodes writes directly into the metadata pool
so is certainly not safe to run at the same time as an active MDS.
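
A minimal sketch of that offline scan, assuming the usual two-pass order in
which scan_extents completes before scan_inodes begins. The pool name
cephfs_data and the run/APPLY dry-run wrapper are hypothetical; substitute
your own data pool.

```shell
#!/bin/sh
# Dry-run helper (hypothetical, not part of Ceph): prints each command
# unless APPLY=1 is set in the environment.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

rebuild_metadata_from_data() {
    pool="$1"
    # Pass 1: scan the data objects in the pool.
    run cephfs-data-scan scan_extents "$pool"
    # Pass 2: writes into the metadata pool, so no MDS may be active.
    run cephfs-data-scan scan_inodes "$pool"
}

# "cephfs_data" is an assumed default pool name.
rebuild_metadata_from_data "${DATA_POOL:-cephfs_data}"
```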

* "online MDS scrub" is mentioned in docs. Is it scan_extents/scan_inodes or
some other command?
That refers to the "forward scrub" functionality inside the MDS,
that's invoked with "scrub_path" or "tag path" commands.
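
As a rough illustration, forward scrub is driven through the MDS admin
socket on the host where the daemon runs. The sketch below assumes an MDS
named "a" and the "recursive" option as I remember it from Jewel; the
run/APPLY dry-run wrapper is hypothetical, and it is worth checking
"ceph daemon mds.<id> help" for the exact arguments on your version.

```shell
#!/bin/sh
# Dry-run helper (hypothetical, not part of Ceph): prints each command
# unless APPLY=1 is set in the environment.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

forward_scrub() {
    mds_id="$1"
    path="$2"
    # Needs a running MDS; this is the online counterpart to the offline tools.
    run ceph daemon "mds.$mds_id" scrub_path "$path" recursive
}

# MDS id "a" is an assumed example; "/" scrubs from the filesystem root.
forward_scrub "${MDS_ID:-a}" /
```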

Now CephFS seems to be working (I have "mds0: Metadata damage detected" but
scan_extents is currently running), let's see what happens when I finish
scan_extents/scan_inodes.

Will these actions solve possible orphaned objects in pools? What else
should I look into?
A full offline scan_extents/scan_inodes run should re-link orphans
into a top-level lost+found directory (from which you can subsequently
delete them when your MDS is back online).

John

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
