On Tue, Oct 11, 2016 at 2:30 PM, Henrik Korkuc <lists@xxxxxxxxx> wrote:
> On 16-10-11 14:30, John Spray wrote:
>>
>> On Tue, Oct 11, 2016 at 12:00 PM, Henrik Korkuc <lists@xxxxxxxxx> wrote:
>>>
>>> Hey,
>>>
>>> After a bright idea to pause a 10.2.2 Ceph cluster for a minute to see
>>> if it would speed up backfill, I managed to corrupt my MDS journal
>>> (should this happen after a cluster pause/unpause, or is it some sort
>>> of bug?). I had "Overall journal integrity: DAMAGED", etc.
>>
>> Uh, pausing/unpausing your RADOS cluster should never do anything
>> apart from pausing IO. That's DEFINITELY a severe bug if it corrupted
>> objects!
>
> I am digging into the logs now; I'll try to collect what I can and
> create a bug report.

One more thought on this: if you seem to have encountered corruption,
then it is a good idea to do a deep scrub and see if that complains
about anything.

John

>>>
>>> I was following http://docs.ceph.com/docs/jewel/cephfs/disaster-recovery/
>>> and have some questions/feedback:
>>
>> Caveat: this is a difficult area to document, because the repair tools
>> interfere with internal on-disk structures. If I can use a bad
>> metaphor: it's like being in an auto garage and asking for
>> documentation about the tools -- the manual for the wrench doesn't
>> tell you anything about how to fix the car engine. Similarly, it's
>> hard to write useful documentation about the repair tools without
>> also writing a detailed manual for how all the CephFS internals work.
>>
> Some notes/links would still be useful for newcomers. It's like someone
> standing at the side of the road with a broken car and a wrench. I
> could try fixing it with what I had, or just nuke it and get myself a
> new car :) (the data was kind of expendable there)
>
>>> * It would be great to have some info on when 'snap' or 'inode'
>>> should be reset
>>
>> You would reset these tables if you knew that for some reason they no
>> longer matched the reality elsewhere in the metadata.
>>
>>> * It is not clear when an MDS start should be attempted
>>
>> You would start the MDS when you believed that you had done all you
>> could with offline repair. Everything on the "disaster recovery" page
>> is about offline tools.
>>
>>> * Can scan_extents/scan_inodes be run after the MDS is running?
>>
>> These are meant only for offline use. You could in principle run
>> scan_extents while an MDS was running, as long as you had no data
>> writes going on. scan_inodes writes directly into the metadata pool,
>> so it is certainly not safe to run at the same time as an active MDS.
>>
>>> * "online MDS scrub" is mentioned in the docs. Is it
>>> scan_extents/scan_inodes or some other command?
>>
>> That refers to the "forward scrub" functionality inside the MDS,
>> which is invoked with the "scrub_path" or "tag path" commands.
>>
>>> Now CephFS seems to be working (I have "mds0: Metadata damage
>>> detected" but scan_extents is currently running); let's see what
>>> happens when I finish scan_extents/scan_inodes.
>>>
>>> Will these actions solve possible orphaned objects in pools? What
>>> else should I look into?
>>
>> A full offline scan_extents/scan_inodes run should re-link orphans
>> into a top-level lost+found directory (from which you can
>> subsequently delete them when your MDS is back online).
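
For the archives, here are the concrete commands behind the steps
discussed above. This is all jewel-era syntax, written partly from
memory, so treat it as a sketch and check the tools' built-in help
before running anything against a cluster you care about.

The deep scrub I suggested works per-PG or per-OSD; the PG and OSD IDs
below are placeholders:

    ceph pg deep-scrub <pgid>       # deep-scrub one placement group
    ceph osd deep-scrub <osd-id>    # or everything on one OSD

Any inconsistencies it finds will show up in "ceph health detail".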
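
The journal integrity report Henrik quoted comes from the offline
journal tool, and it is worth exporting a backup of the journal before
attempting any repair:

    cephfs-journal-tool journal inspect
    cephfs-journal-tool journal export backup.bin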
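
The 'snap'/'inode' resets are done with cephfs-table-tool, against a
single MDS rank or all of them:

    cephfs-table-tool all reset snap
    cephfs-table-tool all reset inode

Again: only do this if you have reason to believe the tables no longer
match the rest of the metadata.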
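
The offline scan is two passes over the data pool (the pool name is a
placeholder), and scan_extents must run to completion before
scan_inodes is started:

    cephfs-data-scan scan_extents <data pool>
    cephfs-data-scan scan_inodes <data pool>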
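
The online forward scrub is driven through the MDS admin socket; the
daemon name and tag below are placeholders:

    ceph daemon mds.<id> scrub_path / recursive
    ceph daemon mds.<id> tag path / <tag>

scrub_path walks the metadata downward from the given path, while "tag
path" tags the data-pool objects belonging to files under that path.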
>>
>> John