On Tue, Oct 11, 2016 at 2:30 PM, Henrik Korkuc <lists@xxxxxxxxx> wrote:
> On 16-10-11 14:30, John Spray wrote:
>>
>> On Tue, Oct 11, 2016 at 12:00 PM, Henrik Korkuc <lists@xxxxxxxxx> wrote:
>>>
>>> Hey,
>>>
>>> After a bright idea to pause a 10.2.2 Ceph cluster for a minute to see
>>> if it would speed up backfill, I managed to corrupt my MDS journal
>>> (should this happen after a cluster pause/unpause, or is it some sort
>>> of bug?). I had "Overall journal integrity: DAMAGED", etc.
>>
>> Uh, pausing/unpausing your RADOS cluster should never do anything
>> apart from pausing IO. That's DEFINITELY a severe bug if it corrupted
>> objects!
>
> I am digging into the logs now; I'll try to collect what I can and
> create a bug report.

One more thought on this: if you seem to have encountered corruption,
then it is a good idea to do a deep scrub and see if that complains
about anything.

John

>>>
>>> I was following http://docs.ceph.com/docs/jewel/cephfs/disaster-recovery/
>>> and have some questions/feedback:
>>
>> Caveat: this is a difficult area to document, because the repair tools
>> interfere with internal on-disk structures. If I can use a bad
>> metaphor: it's like being in an auto garage and asking for
>> documentation about the tools -- the manual for the wrench doesn't
>> tell you anything about how to fix the car engine. Similarly, it's
>> hard to write useful documentation about the repair tools without
>> also writing a detailed manual for how all the CephFS internals work.
>>
> Some notes/links would still be useful for newcomers. It's like someone
> standing at the side of the road with a broken car and a wrench. I
> could try fixing it with what I had, or just nuke it and get myself a
> new car :) (the data was kind of expendable there)
>
>>> * It would be great to have some info on when 'snap' or 'inode'
>>> should be reset
>>
>> You would reset these tables if you knew that for some reason they no
>> longer matched the reality elsewhere in the metadata.
>>
>>> * It is not clear when an MDS start should be attempted
>>
>> You would start the MDS when you believed that you had done all you
>> could with offline repair. Everything on the "disaster recovery" page
>> is about offline tools.
>>
>>> * Can scan_extents/scan_inodes be run after the MDS is running?
>>
>> These are meant only for offline use. You could in principle run
>> scan_extents while an MDS was running, as long as you had no data
>> writes going on. scan_inodes writes directly into the metadata pool,
>> so it is certainly not safe to run at the same time as an active MDS.
>>
>>> * "online MDS scrub" is mentioned in the docs. Is it
>>> scan_extents/scan_inodes or some other command?
>>
>> That refers to the "forward scrub" functionality inside the MDS,
>> which is invoked with the "scrub_path" or "tag path" commands.
>>
>>> Now CephFS seems to be working (I have "mds0: Metadata damage
>>> detected" but scan_extents is currently running); let's see what
>>> happens when I finish scan_extents/scan_inodes.
>>>
>>> Will these actions solve possible orphaned objects in pools? What
>>> else should I look into?
>>
>> A full offline scan_extents/scan_inodes run should re-link orphans
>> into a top-level lost+found directory (from which you can
>> subsequently delete them when your MDS is back online).
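
For the archives, here are the concrete commands behind the steps
discussed above. This is all jewel-era syntax, written partly from
memory, so treat it as a sketch and check the tools' built-in help
before running anything against a cluster you care about.

The deep scrub I suggested works per-PG or per-OSD; the PG and OSD IDs
below are placeholders:

    ceph pg deep-scrub <pgid>       # deep-scrub one placement group
    ceph osd deep-scrub <osd-id>    # or everything on one OSD

Any inconsistencies it finds will show up in "ceph health detail".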
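
The journal integrity report Henrik quoted comes from the offline
journal tool, and it is worth exporting a backup of the journal before
attempting any repair:

    cephfs-journal-tool journal inspect
    cephfs-journal-tool journal export backup.bin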
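
The 'snap'/'inode' resets are done with cephfs-table-tool, against a
single MDS rank or all of them:

    cephfs-table-tool all reset snap
    cephfs-table-tool all reset inode

Again: only do this if you have reason to believe the tables no longer
match the rest of the metadata.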
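
The offline scan is two passes over the data pool (the pool name is a
placeholder), and scan_extents must run to completion before
scan_inodes is started:

    cephfs-data-scan scan_extents <data pool>
    cephfs-data-scan scan_inodes <data pool>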
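
The online forward scrub is driven through the MDS admin socket; the
daemon name and tag below are placeholders:

    ceph daemon mds.<id> scrub_path / recursive
    ceph daemon mds.<id> tag path / <tag>

scrub_path walks the metadata downward from the given path, while "tag
path" tags the data-pool objects belonging to files under that path.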
>>
>> John