Re: Rebuilding/recreating CephFS journal?

Gregory Farnum <gfarnum@xxxxxxxxxx> · Fri, 27 May 2016 14:01:23 -0700

On Fri, May 27, 2016 at 1:54 PM, Stillwell, Bryan J
<Bryan.Stillwell@xxxxxxxxxxx> wrote:
> On 5/27/16, 11:27 AM, "Gregory Farnum" <gfarnum@xxxxxxxxxx> wrote:
>
>>On Fri, May 27, 2016 at 9:44 AM, Stillwell, Bryan J
>><Bryan.Stillwell@xxxxxxxxxxx> wrote:
>>> I have a Ceph cluster at home that I've been running CephFS on for the
>>> last few years.  Recently my MDS server became damaged, and while
>>> attempting to fix it I believe I've destroyed my CephFS journal, based
>>> on this:
>>>
>>> 2016-05-25 16:48:23.882095 7f8d2fac2700 -1 log_channel(cluster) log [ERR]
>>> : Error recovering journal 200: (2) No such file or directory
>>>
>>> As far as I can tell the data and metadata are still intact, so I'm
>>> wondering if there's a way to rebuild the CephFS journal or, if that's
>>> not possible, a way to start extracting the data?
>>
>>Check out http://docs.ceph.com/docs/master/cephfs/disaster-recovery/
>>
>>You'll want to make sure you've actually lost the whole journal (how
>>did you manage that?!?!), reset it, and quite possibly run the data
>>scan tools. Be careful!
>
> So I actually got into this mess by following that page and not being as
> careful as I should have been.
>
> I started off by trying to back up the journal, but it failed for this
> reason:
>
> # cephfs-journal-tool journal export backup.bin
> 2016-05-25 15:25:26.541767 7f2932ee5bc0 -1 Missing object 200.00000197
> 2016-05-25 15:25:26.543896 7f2932ee5bc0 -1 journal_export: Journal not
> readable, attempt object-by-object dump with `rados`
> Error ((5) Input/output error)
>
>
>
> I took a look at http://tracker.ceph.com/issues/9902, but scanning that
> page I didn't see a way to do an object-by-object dump.
>
> Now if I attempt to export the journal I get:
>
> # cephfs-journal-tool journal export backup.bin
> Error ((5) Input/output error)
> 2016-05-27 14:19:49.807482 7f06fa378bc0 -1 Header 200.00000000 is unreadable
>
> 2016-05-27 14:19:49.807491 7f06fa378bc0 -1 journal_export: Journal not
> readable, attempt object-by-object dump with `rados`
>
>
>
>
> I believe the 'Missing object 200.00000197' error had something to do with
> this problem that I was trying to deal with:
>
> http://comments.gmane.org/gmane.comp.file-systems.ceph.user/29844
>

Okay, so you lost an object.
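
For reference, the object-by-object dump that the error message suggests
could look roughly like the sketch below. It is only an illustration: the
function name is made up, the metadata pool name is whatever your cluster
uses ("metadata" in the usage line is a placeholder), and it assumes the
rank 0 journal objects are the ones named 200.<8 hex digits>, as in your
log output. It only reads from the pool.

```shell
# Sketch: copy every rank 0 journal object out of the metadata pool.
# Assumption: journal objects are named 200.<8 hex digits>.
# dump_journal_objects is a hypothetical helper, not a Ceph tool.
dump_journal_objects() {
    pool=$1      # metadata pool name (cluster-specific)
    outdir=$2    # local directory for the copies
    mkdir -p "$outdir" || return 1
    # List all objects, keep only the journal ones, fetch them one by one.
    rados -p "$pool" ls | grep '^200\.[0-9a-f]\{8\}$' | sort |
    while read -r obj; do
        if rados -p "$pool" get "$obj" "$outdir/$obj"; then
            echo "saved $obj"
        else
            # A missing or unreadable object is reported, not fatal.
            echo "FAILED $obj" >&2
        fi
    done
}

# Usage (pool name is a placeholder):
#   dump_journal_objects metadata ./journal-backup
```

Objects that fail to copy are just reported, so you still end up with
whatever part of the journal is readable.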

>
>
> The missing object was probably caused by being a little too aggressive
> with running mark_unfound_lost.
>
>
> Anyways, I continued on with the disaster recovery steps without making a
> backup first.  The next step identified the missing object again:
>
> # cephfs-journal-tool event recover_dentries summary
> 2016-05-25 15:36:35.455989 7fa37b8b1bc0 -1 Missing object 200.00000197
> Events by type:
>   OPEN: 12548
>   SESSION: 24
>   SUBTREEMAP: 29
>   UPDATE: 12254
> Errors: 0

Then you recovered the dentries out of the journal, so everything in
the backing RADOS objects should be up to date (unless you lost
something in that missing object, which can't be helped now).

>
>
>
> I then tried truncating the journal:
>
> # cephfs-journal-tool journal reset
> old journal was 1666720764~48749572
> new journal start will be 1719664640 (4194304 bytes past old end)
> writing journal head
> writing EResetJournal entry
> done

And then you reset the journal, fine.
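
As a sanity check on those numbers, here is a small sketch of the
arithmetic. It assumes the journal uses 4194304-byte (4 MiB) objects,
which is what the "4194304 bytes past old end" line suggests, and that
journal objects are named 200.<object index as 8 hex digits>. The
variable and function names are mine, for illustration only.

```shell
# Sketch: map journal byte offsets to journal object names.
# Assumptions: 4 MiB objects, rank 0 journal object prefix "200".
OBJ_SIZE=4194304
JOURNAL_INO_HEX=200

journal_object_name() {
    # $1 is a byte offset into the journal
    printf '%s.%08x\n' "$JOURNAL_INO_HEX" $(( $1 / OBJ_SIZE ))
}

# "old journal was 1666720764~48749572" is start~length
old_start=1666720764
old_len=48749572
old_end=$(( old_start + old_len ))

echo "old journal end:  $old_end"
echo "first object:     $(journal_object_name "$old_start")"
echo "last object:      $(journal_object_name $(( old_end - 1 )))"
echo "new start object: $(journal_object_name 1719664640)"
```

Under those assumptions the old journal spans objects 200.0000018d
through 200.00000198, so the missing 200.00000197 sat inside the live
part of the journal, and the new start (200.0000019a) is one whole
object past the old end.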

>
>
>
> Reset the session map:
>
> # cephfs-table-tool all reset session
> {
>     "0": {
>         "data": {},
>         "result": 0
>     }
> }

And wiped out the client connections, so if you have any clients
(still) left running you'll need to restart them.

>
>
>
> And then because I was still having problems starting the MDS I ran:
>
> # ceph fs reset cephfs --yes-i-really-mean-it

...and then you reset a bunch of the MDS' metadata; I don't actually
remember what all this does.

>
>
> That's when I believe Header 200.00000000 went missing (I could be wrong,
> I don't have good notes around this part).

Yeah, that's probably part of what the reset command above did.

>
> So would the next steps be to run the following commands?:
>
> cephfs-table-tool 0 reset session
> cephfs-table-tool 0 reset snap
> cephfs-table-tool 0 reset inode
> cephfs-journal-tool --rank=0 journal reset
> cephfs-data-scan init
>
> cephfs-data-scan scan_extents data
> cephfs-data-scan scan_inodes data

No, definitely not. I think you just need to reset the journal again,
since you wiped out a bunch of its data with that fs reset command.
Since your backing data should already be consistent you don't need to
do any data scans. Your snap and inode tables might be corrupt,
but...hopefully not. If they are busted...actually, I don't remember;
maybe you will need to run the data scan tooling to repair those. I'd
try to avoid it if possible just because of the time involved. (It'll
become obvious pretty quickly if the inode tables are no good.)
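
Roughly, as a sketch rather than something I've run against your
cluster: reset the journal again, look at it, then start the MDS and
watch its state. The run and reset_journal_again helpers are made up for
illustration, and the script only prints the commands unless DRY_RUN=0
is set. How you start the MDS daemon depends on your init system, so
that step is left out.

```shell
# Sketch: second journal reset, dry-run by default.
# run and reset_journal_again are hypothetical helpers.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

reset_journal_again() {
    # Recreate the journal that the fs reset damaged.
    run cephfs-journal-tool journal reset || return 1
    # Check that the journal is readable afterwards.
    run cephfs-journal-tool journal inspect || return 1
    # After starting the MDS by hand, check whether it comes up.
    run ceph mds stat
}

# Usage:
#   reset_journal_again              # print the commands only
#   DRY_RUN=0 reset_journal_again    # actually run them
```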
-Greg

>
>
>
> Thanks,
> Bryan
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com