On Fri, May 27, 2016 at 2:22 PM, Stillwell, Bryan J
<Bryan.Stillwell@xxxxxxxxxxx> wrote:
> Here's the full 'ceph -s' output:
>
> # ceph -s
>     cluster c7ba6111-e0d6-40e8-b0af-8428e8702df9
>      health HEALTH_ERR
>             mds rank 0 is damaged
>             mds cluster is degraded
>      monmap e5: 3 mons at
> {b3=172.24.88.53:6789/0,b4=172.24.88.54:6789/0,lira=172.24.88.20:6789/0}
>             election epoch 320, quorum 0,1,2 lira,b3,b4
>       fsmap e287: 0/1/1 up, 1 up:standby, 1 damaged
>      osdmap e35262: 21 osds: 21 up, 21 in
>             flags sortbitwise
>       pgmap v10096597: 480 pgs, 4 pools, 23718 GB data, 5951 kobjects
>             35758 GB used, 11358 GB / 47116 GB avail
>                  479 active+clean
>                    1 active+clean+scrubbing+deep

Yeah, you should just need to mark mds 0 as repaired at this point.

>
> On 5/27/16, 3:17 PM, "Gregory Farnum" <gfarnum@xxxxxxxxxx> wrote:
>
>>What's the current full output of "ceph -s"?
>>
>>If you already had your MDS in damaged state, you might just need to
>>mark it as repaired. That's a monitor command.
>>
>>On Fri, May 27, 2016 at 2:09 PM, Stillwell, Bryan J
>><Bryan.Stillwell@xxxxxxxxxxx> wrote:
>>> On 5/27/16, 3:01 PM, "Gregory Farnum" <gfarnum@xxxxxxxxxx> wrote:
>>>
>>>>>
>>>>> So would the next steps be to run the following commands?:
>>>>>
>>>>> cephfs-table-tool 0 reset session
>>>>> cephfs-table-tool 0 reset snap
>>>>> cephfs-table-tool 0 reset inode
>>>>> cephfs-journal-tool --rank=0 journal reset
>>>>> cephfs-data-scan init
>>>>>
>>>>> cephfs-data-scan scan_extents data
>>>>> cephfs-data-scan scan_inodes data
>>>>
>>>>No, definitely not. I think you just need to reset the journal again,
>>>>since you wiped out a bunch of its data with that fs reset command.
>>>>Since your backing data should already be consistent you don't need to
>>>>do any data scans. Your snap and inode tables might be corrupt,
>>>>but...hopefully not. If they are busted...actually, I don't remember;
>>>>maybe you will need to run the data scan tooling to repair those. I'd
>>>>try to avoid it if possible just because of the time involved. (It'll
>>>>become obvious pretty quickly if the inode tables are no good.)
>>>
>>> So when I attempt to reset the journal again I get this:
>>>
>>> # cephfs-journal-tool journal reset
>>> journal does not exist on-disk. Did you set a bad rank?2016-05-27
>>> 15:03:30.016326 7f63f987e700  0 client.20626476.journaler(ro) error
>>> getting journal off disk
>>>
>>> Error loading journal: (2) No such file or directory, pass --force to
>>> forcibly reset this journal
>>> Error ((2) No such file or directory)
>>>
>>>
>>> And then I tried to force it which seemed to succeed:
>>>
>>> # cephfs-journal-tool journal reset --force
>>> writing EResetJournal entry
>>>
>>>
>>> However, when I restart the mds it gets stuck in standby mode:
>>>
>>> 2016-05-27 15:05:57.080672 7fe0cccd8700 -1 mds.b4 *** got signal
>>> Terminated ***
>>> 2016-05-27 15:05:57.080703 7fe0cccd8700  1 mds.b4 suicide. wanted state
>>> up:standby
>>> 2016-05-27 15:06:04.527203 7f500f28a180  0 set uid:gid to 64045:64045
>>> (ceph:ceph)
>>> 2016-05-27 15:06:04.527259 7f500f28a180  0 ceph version 10.2.0
>>> (3a9fba20ec743699b69bd0181dd6c54dc01c64b9), process ceph-mds, pid 19163
>>> 2016-05-27 15:06:04.527569 7f500f28a180  0 pidfile_write: ignore empty
>>> --pid-file
>>> 2016-05-27 15:06:04.637842 7f5008a04700  1 mds.b4 handle_mds_map standby
>>>
>>>
>>> The relevant output from 'ceph -s' looks like this:
>>>
>>>     fsmap e287: 0/1/1 up, 1 up:standby, 1 damaged
>>>
>>>
>>> What am I missing?
>>>
>>> Thanks, Bryan
>>>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
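
For reference, the "mark it as repaired" monitor command Greg points to would look roughly like this on Jewel (a sketch assuming the damaged rank is 0 on the cluster's only filesystem; check the CephFS disaster-recovery documentation for your release before running it):

# ceph mds repaired 0    (assumes the damaged rank is 0)

Once the rank is no longer listed as damaged in the fsmap, restarting the standby ceph-mds daemon should let it claim rank 0 instead of staying in up:standby.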