Here's the full 'ceph -s' output:

# ceph -s
    cluster c7ba6111-e0d6-40e8-b0af-8428e8702df9
     health HEALTH_ERR
            mds rank 0 is damaged
            mds cluster is degraded
     monmap e5: 3 mons at {b3=172.24.88.53:6789/0,b4=172.24.88.54:6789/0,lira=172.24.88.20:6789/0}
            election epoch 320, quorum 0,1,2 lira,b3,b4
      fsmap e287: 0/1/1 up, 1 up:standby, 1 damaged
     osdmap e35262: 21 osds: 21 up, 21 in
            flags sortbitwise
      pgmap v10096597: 480 pgs, 4 pools, 23718 GB data, 5951 kobjects
            35758 GB used, 11358 GB / 47116 GB avail
                 479 active+clean
                   1 active+clean+scrubbing+deep

On 5/27/16, 3:17 PM, "Gregory Farnum" <gfarnum@xxxxxxxxxx> wrote:

>What's the current full output of "ceph -s"?
>
>If you already had your MDS in damaged state, you might just need to
>mark it as repaired. That's a monitor command.
>
>On Fri, May 27, 2016 at 2:09 PM, Stillwell, Bryan J
><Bryan.Stillwell@xxxxxxxxxxx> wrote:
>> On 5/27/16, 3:01 PM, "Gregory Farnum" <gfarnum@xxxxxxxxxx> wrote:
>>
>>>>
>>>> So would the next steps be to run the following commands?:
>>>>
>>>> cephfs-table-tool 0 reset session
>>>> cephfs-table-tool 0 reset snap
>>>> cephfs-table-tool 0 reset inode
>>>> cephfs-journal-tool --rank=0 journal reset
>>>> cephfs-data-scan init
>>>>
>>>> cephfs-data-scan scan_extents data
>>>> cephfs-data-scan scan_inodes data
>>>
>>>No, definitely not. I think you just need to reset the journal again,
>>>since you wiped out a bunch of its data with that "fs reset" command.
>>>Since your backing data should already be consistent, you don't need to
>>>do any data scans. Your snap and inode tables might be corrupt,
>>>but...hopefully not. If they are busted...actually, I don't remember;
>>>maybe you will need to run the data scan tooling to repair those. I'd
>>>try to avoid it if possible just because of the time involved. (It'll
>>>become obvious pretty quickly if the inode tables are no good.)
>>
>> So when I attempt to reset the journal again I get this:
>>
>> # cephfs-journal-tool journal reset
>> journal does not exist on-disk. Did you set a bad rank?
>> 2016-05-27 15:03:30.016326 7f63f987e700  0 client.20626476.journaler(ro) error
>> getting journal off disk
>>
>> Error loading journal: (2) No such file or directory, pass --force to
>> forcibly reset this journal
>> Error ((2) No such file or directory)
>>
>>
>> And then I tried to force it, which seemed to succeed:
>>
>> # cephfs-journal-tool journal reset --force
>> writing EResetJournal entry
>>
>>
>> However, when I restart the mds it gets stuck in standby mode:
>>
>> 2016-05-27 15:05:57.080672 7fe0cccd8700 -1 mds.b4 *** got signal Terminated ***
>> 2016-05-27 15:05:57.080703 7fe0cccd8700  1 mds.b4 suicide. wanted state up:standby
>> 2016-05-27 15:06:04.527203 7f500f28a180  0 set uid:gid to 64045:64045 (ceph:ceph)
>> 2016-05-27 15:06:04.527259 7f500f28a180  0 ceph version 10.2.0 (3a9fba20ec743699b69bd0181dd6c54dc01c64b9), process ceph-mds, pid 19163
>> 2016-05-27 15:06:04.527569 7f500f28a180  0 pidfile_write: ignore empty --pid-file
>> 2016-05-27 15:06:04.637842 7f5008a04700  1 mds.b4 handle_mds_map standby
>>
>>
>> The relevant output from 'ceph -s' looks like this:
>>
>>       fsmap e287: 0/1/1 up, 1 up:standby, 1 damaged
>>
>>
>> What am I missing?
>>
>> Thanks,
>> Bryan

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
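For readers following along: the standby MDS above cannot claim rank 0 because the fsmap still marks that rank as damaged. The monitor command Greg refers to is `ceph mds repaired`, which clears that flag. A minimal sketch of the sequence, assuming rank 0 and that the journal has already been reset as shown earlier in the thread (run against your own cluster, not a generic test environment):

```shell
# Tell the monitors that rank 0 has been repaired; this clears the
# "damaged" marker so the standby MDS (mds.b4 here) can take the rank.
ceph mds repaired 0

# Watch the MDS come up: the fsmap should move from
# "1 up:standby, 1 damaged" through up:replay/up:rejoin to up:active.
ceph -s
ceph mds stat
```

If the MDS takes the rank and then crashes or goes damaged again, that is the point at which corrupt inode/session tables would become obvious and the `cephfs-table-tool ... reset` and `cephfs-data-scan` tooling quoted above would come into play.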