On Fri, May 27, 2016 at 2:22 PM, Stillwell, Bryan J
<Bryan.Stillwell@xxxxxxxxxxx> wrote:
> Here's the full 'ceph -s' output:
>
> # ceph -s
>     cluster c7ba6111-e0d6-40e8-b0af-8428e8702df9
>      health HEALTH_ERR
>             mds rank 0 is damaged
>             mds cluster is degraded
>      monmap e5: 3 mons at
> {b3=172.24.88.53:6789/0,b4=172.24.88.54:6789/0,lira=172.24.88.20:6789/0}
>             election epoch 320, quorum 0,1,2 lira,b3,b4
>       fsmap e287: 0/1/1 up, 1 up:standby, 1 damaged
>      osdmap e35262: 21 osds: 21 up, 21 in
>             flags sortbitwise
>       pgmap v10096597: 480 pgs, 4 pools, 23718 GB data, 5951 kobjects
>             35758 GB used, 11358 GB / 47116 GB avail
>                  479 active+clean
>                    1 active+clean+scrubbing+deep

Yeah, you should just need to mark mds 0 as repaired at this point.

>
> On 5/27/16, 3:17 PM, "Gregory Farnum" <gfarnum@xxxxxxxxxx> wrote:
>
>>What's the current full output of "ceph -s"?
>>
>>If you already had your MDS in damaged state, you might just need to
>>mark it as repaired. That's a monitor command.
>>
>>On Fri, May 27, 2016 at 2:09 PM, Stillwell, Bryan J
>><Bryan.Stillwell@xxxxxxxxxxx> wrote:
>>> On 5/27/16, 3:01 PM, "Gregory Farnum" <gfarnum@xxxxxxxxxx> wrote:
>>>
>>>>>
>>>>> So would the next steps be to run the following commands?:
>>>>>
>>>>> cephfs-table-tool 0 reset session
>>>>> cephfs-table-tool 0 reset snap
>>>>> cephfs-table-tool 0 reset inode
>>>>> cephfs-journal-tool --rank=0 journal reset
>>>>> cephfs-data-scan init
>>>>>
>>>>> cephfs-data-scan scan_extents data
>>>>> cephfs-data-scan scan_inodes data
>>>>
>>>>No, definitely not. I think you just need to reset the journal again,
>>>>since you wiped out a bunch of its data with that fs reset command.
>>>>Since your backing data should already be consistent you don't need to
>>>>do any data scans. Your snap and inode tables might be corrupt,
>>>>but...hopefully not. If they are busted...actually, I don't remember;
>>>>maybe you will need to run the data scan tooling to repair those. I'd
>>>>try to avoid it if possible just because of the time involved. (It'll
>>>>become obvious pretty quickly if the inode tables are no good.)
>>>
>>> So when I attempt to reset the journal again I get this:
>>>
>>> # cephfs-journal-tool journal reset
>>> journal does not exist on-disk. Did you set a bad rank?2016-05-27
>>> 15:03:30.016326 7f63f987e700  0 client.20626476.journaler(ro) error
>>> getting journal off disk
>>>
>>> Error loading journal: (2) No such file or directory, pass --force to
>>> forcibly reset this journal
>>> Error ((2) No such file or directory)
>>>
>>>
>>> And then I tried to force it which seemed to succeed:
>>>
>>> # cephfs-journal-tool journal reset --force
>>> writing EResetJournal entry
>>>
>>>
>>> However, when I restart the mds it gets stuck in standby mode:
>>>
>>> 2016-05-27 15:05:57.080672 7fe0cccd8700 -1 mds.b4 *** got signal
>>> Terminated ***
>>> 2016-05-27 15:05:57.080703 7fe0cccd8700  1 mds.b4 suicide. wanted state
>>> up:standby
>>> 2016-05-27 15:06:04.527203 7f500f28a180  0 set uid:gid to 64045:64045
>>> (ceph:ceph)
>>> 2016-05-27 15:06:04.527259 7f500f28a180  0 ceph version 10.2.0
>>> (3a9fba20ec743699b69bd0181dd6c54dc01c64b9), process ceph-mds, pid 19163
>>> 2016-05-27 15:06:04.527569 7f500f28a180  0 pidfile_write: ignore empty
>>> --pid-file
>>> 2016-05-27 15:06:04.637842 7f5008a04700  1 mds.b4 handle_mds_map standby
>>>
>>>
>>> The relevant output from 'ceph -s' looks like this:
>>>
>>>     fsmap e287: 0/1/1 up, 1 up:standby, 1 damaged
>>>
>>>
>>> What am I missing?
>>>
>>> Thanks, Bryan
>>>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
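
For reference, the "mark it as repaired" monitor command Greg points to would look roughly like this on Jewel (a sketch assuming the damaged rank is 0 on the cluster's only filesystem; check the CephFS disaster-recovery documentation for your release before running it):

# ceph mds repaired 0    (assumes the damaged rank is 0)

Once the rank is no longer listed as damaged in the fsmap, restarting the standby ceph-mds daemon should let it claim rank 0 instead of staying in up:standby.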