Here's the full 'ceph -s' output:

# ceph -s
    cluster c7ba6111-e0d6-40e8-b0af-8428e8702df9
     health HEALTH_ERR
            mds rank 0 is damaged
            mds cluster is degraded
     monmap e5: 3 mons at {b3=172.24.88.53:6789/0,b4=172.24.88.54:6789/0,lira=172.24.88.20:6789/0}
            election epoch 320, quorum 0,1,2 lira,b3,b4
      fsmap e287: 0/1/1 up, 1 up:standby, 1 damaged
     osdmap e35262: 21 osds: 21 up, 21 in
            flags sortbitwise
      pgmap v10096597: 480 pgs, 4 pools, 23718 GB data, 5951 kobjects
            35758 GB used, 11358 GB / 47116 GB avail
                 479 active+clean
                   1 active+clean+scrubbing+deep

On 5/27/16, 3:17 PM, "Gregory Farnum" <gfarnum@xxxxxxxxxx> wrote:

>What's the current full output of "ceph -s"?
>
>If you already had your MDS in damaged state, you might just need to
>mark it as repaired. That's a monitor command.
>
>On Fri, May 27, 2016 at 2:09 PM, Stillwell, Bryan J
><Bryan.Stillwell@xxxxxxxxxxx> wrote:
>> On 5/27/16, 3:01 PM, "Gregory Farnum" <gfarnum@xxxxxxxxxx> wrote:
>>
>>>>
>>>> So would the next steps be to run the following commands?:
>>>>
>>>> cephfs-table-tool 0 reset session
>>>> cephfs-table-tool 0 reset snap
>>>> cephfs-table-tool 0 reset inode
>>>> cephfs-journal-tool --rank=0 journal reset
>>>> cephfs-data-scan init
>>>>
>>>> cephfs-data-scan scan_extents data
>>>> cephfs-data-scan scan_inodes data
>>>
>>>No, definitely not. I think you just need to reset the journal again,
>>>since you wiped out a bunch of its data with that "fs reset" command.
>>>Since your backing data should already be consistent, you don't need to
>>>do any data scans. Your snap and inode tables might be corrupt,
>>>but...hopefully not. If they are busted...actually, I don't remember;
>>>maybe you will need to run the data scan tooling to repair those. I'd
>>>try to avoid it if possible just because of the time involved. (It'll
>>>become obvious pretty quickly if the inode tables are no good.)
>>
>> So when I attempt to reset the journal again I get this:
>>
>> # cephfs-journal-tool journal reset
>> journal does not exist on-disk. Did you set a bad rank?
>> 2016-05-27 15:03:30.016326 7f63f987e700  0 client.20626476.journaler(ro) error
>> getting journal off disk
>>
>> Error loading journal: (2) No such file or directory, pass --force to
>> forcibly reset this journal
>> Error ((2) No such file or directory)
>>
>>
>> And then I tried to force it, which seemed to succeed:
>>
>> # cephfs-journal-tool journal reset --force
>> writing EResetJournal entry
>>
>>
>> However, when I restart the mds it gets stuck in standby mode:
>>
>> 2016-05-27 15:05:57.080672 7fe0cccd8700 -1 mds.b4 *** got signal Terminated ***
>> 2016-05-27 15:05:57.080703 7fe0cccd8700  1 mds.b4 suicide. wanted state up:standby
>> 2016-05-27 15:06:04.527203 7f500f28a180  0 set uid:gid to 64045:64045 (ceph:ceph)
>> 2016-05-27 15:06:04.527259 7f500f28a180  0 ceph version 10.2.0 (3a9fba20ec743699b69bd0181dd6c54dc01c64b9), process ceph-mds, pid 19163
>> 2016-05-27 15:06:04.527569 7f500f28a180  0 pidfile_write: ignore empty --pid-file
>> 2016-05-27 15:06:04.637842 7f5008a04700  1 mds.b4 handle_mds_map standby
>>
>>
>> The relevant output from 'ceph -s' looks like this:
>>
>>       fsmap e287: 0/1/1 up, 1 up:standby, 1 damaged
>>
>>
>> What am I missing?
>>
>> Thanks,
>> Bryan

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
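For readers following along: the standby MDS above cannot claim rank 0 because the fsmap still marks that rank as damaged. The monitor command Greg refers to is `ceph mds repaired`, which clears that flag. A minimal sketch of the sequence, assuming rank 0 and that the journal has already been reset as shown earlier in the thread (run against your own cluster, not a generic test environment):

```shell
# Tell the monitors that rank 0 has been repaired; this clears the
# "damaged" marker so the standby MDS (mds.b4 here) can take the rank.
ceph mds repaired 0

# Watch the MDS come up: the fsmap should move from
# "1 up:standby, 1 damaged" through up:replay/up:rejoin to up:active.
ceph -s
ceph mds stat
```

If the MDS takes the rank and then crashes or goes damaged again, that is the point at which corrupt inode/session tables would become obvious and the `cephfs-table-tool ... reset` and `cephfs-data-scan` tooling quoted above would come into play.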