I can't really help with MDS. Hopefully somebody else will chime in here. On Tue, Aug 12, 2014 at 12:44 PM, hjcho616 <hjcho616 at yahoo.com> wrote: > Craig, > > Thanks. It turns out one of my memory stick went bad after that power > outage. While trying to fix the OSDs I ran in to many kernel crashes. > After removing that bad memory, I was able to fix them. I did remove all > OSD on that machine and rebuilt it as I didn't trust that data anymore. =P > > I was hoping MDS would come up after that. But it didn't. It shows this > and kills itself. Is this related to 0.82 MDS issue? > 2014-08-12 14:35:11.250634 7ff794bd57c0 0 ceph version 0.80.5 > (38b73c67d375a2552d8ed67843c8a65c2c0feba6), process ceph-mds, pid 10244 > 2014-08-12 14:35:11.251092 7ff794bd57c0 1 -- 192.168.1.20:0/0 learned my > addr 192.168.1.20:0/0 > 2014-08-12 14:35:11.251118 7ff794bd57c0 1 accepter.accepter.bind > my_inst.addr is 192.168.1.20:6800/10244 need_addr=0 > 2014-08-12 14:35:11.259207 7ff794bd57c0 1 -- 192.168.1.20:6800/10244 > messenger.start > 2014-08-12 14:35:11.259576 7ff794bd57c0 10 mds.-1.0 168 MDSCacheObject > 2014-08-12 14:35:11.259625 7ff794bd57c0 10 mds.-1.0 2304 CInode > 2014-08-12 14:35:11.259630 7ff794bd57c0 10 mds.-1.0 16 elist<>::item > *7=112 > 2014-08-12 14:35:11.259635 7ff794bd57c0 10 mds.-1.0 480 inode_t > 2014-08-12 14:35:11.259639 7ff794bd57c0 10 mds.-1.0 56 nest_info_t > 2014-08-12 14:35:11.259644 7ff794bd57c0 10 mds.-1.0 32 frag_info_t > 2014-08-12 14:35:11.259648 7ff794bd57c0 10 mds.-1.0 40 SimpleLock > *5=200 > 2014-08-12 14:35:11.259652 7ff794bd57c0 10 mds.-1.0 48 ScatterLock > *3=144 > 2014-08-12 14:35:11.259656 7ff794bd57c0 10 mds.-1.0 488 CDentry > 2014-08-12 14:35:11.259661 7ff794bd57c0 10 mds.-1.0 16 elist<>::item > 2014-08-12 14:35:11.259669 7ff794bd57c0 10 mds.-1.0 40 SimpleLock > 2014-08-12 14:35:11.259674 7ff794bd57c0 10 mds.-1.0 1016 CDir > 2014-08-12 14:35:11.259678 7ff794bd57c0 10 mds.-1.0 16 elist<>::item > *2=32 > 2014-08-12 14:35:11.259682 7ff794bd57c0 10 mds.-1.0 192 fnode_t > 2014-08-12 14:35:11.259687 7ff794bd57c0 10 mds.-1.0 56 nest_info_t *2 > 2014-08-12 14:35:11.259691 7ff794bd57c0 10 mds.-1.0 32 frag_info_t *2 > 2014-08-12 14:35:11.259695 7ff794bd57c0 10 mds.-1.0 176 Capability > 2014-08-12 14:35:11.259699 7ff794bd57c0 10 mds.-1.0 32 xlist<>::item > *2=64 > 2014-08-12 14:35:11.259767 7ff794bd57c0 1 accepter.accepter.start > 2014-08-12 14:35:11.260734 7ff794bd57c0 1 -- 192.168.1.20:6800/10244 --> > 192.168.1.20:6789/0 -- auth(proto 0 31 bytes epoch 0) v1 -- ?+0 0x3684000 > con 0x36ac580 > 2014-08-12 14:35:11.261346 7ff794bcd700 10 mds.-1.0 MDS::ms_get_authorizer > type=mon > 2014-08-12 14:35:11.261696 7ff78fe4f700 5 mds.-1.0 ms_handle_connect on > 192.168.1.20:6789/0 > 2014-08-12 14:35:11.262409 7ff78fe4f700 1 -- 192.168.1.20:6800/10244 <== > mon.0 192.168.1.20:6789/0 1 ==== mon_map v1 ==== 194+0+0 (4155369063 0 0) > 0x36d4000 con 0x36ac580 > 2014-08-12 14:35:11.262572 7ff78fe4f700 1 -- 192.168.1.20:6800/10244 <== > mon.0 192.168.1.20:6789/0 2 ==== auth_reply(proto 2 0 (0) Success) v1 > ==== 33+0+0 (2093056952 0 0) 0x3691400 con 0x36ac580 > 2014-08-12 14:35:11.262925 7ff78fe4f700 1 -- 192.168.1.20:6800/10244 --> > 192.168.1.20:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- ?+0 0x3684240 > con 0x36ac580 > 2014-08-12 14:35:11.263643 7ff78fe4f700 1 -- 192.168.1.20:6800/10244 <== > mon.0 192.168.1.20:6789/0 3 ==== auth_reply(proto 2 0 (0) Success) v1 > ==== 206+0+0 (1371651101 0 0) 0x3691800 con 0x36ac580 > 2014-08-12 14:35:11.263807 7ff78fe4f700 1 -- 
192.168.1.20:6800/10244 --> > 192.168.1.20:6789/0 -- auth(proto 2 165 bytes epoch 0) v1 -- ?+0 > 0x36846c0 con 0x36ac580 > 2014-08-12 14:35:11.264518 7ff78fe4f700 1 -- 192.168.1.20:6800/10244 <== > mon.0 192.168.1.20:6789/0 4 ==== auth_reply(proto 2 0 (0) Success) v1 > ==== 580+0+0 (1904484134 0 0) 0x3691600 con 0x36ac580 > 2014-08-12 14:35:11.264662 7ff78fe4f700 1 -- 192.168.1.20:6800/10244 --> > 192.168.1.20:6789/0 -- mon_subscribe({monmap=0+}) v2 -- ?+0 0x36b4380 con > 0x36ac580 > 2014-08-12 14:35:11.264744 7ff78fe4f700 1 -- 192.168.1.20:6800/10244 --> > 192.168.1.20:6789/0 -- auth(proto 2 2 bytes epoch 0) v1 -- ?+0 0x3684480 > con 0x36ac580 > 2014-08-12 14:35:11.265027 7ff78fe4f700 1 -- 192.168.1.20:6800/10244 <== > mon.0 192.168.1.20:6789/0 5 ==== mon_map v1 ==== 194+0+0 (4155369063 0 0) > 0x36d43c0 con 0x36ac580 > 2014-08-12 14:35:11.265203 7ff78fe4f700 1 -- 192.168.1.20:6800/10244 <== > mon.0 192.168.1.20:6789/0 6 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 > (2253672535 0 0) 0x36b4540 con 0x36ac580 > 2014-08-12 14:35:11.265251 7ff78fe4f700 1 -- 192.168.1.20:6800/10244 <== > mon.0 192.168.1.20:6789/0 7 ==== auth_reply(proto 2 0 (0) Success) v1 > ==== 194+0+0 (1999696020 0 0) 0x3691a00 con 0x36ac580 > 2014-08-12 14:35:11.265506 7ff794bd57c0 1 -- 192.168.1.20:6800/10244 --> > 192.168.1.20:6789/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0 > 0x36b41c0 con 0x36ac580 > 2014-08-12 14:35:11.265580 7ff794bd57c0 1 -- 192.168.1.20:6800/10244 --> > 192.168.1.20:6789/0 -- mon_subscribe({mdsmap=0+,monmap=2+,osdmap=0}) v2 > -- ?+0 0x36b4a80 con 0x36ac580 > 2014-08-12 14:35:11.266159 7ff78fe4f700 1 -- 192.168.1.20:6800/10244 <== > mon.0 192.168.1.20:6789/0 8 ==== osd_map(9687..9687 src has 9090..9687) > v3 ==== 6983+0+0 (1578463925 0 0) 0x3684b40 con 0x36ac580 > 2014-08-12 14:35:11.266453 7ff78fe4f700 1 -- 192.168.1.20:6800/10244 <== > mon.0 192.168.1.20:6789/0 9 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 > (2253672535 0 0) 0x36b41c0 con 0x36ac580 > 2014-08-12 14:35:11.266491 7ff78fe4f700 1 -- 192.168.1.20:6800/10244 <== > mon.0 192.168.1.20:6789/0 10 ==== mdsmap(e 7182) v1 ==== 653+0+0 > (374906493 0 0) 0x3691800 con 0x36ac580 > 2014-08-12 14:35:11.266518 7ff794bd57c0 10 mds.-1.0 beacon_send up:boot > seq 1 (currently up:boot) > 2014-08-12 14:35:11.266585 7ff794bd57c0 1 -- 192.168.1.20:6800/10244 --> > 192.168.1.20:6789/0 -- mdsbeacon(12799/MDS1.1 up:boot seq 1 v0) v2 -- ?+0 > 0x36bc2c0 con 0x36ac580 > 2014-08-12 14:35:11.266626 7ff794bd57c0 10 mds.-1.0 create_logger > 2014-08-12 14:35:11.266677 7ff78fe4f700 5 mds.-1.0 handle_mds_map epoch > 7182 from mon.0 > 2014-08-12 14:35:11.266779 7ff78fe4f700 10 mds.-1.0 my compat > compat={},rocompat={},incompat={1=base v0.20,2=client writeable > ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds > uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data} > 2014-08-12 14:35:11.266793 7ff78fe4f700 10 mds.-1.0 mdsmap compat > compat={},rocompat={},incompat={1=base v0.20,2=client writeable > ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds > uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table} > 2014-08-12 14:35:11.266803 7ff78fe4f700 0 mds.-1.0 handle_mds_map mdsmap > compatset compat={},rocompat={},incompat={1=base v0.20,2=client writeable > ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds > uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table} not > writeable with daemon features compat={},rocompat={},incompat={1=base > 
v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode > in separate object,5=mds uses versioned encoding,6=dirfrag is stored in > omap,7=mds uses inline data}, killing myself > 2014-08-12 14:35:11.266821 7ff78fe4f700 1 mds.-1.0 suicide. wanted > down:dne, now up:boot > 2014-08-12 14:35:11.267081 7ff78fe4f700 1 -- 192.168.1.20:6800/10244 > mark_down 0x36ac580 -- 0x36c8500 > 2014-08-12 14:35:11.267204 7ff78fe4f700 1 -- 192.168.1.20:6800/10244 > mark_down_all > 2014-08-12 14:35:11.267612 7ff794bd57c0 1 -- 192.168.1.20:6800/10244 > shutdown complete. > > Regards, > Hong > > > > On Tuesday, July 22, 2014 4:03 PM, Craig Lewis < > clewis at centraldesktop.com> wrote: > > > The osd lost is useful, but not strictly required. It accelerates the > recovery once things are stable. It tells Ceph to give up trying to > recovery data off those disks. Without it, Ceph will still check, then > give up when it can't find it. > > > I was having problems with the suicide timeout at one point. Basically, > the OSDs fail and restart so many times that they can't apply all of the > map changes before they hit the timeout. Sage gave me some suggestions. > Give this a try: > https://www.mail-archive.com/ceph-devel at vger.kernel.org/msg18862.html > > That process solved suicide timeouts, with one caveat. When I followed > it, I filled up /var/log/ceph/ and the recovery failed. I had to manually > run each OSD in debugging mode until it completed the map update. Aside > from that, I followed your procedure. > > I had to run that procedure on all OSDs. I did all OSDs on a node at the > same time. > > > > > > > On Mon, Jul 21, 2014 at 11:45 PM, hjcho616 <hjcho616 at yahoo.com> wrote: > > Craig, > > osd.2 was down and out. lost wasn't working.. so skipped it. =P > Formatted the drive XFS and got mostly working but couldn't figure out how > to get the journal to point at my SSD, and init script wasn't able to find > the osd.2 for some reason. So just used ceph-deploy. It created new osd.6 > on the disks that were used for osd.2. I removed norecover and nobackfill > and let the system rebuild. It seemed like it was doing well until it hit > that suicide timeout. What should I do in this case? > > -20> 2014-07-22 01:01:26.087707 7f3a90012700 10 monclient: > _check_auth_rotating have uptodate secrets (they expire after 2014-07-22 > 01:00:56.087703) > -19> 2014-07-22 01:01:26.087743 7f3a90012700 10 monclient: renew subs? 
> (now: 2014-07-22 01:01:26.087742; renew after: 2014-07-22 01:01:16.084357) > -- yes > -18> 2014-07-22 01:01:26.087775 7f3a90012700 10 monclient: renew_subs > -17> 2014-07-22 01:01:26.087793 7f3a90012700 10 monclient: > _send_mon_message to mon.MDS1 at 192.168.1.20:6789/0 > -16> 2014-07-22 01:01:26.087822 7f3a90012700 1 -- > 192.168.1.30:6800/6297 --> 192.168.1.20:6789/0 -- > mon_subscribe({monmap=2+,osd_pg_creates=0}) v2 -- ?+0 0x1442c000 con > 0xf73e2c0 > -15> 2014-07-22 01:01:27.916972 7f3a8c80b700 5 osd.6 3252 heartbeat: > osd_stat(66173 MB used, 1797 GB avail, 1862 GB total, peers [3,4,5]/[] op > hist []) > -14> 2014-07-22 01:01:27.917061 7f3a8c80b700 1 -- 192.168.2.30:0/6297 > --> 192.168.2.31:6803/13623 -- osd_ping(ping e3252 stamp 2014-07-22 > 01:01:27.917024) v2 -- ?+0 0x140201c0 con 0xfa68160 > -13> 2014-07-22 01:01:27.917131 7f3a8c80b700 1 -- 192.168.2.30:0/6297 > --> 192.168.1.31:6804/13623 -- osd_ping(ping e3252 stamp 2014-07-22 > 01:01:27.917024) v2 -- ?+0 0x17d0b500 con 0xfa68000 > -12> 2014-07-22 01:01:27.917180 7f3a8c80b700 1 -- 192.168.2.30:0/6297 > --> 192.168.2.31:6805/13991 -- osd_ping(ping e3252 stamp 2014-07-22 > 01:01:27.917024) v2 -- ?+0 0xfa8fdc0 con 0x19208c60 > -11> 2014-07-22 01:01:27.917229 7f3a8c80b700 1 -- 192.168.2.30:0/6297 > --> 192.168.1.31:6807/13991 -- osd_ping(ping e3252 stamp 2014-07-22 > 01:01:27.917024) v2 -- ?+0 0xffcee00 con 0x205c000 > -10> 2014-07-22 01:01:27.917276 7f3a8c80b700 1 -- 192.168.2.30:0/6297 > --> 192.168.2.31:6801/13249 -- osd_ping(ping e3252 stamp 2014-07-22 > 01:01:27.917024) v2 -- ?+0 0x224fdc0 con 0xf9f8dc0 > -9> 2014-07-22 01:01:27.917325 7f3a8c80b700 1 -- 192.168.2.30:0/6297 > --> 192.168.1.31:6801/13249 -- osd_ping(ping e3252 stamp 2014-07-22 > 01:01:27.917024) v2 -- ?+0 0xf8ce000 con 0x19208840 > -8> 2014-07-22 01:01:27.918723 7f3a9581d700 1 -- 192.168.2.30:0/6297 > <== osd.3 192.168.1.31:6804/13623 28 ==== osd_ping(ping_reply e3252 stamp > 2014-07-22 01:01:27.917024) v2 ==== 47+0+0 (2181285829 0 0) 0xffcf500 con > 0xfa68000 > -7> 2014-07-22 01:01:27.918830 7f3a9581d700 1 -- 192.168.2.30:0/6297 > <== osd.5 192.168.1.31:6801/13249 28 ==== osd_ping(ping_reply e3252 stamp > 2014-07-22 01:01:27.917024) v2 ==== 47+0+0 (2181285829 0 0) 0x11a5e700 con > 0x19208840 > -6> 2014-07-22 01:01:27.919218 7f3a9581d700 1 -- 192.168.2.30:0/6297 > <== osd.5 192.168.2.31:6801/13249 28 ==== osd_ping(ping_reply e3252 stamp > 2014-07-22 01:01:27.917024) v2 ==== 47+0+0 (2181285829 0 0) 0xfa8fa40 con > 0xf9f8dc0 > -5> 2014-07-22 01:01:27.919396 7f3a9581d700 1 -- 192.168.2.30:0/6297 > <== osd.3 192.168.2.31:6803/13623 28 ==== osd_ping(ping_reply e3252 stamp > 2014-07-22 01:01:27.917024) v2 ==== 47+0+0 (2181285829 0 0) 0x1dd8bc00 con > 0xfa68160 > -4> 2014-07-22 01:01:27.919521 7f3a9581d700 1 -- 192.168.2.30:0/6297 > <== osd.4 192.168.2.31:6805/13991 28 ==== osd_ping(ping_reply e3252 stamp > 2014-07-22 01:01:27.917024) v2 ==== 47+0+0 (2181285829 0 0) 0x14021a40 con > 0x19208c60 > -3> 2014-07-22 01:01:27.919606 7f3a9581d700 1 -- 192.168.2.30:0/6297 > <== osd.4 192.168.1.31:6807/13991 28 ==== osd_ping(ping_reply e3252 stamp > 2014-07-22 01:01:27.917024) v2 ==== 47+0+0 (2181285829 0 0) 0x10fdd6c0 con > 0x205c000 > -2> 2014-07-22 01:01:29.976382 7f3aa5c22700 1 heartbeat_map > is_healthy 'FileStore::op_tp thread 0x7f3a9d0b0700' had timed out after 60 > -1> 2014-07-22 01:01:29.976416 7f3aa5c22700 1 heartbeat_map > is_healthy 'FileStore::op_tp thread 0x7f3a9d0b0700' had suicide timed out > after 180 > 0> 2014-07-22 01:01:29.985984 7f3aa5c22700 -1 
common/HeartbeatMap.cc: > In function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, > const char*, time_t)' thread 7f3aa5c22700 time 2014-07-22 01:01:29.976450 > common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout") > > ceph version 0.82 (14085f42ddd0fef4e7e1dc99402d07a8df82c04e) > 1: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, > long)+0x2eb) [0xad2cbb] > 2: (ceph::HeartbeatMap::is_healthy()+0xb6) [0xad34c6] > 3: (ceph::HeartbeatMap::check_touch_file()+0x28) [0xad3aa8] > 4: (CephContextServiceThread::entry()+0x13f) [0xb9911f] > 5: (()+0x8062) [0x7f3aa8f89062] > 6: (clone()+0x6d) [0x7f3aa78c9a3d] > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed > to interpret this. > > --- logging levels --- > 0/ 5 none > 0/ 1 lockdep > 0/ 1 context > 1/ 1 crush > 1/ 5 mds > 1/ 5 mds_balancer > 1/ 5 mds_locker > 1/ 5 mds_log > 1/ 5 mds_log_expire > 1/ 5 mds_migrator > 0/ 1 buffer > 0/ 1 timer > 0/ 1 filer > 0/ 1 striper > 0/ 1 objecter > 0/ 5 rados > 0/ 5 rbd > 0/ 5 journaler > 0/ 5 objectcacher > 0/ 5 client > 0/ 5 osd > 0/ 5 optracker > 0/ 5 objclass > 1/ 3 filestore > 1/ 3 keyvaluestore > 1/ 3 journal > 0/ 5 ms > 1/ 5 mon > 0/10 monc > 1/ 5 paxos > 0/ 5 tp > 1/ 5 auth > 1/ 5 crypto > 1/ 1 finisher > 1/ 5 heartbeatmap > 1/ 5 perfcounter > 1/ 5 rgw > 1/ 5 javaclient > 1/ 5 asok > 1/ 1 throttle > -2/-2 (syslog threshold) > -1/-1 (stderr threshold) > max_recent 10000 > max_new 1000 > log_file /var/log/ceph/ceph-osd.6.log > --- end dump of recent events --- > 2014-07-22 01:01:30.352843 7f3aa5c22700 -1 *** Caught signal (Aborted) ** > in thread 7f3aa5c22700 > > ceph version 0.82 (14085f42ddd0fef4e7e1dc99402d07a8df82c04e) > 1: /usr/bin/ceph-osd() [0xaac562] > 2: (()+0xf880) [0x7f3aa8f90880] > 3: (gsignal()+0x39) [0x7f3aa78193a9] > 4: (abort()+0x148) [0x7f3aa781c4c8] > 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f3aa81065e5] > 6: (()+0x5e746) [0x7f3aa8104746] > 7: (()+0x5e773) [0x7f3aa8104773] > 8: (()+0x5e9b2) [0x7f3aa81049b2] > 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char > const*)+0x40a) [0xb85b6a] > 10: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, > long)+0x2eb) [0xad2cbb] > 11: (ceph::HeartbeatMap::is_healthy()+0xb6) [0xad34c6] > 12: (ceph::HeartbeatMap::check_touch_file()+0x28) [0xad3aa8] > 13: (CephContextServiceThread::entry()+0x13f) [0xb9911f] > 14: (()+0x8062) [0x7f3aa8f89062] > 15: (clone()+0x6d) [0x7f3aa78c9a3d] > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed > to interpret this. 
> > --- begin dump of recent events --- > 0> 2014-07-22 01:01:30.352843 7f3aa5c22700 -1 *** Caught signal > (Aborted) ** > in thread 7f3aa5c22700 > > ceph version 0.82 (14085f42ddd0fef4e7e1dc99402d07a8df82c04e) > 1: /usr/bin/ceph-osd() [0xaac562] > 2: (()+0xf880) [0x7f3aa8f90880] > 3: (gsignal()+0x39) [0x7f3aa78193a9] > 4: (abort()+0x148) [0x7f3aa781c4c8] > 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f3aa81065e5] > 6: (()+0x5e746) [0x7f3aa8104746] > 7: (()+0x5e773) [0x7f3aa8104773] > 8: (()+0x5e9b2) [0x7f3aa81049b2] > 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char > const*)+0x40a) [0xb85b6a] > 10: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, > long)+0x2eb) [0xad2cbb] > 11: (ceph::HeartbeatMap::is_healthy()+0xb6) [0xad34c6] > 12: (ceph::HeartbeatMap::check_touch_file()+0x28) [0xad3aa8] > 13: (CephContextServiceThread::entry()+0x13f) [0xb9911f] > 14: (()+0x8062) [0x7f3aa8f89062] > 15: (clone()+0x6d) [0x7f3aa78c9a3d] > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed > to interpret this. > > --- logging levels --- > 0/ 5 none > 0/ 1 lockdep > 0/ 1 context > 1/ 1 crush > 1/ 5 mds > 1/ 5 mds_balancer > 1/ 5 mds_locker > 1/ 5 mds_log > 1/ 5 mds_log_expire > 1/ 5 mds_migrator > 0/ 1 buffer > 0/ 1 timer > 0/ 1 filer > 0/ 1 striper > 0/ 1 objecter > 0/ 5 rados > 0/ 5 rbd > 0/ 5 journaler > 0/ 5 objectcacher > 0/ 5 client > 0/ 5 osd > 0/ 5 optracker > 0/ 5 objclass > 1/ 3 filestore > 1/ 3 keyvaluestore > 1/ 3 journal > 0/ 5 ms > 1/ 5 mon > 0/10 monc > 1/ 5 paxos > 0/ 5 tp > 1/ 5 auth > 1/ 5 crypto > 1/ 1 finisher > 1/ 5 heartbeatmap > 1/ 5 perfcounter > 1/ 5 rgw > 1/ 5 javaclient > 1/ 5 asok > 1/ 1 throttle > -2/-2 (syslog threshold) > -1/-1 (stderr threshold) > max_recent 10000 > max_new 1000 > log_file /var/log/ceph/ceph-osd.6.log > --- end dump of recent events --- > > Regards, > Hong > > > > On Monday, July 21, 2014 9:35 PM, Craig Lewis <clewis at centraldesktop.com> > wrote: > > > I'd like to get rid of those inconsistent PGs. I think fixing those > will get your MDS working again, but I don't actually know anything about > MDS. Still, it's best to work your way up from the bottom. If the OSDs > aren't stable, there's no use building services on top of them. > > > It's strange that osd.0 was up, but crashed during deep-scrubbing. You > might try disabling deep-scrubs (ceph osd set nodeep-scrub), and see if > osd.0 will stay up. If running without deep-scrubbing will get your > cluster consistent, you can reformat the disk later. > > You said osd.2 fails to start, with a corrupt journal error. There's not > much you can do there. You should remove it again, mark it lost, reformat > the disk, and re-add it to the cluster. > > > I'd rebuild osd.2 first, while leaving osd.0 and osd.1 down. > > Do you have enough disk space that osd.2 can take all of the data from > osd.0 and osd.1? If so, you can mark osd.0 and osd.1 as DOWN and OUT. If > not, make sure that osd.0 and osd.1 are marked DOWN and IN. > > Once osd.2 finishes rebuilding, I'd set noin, then bring osd.0 and osd.1 > up. If they're OUT, that will allow Ceph to copy any unique data they > might have, but it won't try to write anything to them. If they're IN, > well, Ceph will try to write to them. Either way, I'm hoping that they > stay up long enough for you to get 100% consistent. > > > > > > > On Sun, Jul 20, 2014 at 7:01 PM, hjcho616 <hjcho616 at yahoo.com> wrote: > > Based on your suggestion here is what I did. 
> > # ceph osd set nobackfill > set nobackfill > # ceph osd set norecovery > Invalid command: norecovery not in > pause|noup|nodown|noout|noin|nobackfill|norecover|noscrub|nodeep-scrub|notieragent > osd set > pause|noup|nodown|noout|noin|nobackfill|norecover|noscrub|nodeep-scrub|notieragent > : set <key> > Error EINVAL: invalid command > # ceph osd set norecover > set norecover > # ceph osd set noin > set noin > # ceph create osd > no valid command found; 10 closest matches: > osd tier remove <poolname> <poolname> > osd tier cache-mode <poolname> none|writeback|forward|readonly > osd thrash <int[0-]> > osd tier add <poolname> <poolname> {--force-nonempty} > osd pool stats {<name>} > osd reweight-by-utilization {<int[100-]>} > osd pool set <poolname> > size|min_size|crash_replay_interval|pg_num|pgp_num|crush_ruleset|hashpspool|hit_set_type|hit_set_period|hit_set_count|hit_set_fpp|debug_fake_ec_pool|target_max_bytes|target_max_objects|cache_target_dirty_ratio|cache_target_full_ratio|cache_min_flush_age|cache_min_evict_age|auid > <val> {--yes-i-really-mean-it} > osd pool set-quota <poolname> max_objects|max_bytes <val> > osd pool rename <poolname> <poolname> > osd pool get <poolname> > size|min_size|crash_replay_interval|pg_num|pgp_num|crush_ruleset|hit_set_type|hit_set_period|hit_set_count|hit_set_fpp|auid > Error EINVAL: invalid command > # ceph osd create > 0 > # ceph osd create > 1 > # ceph osd create > 2 > # start ceph-osd id=0 > bash: start: command not found > # /etc/init.d/ceph start osd.0 > === osd.0 === > 2014-07-18 21:21:37.207159 7ff2c64d7700 0 librados: osd.0 authentication > error (1) Operation not permitted > Error connecting to cluster: PermissionError > failed: 'timeout 10 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.0 > --keyring=/var/lib/ceph/osd/ceph-0/keyring osd crush create-or-move -- 0 > 1.82 host=OSD1 root=default' > # ceph status > cluster 9b2c9bca-112e-48b0-86fc-587ef9a52948 > health HEALTH_ERR 164 pgs degraded; 38 pgs inconsistent; 192 pgs > stuck unclean; recovery 1484224/3513098 objects degraded (42.248%); 1374 > scrub errors; mds cluster is degraded; mds MDS1 is laggy; > noin,nobackfill,norecover flag(s) set > monmap e1: 1 mons at {MDS1=192.168.1.20:6789/0}, election epoch 1, > quorum 0 MDS1 > mdsmap e7182: 1/1/1 up {0=MDS1=up:replay(laggy or crashed)} > osdmap e3133: 6 osds: 3 up, 3 in > flags noin,nobackfill,norecover > pgmap v309437: 192 pgs, 3 pools, 1571 GB data, 1715 kobjects > 1958 GB used, 3627 GB / 5586 GB avail > 1484224/3513098 objects degraded (42.248%) > 131 active+degraded > 23 active+remapped > 33 active+degraded+inconsistent > 5 active+remapped+inconsistent > # ceph osd stat > osdmap e3133: 6 osds: 3 up, 3 in > flags noin,nobackfill,norecover > # ceph auth get-or-create osd.0 mon 'allow rwx' osd 'allow *' -o > /var/lib/ceph/osd/ceph-0/keyring > > # /etc/init.d/ceph start osd.0 > === osd.0 === > create-or-move updating item name 'osd.0' weight 1.82 at location > {host=OSD1,root=default} to crush map > Starting Ceph osd.0 on OSD1... 
> starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 > /var/lib/ceph/osd/ceph-0/journal > root at OSD1:/home/genie# ceph auth get-or-create osd.1 mon 'allow rwx' osd > 'allow *' -o /var/lib/ceph/osd/ceph-1/keyring > root at OSD1:/home/genie# ceph auth get-or-create osd.2 mon 'allow rwx' osd > 'allow *' -o /var/lib/ceph/osd/ceph-2/keyring > root at OSD1:/home/genie# /etc/init.d/ceph start osd.1 > === osd.1 === > failed: 'timeout 10 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.1 > --keyring=/var/lib/ceph/osd/ceph-1/keyring osd crush create-or-move -- 1 > 1.82 host=OSD1 root=default' > # /etc/init.d/ceph start osd.2 > === osd.2 === > create-or-move updating item name 'osd.2' weight 1.82 at location > {host=OSD1,root=default} to crush map > Starting Ceph osd.2 on OSD1... > starting osd.2 at :/0 osd_data /var/lib/ceph/osd/ceph-2 > /var/lib/ceph/osd/ceph-2/journal > # /etc/init.d/ceph start osd.1 > === osd.1 === > failed: 'timeout 10 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.1 > --keyring=/var/lib/ceph/osd/ceph-1/keyring osd crush create-or-move -- 1 > 1.82 host=OSD1 root=default' > # ceph health > Segmentation fault > # ceph health > Bus error > # ceph health > HEALTH_ERR 164 pgs degraded; 38 pgs inconsistent; 192 pgs stuck unclean; > recovery 1484224/3513098 objects degraded (42.248%); 1374 scrub errors; mds > cluster is degraded; mds MDS1 is laggy; noin,nobackfill,norecover flag(s) > set > # /etc/init.d/ceph start osd.1 > === osd.1 === > create-or-move updating item name 'osd.1' weight 1.82 at location > {host=OSD1,root=default} to crush map > Starting Ceph osd.1 on OSD1... > starting osd.1 at :/0 osd_data /var/lib/ceph/osd/ceph-1 > /var/lib/ceph/osd/ceph-1/journal > # ceph -w > cluster 9b2c9bca-112e-48b0-86fc-587ef9a52948 > health HEALTH_ERR 164 pgs degraded; 38 pgs inconsistent; 192 pgs > stuck unclean; recovery 1484224/3513098 objects degraded (42.248%); 1374 > scrub errors; mds cluster is degraded; mds MDS1 is laggy; > noin,nobackfill,norecover flag(s) set > monmap e1: 1 mons at {MDS1=192.168.1.20:6789/0}, election epoch 1, > quorum 0 MDS1 > mdsmap e7182: 1/1/1 up {0=MDS1=up:replay(laggy or crashed)} > osdmap e3137: 6 osds: 4 up, 3 in > flags noin,nobackfill,norecover > pgmap v309463: 192 pgs, 3 pools, 1571 GB data, 1715 kobjects > 1958 GB used, 3627 GB / 5586 GB avail > 1484224/3513098 objects degraded (42.248%) > 131 active+degraded > 23 active+remapped > 33 active+degraded+inconsistent > 5 active+remapped+inconsistent > > 2014-07-19 21:34:59.166709 mon.0 [INF] pgmap v309463: 192 pgs: 131 > active+degraded, 23 active+remapped, 33 active+degraded+inconsistent, 5 > active+remapped+inconsistent; 1571 GB data, 1958 GB used, 3627 GB / 5586 GB > avail; 1484224/3513098 objects degraded (42.248%) > > > osd.2 doesn't come up. osd.1 uses little memory compared to osd.0, but it > stays alive. Killed osd.1 and osd.2 for now. At this point osd.0's CPU > was on and off for a while. But it didn't kill it. So I did ceph osd > unset noin and restarted osd.0. It seemed to be doing something for a long > time. I let it run over night. Found it crashed today. Below is the log > of it. 
> > > -20> 2014-07-20 00:54:10.924602 7fb562528700 5 -- op tracker -- , seq: > 4847, time: 2014-07-20 00:54:10.924244, event: header_read, op: > osd_sub_op(unknown.0.0:0 1.29 0//0//-1 [scrub-reserve] v 0'0 > snapset=0=[]:[] snapc=0=[]) > -19> 2014-07-20 00:54:10.924652 7fb562528700 5 -- op tracker -- , seq: > 4847, time: 2014-07-20 00:54:10.924250, event: throttled, op: > osd_sub_op(unknown.0.0:0 1.29 0//0//-1 [scrub-reserve] v 0'0 > snapset=0=[]:[] snapc=0=[]) > -18> 2014-07-20 00:54:10.924698 7fb562528700 5 -- op tracker -- , seq: > 4847, time: 2014-07-20 00:54:10.924458, event: all_read, op: > osd_sub_op(unknown.0.0:0 1.29 0//0//-1 [scrub-reserve] v 0'0 > snapset=0=[]:[] snapc=0=[]) > -17> 2014-07-20 00:54:10.924743 7fb562528700 5 -- op tracker -- , seq: > 4847, time: 0.000000, event: dispatched, op: osd_sub_op(unknown.0.0:0 1.29 > 0//0//-1 [scrub-reserve] v 0'0 snapset=0=[]:[] snapc=0=[]) > -16> 2014-07-20 00:54:10.924880 7fb54d78f700 5 -- op tracker -- , seq: > 4847, time: 2014-07-20 00:54:10.924861, event: reached_pg, op: > osd_sub_op(unknown.0.0:0 1.29 0//0//-1 [scrub-reserve] v 0'0 > snapset=0=[]:[] snapc=0=[]) > -15> 2014-07-20 00:54:10.924936 7fb54d78f700 5 -- op tracker -- , seq: > 4847, time: 2014-07-20 00:54:10.924915, event: started, op: > osd_sub_op(unknown.0.0:0 1.29 0//0//-1 [scrub-reserve] v 0'0 > snapset=0=[]:[] snapc=0=[]) > -14> 2014-07-20 00:54:10.924974 7fb54d78f700 1 -- > 192.168.2.30:6800/18511 --> 192.168.2.31:6804/13991 -- > osd_sub_op_reply(unknown.0.0:0 1.29 0//0//-1 [scrub-reserve] ack, result = > 0) v2 -- ?+1 0x10503680 con 0xfdca000 > -13> 2014-07-20 00:54:10.925053 7fb54d78f700 5 -- op tracker -- , seq: > 4847, time: 2014-07-20 00:54:10.925034, event: done, op: > osd_sub_op(unknown.0.0:0 1.29 0//0//-1 [scrub-reserve] v 0'0 > snapset=0=[]:[] snapc=0=[]) > -12> 2014-07-20 00:54:10.926801 7fb562528700 1 -- > 192.168.2.30:6800/18511 <== osd.4 192.168.2.31:6804/13991 1742 ==== > osd_sub_op(unknown.0.0:0 1.29 0//0//-1 [scrub-unreserve] v 0'0 > snapset=0=[]:[] snapc=0=[]) v10 ==== 1145+0+0 (2357365982 0 0) 0x1045a100 > con 0xfdca000 > -11> 2014-07-20 00:54:10.926912 7fb562528700 5 -- op tracker -- , seq: > 4848, time: 2014-07-20 00:54:10.926624, event: header_read, op: > osd_sub_op(unknown.0.0:0 1.29 0//0//-1 [scrub-unreserve] v 0'0 > snapset=0=[]:[] snapc=0=[]) > -10> 2014-07-20 00:54:10.926961 7fb562528700 5 -- op tracker -- , seq: > 4848, time: 2014-07-20 00:54:10.926628, event: throttled, op: > osd_sub_op(unknown.0.0:0 1.29 0//0//-1 [scrub-unreserve] v 0'0 > snapset=0=[]:[] snapc=0=[]) > -9> 2014-07-20 00:54:10.927004 7fb562528700 5 -- op tracker -- , seq: > 4848, time: 2014-07-20 00:54:10.926786, event: all_read, op: > osd_sub_op(unknown.0.0:0 1.29 0//0//-1 [scrub-unreserve] v 0'0 > snapset=0=[]:[] snapc=0=[]) > -8> 2014-07-20 00:54:10.927046 7fb562528700 5 -- op tracker -- , seq: > 4848, time: 0.000000, event: dispatched, op: osd_sub_op(unknown.0.0:0 1.29 > 0//0//-1 [scrub-unreserve] v 0'0 snapset=0=[]:[] snapc=0=[]) > -7> 2014-07-20 00:54:10.927179 7fb54df90700 5 -- op tracker -- , seq: > 4848, time: 2014-07-20 00:54:10.927160, event: reached_pg, op: > osd_sub_op(unknown.0.0:0 1.29 0//0//-1 [scrub-unreserve] v 0'0 > snapset=0=[]:[] snapc=0=[]) > -6> 2014-07-20 00:54:10.927237 7fb54df90700 5 -- op tracker -- , seq: > 4848, time: 2014-07-20 00:54:10.927216, event: started, op: > osd_sub_op(unknown.0.0:0 1.29 0//0//-1 [scrub-unreserve] v 0'0 > snapset=0=[]:[] snapc=0=[]) > -5> 2014-07-20 00:54:10.927289 7fb54df90700 5 -- op tracker -- , seq: > 4848, time: 
2014-07-20 00:54:10.927269, event: done, op: > osd_sub_op(unknown.0.0:0 1.29 0//0//-1 [scrub-unreserve] v 0'0 > snapset=0=[]:[] snapc=0=[]) > -4> 2014-07-20 00:54:10.941372 7fb551f98700 1 -- > 192.168.2.30:6801/18511 <== osd.3 192.168.1.31:0/13623 776 ==== > osd_ping(ping e3144 stamp 2014-07-20 00:54:10.942416) v2 ==== 47+0+0 > (216963345 0 0) 0x103c0e00 con 0x1001bce0 > -3> 2014-07-20 00:54:10.941451 7fb551f98700 1 -- > 192.168.2.30:6801/18511 --> 192.168.1.31:0/13623 -- osd_ping(ping_reply > e3144 stamp 2014-07-20 00:54:10.942416) v2 -- ?+0 0x100e2540 con 0x1001bce0 > -2> 2014-07-20 00:54:10.941742 7fb55379b700 1 -- > 192.168.1.30:6801/18511 <== osd.3 192.168.1.31:0/13623 776 ==== > osd_ping(ping e3144 stamp 2014-07-20 00:54:10.942416) v2 ==== 47+0+0 > (216963345 0 0) 0x10547880 con 0xff8db80 > -1> 2014-07-20 00:54:10.941842 7fb55379b700 1 -- > 192.168.1.30:6801/18511 --> 192.168.1.31:0/13623 -- osd_ping(ping_reply > e3144 stamp 2014-07-20 00:54:10.942416) v2 -- ?+0 0x10254a80 con 0xff8db80 > 0> 2014-07-20 00:54:11.646226 7fb54c78d700 -1 os/DBObjectMap.cc: In > function 'virtual bool DBObjectMap::DBObjectMapIteratorImpl::valid()' > thread 7fb54c78d700 time 2014-07-20 00:54:11.640719 > os/DBObjectMap.cc: 399: FAILED assert(!valid || cur_iter->valid()) > > ceph version 0.82 (14085f42ddd0fef4e7e1dc99402d07a8df82c04e) > 1: /usr/bin/ceph-osd() [0xa72172] > 2: (ReplicatedBackend::be_deep_scrub(hobject_t const&, ScrubMap::object&, > ThreadPool::TPHandle&)+0x6c3) [0xa2df03] > 3: (PGBackend::be_scan_list(ScrubMap&, std::vector<hobject_t, > std::allocator<hobject_t> > const&, bool, ThreadPool::TPHandle&)+0x503) > [0x98c523] > 4: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, > ThreadPool::TPHandle&)+0x10b) [0x891d4b] > 5: (PG::replica_scrub(MOSDRepScrub*, ThreadPool::TPHandle&)+0x456) > [0x8925d6] > 6: (OSD::RepScrubWQ::_process(MOSDRepScrub*, > ThreadPool::TPHandle&)+0x10a) [0x7b00fa] > 7: (ThreadPool::worker(ThreadPool::WorkThread*)+0x68a) [0xb7792a] > 8: (ThreadPool::WorkThread::entry()+0x10) [0xb78b80] > 9: (()+0x8062) [0x7fb56b184062] > 10: (clone()+0x6d) [0x7fb569ac4a3d] > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed > to interpret this. 
> > --- logging levels --- > 0/ 5 none > 0/ 1 lockdep > 0/ 1 context > 1/ 1 crush > 1/ 5 mds > 1/ 5 mds_balancer > 1/ 5 mds_locker > 1/ 5 mds_log > 1/ 5 mds_log_expire > 1/ 5 mds_migrator > 0/ 1 buffer > 0/ 1 timer > 0/ 1 filer > 0/ 1 striper > 0/ 1 objecter > 0/ 5 rados > 0/ 5 rbd > 0/ 5 journaler > 0/ 5 objectcacher > 0/ 5 client > 0/ 5 osd > 0/ 5 optracker > 0/ 5 objclass > 1/ 3 filestore > 1/ 3 keyvaluestore > 1/ 3 journal > 0/ 5 ms > 1/ 5 mon > 0/10 monc > 1/ 5 paxos > 0/ 5 tp > 1/ 5 auth > 1/ 5 crypto > 1/ 1 finisher > 1/ 5 heartbeatmap > 1/ 5 perfcounter > 1/ 5 rgw > 1/ 5 javaclient > 1/ 5 asok > 1/ 1 throttle > -2/-2 (syslog threshold) > -1/-1 (stderr threshold) > max_recent 10000 > max_new 1000 > log_file /var/log/ceph/ceph-osd.0.log > --- end dump of recent events --- > 2014-07-20 00:54:11.998700 7fb54c78d700 -1 *** Caught signal (Aborted) ** > in thread 7fb54c78d700 > > ceph version 0.82 (14085f42ddd0fef4e7e1dc99402d07a8df82c04e) > 1: /usr/bin/ceph-osd() [0xaac562] > 2: (()+0xf880) [0x7fb56b18b880] > 3: (gsignal()+0x39) [0x7fb569a143a9] > 4: (abort()+0x148) [0x7fb569a174c8] > 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fb56a3015e5] > 6: (()+0x5e746) [0x7fb56a2ff746] > 7: (()+0x5e773) [0x7fb56a2ff773] > 8: (()+0x5e9b2) [0x7fb56a2ff9b2] > 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char > const*)+0x40a) [0xb85b6a] > 10: /usr/bin/ceph-osd() [0xa72172] > > 11: (ReplicatedBackend::be_deep_scrub(hobject_t const&, > ScrubMap::object&, ThreadPool::TPHandle&)+0x6c3) [0xa2df03] > 12: (PGBackend::be_scan_list(ScrubMap&, std::vector<hobject_t, > std::allocator<hobject_t> > const&, bool, ThreadPool::TPHandle&)+0x503) > [0x98c523] > 13: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, > ThreadPool::TPHandle&)+0x10b) [0x891d4b] > 14: (PG::replica_scrub(MOSDRepScrub*, ThreadPool::TPHandle&)+0x456) > [0x8925d6] > 15: (OSD::RepScrubWQ::_process(MOSDRepScrub*, > ThreadPool::TPHandle&)+0x10a) [0x7b00fa] > 16: (ThreadPool::worker(ThreadPool::WorkThread*)+0x68a) [0xb7792a] > 17: (ThreadPool::WorkThread::entry()+0x10) [0xb78b80] > 18: (()+0x8062) [0x7fb56b184062] > 19: (clone()+0x6d) [0x7fb569ac4a3d] > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed > to interpret this. 
> > --- begin dump of recent events --- > -1> 2014-07-20 00:54:11.755763 7fb565618700 5 osd.0 3144 tick > 0> 2014-07-20 00:54:11.998700 7fb54c78d700 -1 *** Caught signal > (Aborted) ** > in thread 7fb54c78d700 > > ceph version 0.82 (14085f42ddd0fef4e7e1dc99402d07a8df82c04e) > 1: /usr/bin/ceph-osd() [0xaac562] > 2: (()+0xf880) [0x7fb56b18b880] > 3: (gsignal()+0x39) [0x7fb569a143a9] > 4: (abort()+0x148) [0x7fb569a174c8] > 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fb56a3015e5] > 6: (()+0x5e746) [0x7fb56a2ff746] > 7: (()+0x5e773) [0x7fb56a2ff773] > 8: (()+0x5e9b2) [0x7fb56a2ff9b2] > 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char > const*)+0x40a) [0xb85b6a] > 10: /usr/bin/ceph-osd() [0xa72172] > 11: (ReplicatedBackend::be_deep_scrub(hobject_t const&, > ScrubMap::object&, ThreadPool::TPHandle&)+0x6c3) [0xa2df03] > 12: (PGBackend::be_scan_list(ScrubMap&, std::vector<hobject_t, > std::allocator<hobject_t> > const&, bool, ThreadPool::TPHandle&)+0x503) > [0x98c523] > 13: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, > ThreadPool::TPHandle&)+0x10b) [0x891d4b] > 14: (PG::replica_scrub(MOSDRepScrub*, ThreadPool::TPHandle&)+0x456) > [0x8925d6] > 15: (OSD::RepScrubWQ::_process(MOSDRepScrub*, > ThreadPool::TPHandle&)+0x10a) [0x7b00fa] > 16: (ThreadPool::worker(ThreadPool::WorkThread*)+0x68a) [0xb7792a] > 17: (ThreadPool::WorkThread::entry()+0x10) [0xb78b80] > 18: (()+0x8062) [0x7fb56b184062] > 19: (clone()+0x6d) [0x7fb569ac4a3d] > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed > to interpret this. > > --- logging levels --- > 0/ 5 none > 0/ 1 lockdep > 0/ 1 context > 1/ 1 crush > 1/ 5 mds > 1/ 5 mds_balancer > 1/ 5 mds_locker > 1/ 5 mds_log > 1/ 5 mds_log_expire > 1/ 5 mds_migrator > 0/ 1 buffer > 0/ 1 timer > 0/ 1 filer > 0/ 1 striper > 0/ 1 objecter > 0/ 5 rados > 0/ 5 rbd > 0/ 5 journaler > 0/ 5 objectcacher > 0/ 5 client > 0/ 5 osd > 0/ 5 optracker > 0/ 5 objclass > 1/ 3 filestore > 1/ 3 keyvaluestore > 1/ 3 journal > 0/ 5 ms > 1/ 5 mon > 0/10 monc > 1/ 5 paxos > 0/ 5 tp > 1/ 5 auth > 1/ 5 crypto > 1/ 1 finisher > 1/ 5 heartbeatmap > 1/ 5 perfcounter > 1/ 5 rgw > 1/ 5 javaclient > 1/ 5 asok > 1/ 1 throttle > -2/-2 (syslog threshold) > -1/-1 (stderr threshold) > max_recent 10000 > max_new 1000 > log_file /var/log/ceph/ceph-osd.0.log > --- end dump of recent events --- > > > What can I do about this one? > > Regards, > Hong > > > > > On Friday, July 18, 2014 5:16 PM, Craig Lewis < > clewis at centraldesktop.com> wrote: > > > That I can't help you with. I'm a pure RadosGW user. But OSD stability > affects everybody. :-P > > > On Fri, Jul 18, 2014 at 2:34 PM, hjcho616 <hjcho616 at yahoo.com> wrote: > > Thanks Craig. I will try this soon. BTW should I upgrade to 0.80.4 > first? The MDS journal issue seems to be one of the issue I am running > into. > > Regards, > Hong > > > On Friday, July 18, 2014 4:14 PM, Craig Lewis <clewis at centraldesktop.com> > wrote: > > > If osd.3, osd.4, and osd.5 are stable, your cluster should be working > again. What does ceph status say? > > > I was able to re-add removed osd. > Here's what I did on my dev cluster: > stop ceph-osd id=0 > ceph osd down 0 > ceph osd out 0 > ceph osd rm 0 > ceph osd crush rm osd.0 > > Now my osd tree and osd dump do not show osd.0. The cluster was degraded, > but did not do any backfilling because I require 3x replication on 3 > different hosts, and Ceph can't satisfy that with 2 osds. 
> > On the same host, I ran: > ceph osd create # Returned ID 0 > start ceph-osd id=0 > > > osd.0 started up and joined the cluster. Once peering completed, all of > the PGs recovered quickly. I didn't have any writes on the cluster while I > was doing this. > > So it looks like you can just re-create and start those deleted osds. > > > > In your situation, I would do the following. Before you start, go through > this, and make sure you understand all the steps. Worst case, you can > always undo this by removing the osds again, and you'll be back to where > you are now. > > ceph osd set nobackfill > ceph osd set norecovery > ceph osd set noin > ceph create osd # Should return 0. Abort if it doesn't. > ceph create osd # Should return 1. Abort if it doesn't. > ceph create osd # Should return 2. Abort if it doesn't. > start ceph-osd id=0 > > Watch ceph -w and top. Hopefully ceph-osd id=0 will use some CPU, then > go UP, and drop to 0% cpu. If so, > ceph osd unset noin > restart ceph-osd id=0 > > Now osd.0 should go UP and IN, use some CPU for a while, then drop to 0% > cpu. If osd.0 drops out now, set noout, and shut it down. > > set noin again, and start osd.1. When it's stable, do it again for osd.2. > > Once as many as possible are up and stable: > ceph osd unset nobackfill > ceph osd unset norecovery > > Now it should start recovering. If your osds start dropping out now, set > noout, and shut down the ones that are having problems. > > > The goal is to get all the stable osds up, in, and recovered. Once that's > done, we can figure out what to do with the unstable osds. > > > > > > > > > > On Thu, Jul 17, 2014 at 9:29 PM, hjcho616 <hjcho616 at yahoo.com> wrote: > > Sorry Craig. I thought I sent both but second part didn't copy right. > For some reason over night MDS and MON decided to stop so I started it > when I was running those commands. Interestingly MDS didn't fail at the > time like it used to. So I thought something was being fixed? Then I now > realize MDS probably couldn't get to the data because OSD were down. Now > that I brought up the OSDs MDS crashed again. 
=P > > $ ceph osd tree > # id weight type name up/down reweight > -1 5.46 root default > -2 0 host OSD1 > -3 5.46 host OSD2 > 3 1.82 osd.3 up 1 > 4 1.82 osd.4 up 1 > 5 1.82 osd.5 up 1 > > $ ceph osd dump > epoch 3125 > fsid 9b2c9bca-112e-48b0-86fc-587ef9a52948 > created 2014-02-08 01:57:34.086532 > modified 2014-07-17 23:24:10.823596 > flags > pool 0 'data' replicated size 2 min_size 1 crush_ruleset 0 object_hash > rjenkins pg_num 64 pgp_num 64 last_change 1 crash_replay_interval 45 > stripe_width 0 > pool 1 'metadata' replicated size 2 min_size 1 crush_ruleset 1 object_hash > rjenkins pg_num 64 pgp_num 64 last_change 1 stripe_width 0 > pool 2 'rbd' replicated size 2 min_size 1 crush_ruleset 2 object_hash > rjenkins pg_num 64 pgp_num 64 last_change 1 stripe_width 0 > max_osd 6 > osd.3 up in weight 1 up_from 3120 up_thru 3122 down_at 3116 > last_clean_interval [2858,3113) 192.168.1.31:6803/13623 > 192.168.2.31:6802/13623 192.168.2.31:6803/13623 192.168.1.31:6804/13623 > exists,up 4f86a418-6c67-4cb4-83a1-6c123c890036 > osd.4 up in weight 1 up_from 3121 up_thru 3122 down_at 3116 > last_clean_interval [2859,3113) 192.168.1.31:6806/13991 > 192.168.2.31:6804/13991 192.168.2.31:6805/13991 192.168.1.31:6807/13991 > exists,up 3d5e3843-7a47-44b0-b276-61c4b1d62900 > osd.5 up in weight 1 up_from 3118 up_thru 3118 down_at 3116 > last_clean_interval [2856,3113) 192.168.1.31:6800/13249 > 192.168.2.31:6800/13249 192.168.2.31:6801/13249 192.168.1.31:6801/13249 > exists,up eec86483-2f35-48a4-a154-2eaf26be06b9 > pg_temp 0.2 [4,3] > pg_temp 0.a [4,5] > pg_temp 0.c [3,4] > pg_temp 0.10 [3,4] > pg_temp 0.15 [3,5] > pg_temp 0.17 [3,5] > pg_temp 0.2f [4,5] > pg_temp 0.3b [4,3] > pg_temp 0.3c [3,5] > pg_temp 0.3d [4,5] > pg_temp 1.1 [4,3] > pg_temp 1.9 [4,5] > pg_temp 1.b [3,4] > pg_temp 1.14 [3,5] > pg_temp 1.16 [3,5] > pg_temp 1.2e [4,5] > pg_temp 1.3a [4,3] > pg_temp 1.3b [3,5] > pg_temp 1.3c [4,5] > pg_temp 2.0 [4,3] > pg_temp 2.8 [4,5] > pg_temp 2.a [3,4] > pg_temp 2.13 [3,5] > pg_temp 2.15 [3,5] > pg_temp 2.2d [4,5] > pg_temp 2.39 [4,3] > pg_temp 2.3a [3,5] > pg_temp 2.3b [4,5] > blacklist 192.168.1.20:6802/30894 expires 2014-07-17 23:48:10.823576 > blacklist 192.168.1.20:6801/30651 expires 2014-07-17 23:47:55.562984 > > Regards, > Hong > > > On Thursday, July 17, 2014 3:30 PM, Craig Lewis < > clewis at centraldesktop.com> wrote: > > > You gave me 'ceph osd dump' twice.... can I see 'ceph osd tree' too? > > Why are osd.3, osd.4, and osd.5 down? > > > On Thu, Jul 17, 2014 at 11:45 AM, hjcho616 <hjcho616 at yahoo.com> wrote: > > Thank you for looking at this. Below are the outputs you requested. 
> > # ceph osd dump > epoch 3117 > fsid 9b2c9bca-112e-48b0-86fc-587ef9a52948 > created 2014-02-08 01:57:34.086532 > modified 2014-07-16 22:13:04.385914 > flags > pool 0 'data' replicated size 2 min_size 1 crush_ruleset 0 object_hash > rjenkins pg_num 64 pgp_num 64 last_change 1 crash_replay_interval 45 > stripe_width 0 > pool 1 'metadata' replicated size 2 min_size 1 crush_ruleset 1 object_hash > rjenkins pg_num 64 pgp_num 64 last_change 1 stripe_width 0 > pool 2 'rbd' replicated size 2 min_size 1 crush_ruleset 2 object_hash > rjenkins pg_num 64 pgp_num 64 last_change 1 stripe_width 0 > max_osd 6 > osd.3 down in weight 1 up_from 2858 up_thru 3040 down_at 3116 > last_clean_interval [2830,2851) 192.168.1.31:6803/5127 > 192.168.2.31:6805/5127 192.168.2.31:6806/5127 192.168.1.31:6805/5127 > exists 4f86a418-6c67-4cb4-83a1-6c123c890036 > osd.4 down in weight 1 up_from 2859 up_thru 3043 down_at 3116 > last_clean_interval [2835,2849) 192.168.1.31:6807/5310 > 192.168.2.31:6807/5310 192.168.2.31:6808/5310 192.168.1.31:6808/5310 > exists 3d5e3843-7a47-44b0-b276-61c4b1d62900 > osd.5 down in weight 1 up_from 2856 up_thru 3042 down_at 3116 > last_clean_interval [2837,2853) 192.168.1.31:6800/4969 > 192.168.2.31:6801/4969 192.168.2.31:6804/4969 192.168.1.31:6801/4969 > exists eec86483-2f35-48a4-a154-2eaf26be06b9 > > # ceph osd dump > epoch 3117 > fsid 9b2c9bca-112e-48b0-86fc-587ef9a52948 > created 2014-02-08 01:57:34.086532 > modified 2014-07-16 22:13:04.385914 > flags > pool 0 'data' replicated size 2 min_size 1 crush_ruleset 0 object_hash > rjenkins pg_num 64 pgp_num 64 last_change 1 crash_replay_interval 45 > stripe_width 0 > pool 1 'metadata' replicated size 2 min_size 1 crush_ruleset 1 object_hash > rjenkins pg_num 64 pgp_num 64 last_change 1 stripe_width 0 > pool 2 'rbd' replicated size 2 min_size 1 crush_ruleset 2 object_hash > rjenkins pg_num 64 pgp_num 64 last_change 1 stripe_width 0 > max_osd 6 > osd.3 down in weight 1 up_from 2858 up_thru 3040 down_at 3116 > last_clean_interval [2830,2851) 192.168.1.31:6803/5127 > 192.168.2.31:6805/5127 192.168.2.31:6806/5127 192.168.1.31:6805/5127 > exists 4f86a418-6c67-4cb4-83a1-6c123c890036 > osd.4 down in weight 1 up_from 2859 up_thru 3043 down_at 3116 > last_clean_interval [2835,2849) 192.168.1.31:6807/5310 > 192.168.2.31:6807/5310 192.168.2.31:6808/5310 192.168.1.31:6808/5310 > exists 3d5e3843-7a47-44b0-b276-61c4b1d62900 > osd.5 down in weight 1 up_from 2856 up_thru 3042 down_at 3116 > last_clean_interval [2837,2853) 192.168.1.31:6800/4969 > 192.168.2.31:6801/4969 192.168.2.31:6804/4969 192.168.1.31:6801/4969 > exists eec86483-2f35-48a4-a154-2eaf26be06b9 > > Regards, > Hong > > > > On Thursday, July 17, 2014 12:02 PM, Craig Lewis < > clewis at centraldesktop.com> wrote: > > > I don't believe you can re-add an OSD after `ceph osd rm`, but it's worth > a shot. Let me see what I can do on my dev cluster. > > What does `ceph osd dump` and `ceph osd tree` say? I want to make sure > I'm starting from the same point you are. > > > > On Wed, Jul 16, 2014 at 7:39 PM, hjcho616 <hjcho616 at yahoo.com> wrote: > > I did a "ceph osd rm" for all three but I didn't do anything else to it > afterwards. Can this be added back? > > Regards, > Hong > > > On Wednesday, July 16, 2014 6:54 PM, Craig Lewis < > clewis at centraldesktop.com> wrote: > > > For some reason you ended up in my spam folder. That might be why you > didn't get any responses. > > > Have you destroyed osd.0, osd.1, and osd.2? If not, try bringing them up > one a time. 
You might have just one bad disk, which is much better than > 50% of your disks. > > How is the ceph-osd process behaving when it hits the suicide timeout? I > had some problems a while back where the ceph-osd process would startup, > start consuming ~200% CPU for a while, then get stuck using almost exactly > 100% CPU. It would get kicked out of the cluster for being unresponsive, > then suicide. Repeat. If that's happening here, I can suggest some things > to try. > > > > > > On Fri, Jul 11, 2014 at 9:12 PM, hjcho616 <hjcho616 at yahoo.com> wrote: > > I have 2 OSD machines with 3 OSD running on each. One MDS server with 3 > daemons running. Ran cephfs mostly on 0.78. One night we lost power for > split second. MDS1 and OSD2 went down, OSD1 seemed OK, well turns out OSD1 > suffered most. Those two machines rebooted and seemed ok except it had > some inconsistencies. I waited for a while, didn't fix itself. So I > issued 'ceph pg repair pgnum'. It would try some and some OSD would crash. > Tried this for multiple days. Got some PGs fixed... but mostly it would > crash an OSD and stop recovering. dmesg shows something like below. > > > > [ 740.059498] traps: ceph-osd[5279] general protection ip:7f84e75ec75e > sp:7fff00045bc0 error:0 in libtcmalloc.so.4.1.0[7f84e75b3000+4a000] > > and ceph osd log shows something like this. > > -2> 2014-07-09 20:51:01.163571 7fe0f4617700 1 heartbeat_map > is_healthy 'FileStore::op_tp thread 0x7fe0e8e91700' had timed out after 60 > -1> 2014-07-09 20:51:01.163609 7fe0f4617700 1 heartbeat_map > is_healthy 'FileStore::op_tp thread 0x7fe0e8e91700' had suicide timed out > after 180 > 0> 2014-07-09 20:51:01.169542 7fe0f4617700 -1 common/HeartbeatMap.cc: > In function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, > const char*, time_t)' thread 7fe0f4617700 time 2014-07-09 20:51:01.163642 > common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout") > > ceph version 0.82 (14085f42ddd0fef4e7e1dc99402d07a8df82c04e) > 1: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, > long)+0x2eb) [0xad2cbb] > 2: (ceph::HeartbeatMap::is_healthy()+0xb6) [0xad34c6] > 3: (ceph::HeartbeatMap::check_touch_file()+0x28) [0xad3aa8] > 4: (CephContextServiceThread::entry()+0x13f) [0xb9911f] > 5: (()+0x8062) [0x7fe0f797e062] > 6: (clone()+0x6d) [0x7fe0f62bea3d] > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed > to interpret this. 
> > --- logging levels --- > 0/ 5 none > 0/ 1 lockdep > 0/ 1 context > 1/ 1 crush > 1/ 5 mds > 1/ 5 mds_balancer > 1/ 5 mds_locker > 1/ 5 mds_log > 1/ 5 mds_log_expire > 1/ 5 mds_migrator > 0/ 1 buffer > 0/ 1 timer > 0/ 1 filer > 0/ 1 striper > 0/ 1 objecter > 0/ 5 rados > 0/ 5 rbd > 0/ 5 journaler > 0/ 5 objectcacher > 0/ 5 client > 0/ 5 osd > 0/ 5 optracker > 0/ 5 objclass > 1/ 3 filestore > 1/ 3 keyvaluestore > 1/ 3 journal > 0/ 5 ms > 1/ 5 mon > 0/10 monc > 1/ 5 paxos > 0/ 5 tp > 1/ 5 auth > 1/ 5 crypto > 1/ 1 finisher > 1/ 5 heartbeatmap > 1/ 5 perfcounter > 1/ 5 rgw > 1/ 5 javaclient > 1/ 5 asok > 1/ 1 throttle > -2/-2 (syslog threshold) > -1/-1 (stderr threshold) > max_recent 10000 > max_new 1000 > log_file /var/log/ceph/ceph-osd.0.log > --- end dump of recent events --- > 2014-07-09 20:51:01.534706 7fe0f4617700 -1 *** Caught signal (Aborted) ** > in thread 7fe0f4617700 > > > ceph version 0.82 (14085f42ddd0fef4e7e1dc99402d07a8df82c04e) > 1: /usr/bin/ceph-osd() [0xaac562] > 2: (()+0xf880) [0x7fe0f7985880] > 3: (gsignal()+0x39) [0x7fe0f620e3a9] > 4: (abort()+0x148) [0x7fe0f62114c8] > 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fe0f6afb5e5] > 6: (()+0x5e746) [0x7fe0f6af9746] > 7: (()+0x5e773) [0x7fe0f6af9773] > 8: (()+0x5e9b2) [0x7fe0f6af99b2] > 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char > const*)+0x40a) [0xb85b6a] > 10: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, > long)+0x2eb) [0xad2cbb] > 11: (ceph::HeartbeatMap::is_healthy()+0xb6) [0xad34c6] > 12: (ceph::HeartbeatMap::check_touch_file()+0x28) [0xad3aa8] > 13: (CephContextServiceThread::entry()+0x13f) [0xb9911f] > 14: (()+0x8062) [0x7fe0f797e062] > 15: (clone()+0x6d) [0x7fe0f62bea3d] > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed > to interpret this. > > --- begin dump of recent events --- > 0> 2014-07-09 20:51:01.534706 7fe0f4617700 -1 *** Caught signal > (Aborted) ** > in thread 7fe0f4617700 > > ceph version 0.82 (14085f42ddd0fef4e7e1dc99402d07a8df82c04e) > 1: /usr/bin/ceph-osd() [0xaac562] > 2: (()+0xf880) [0x7fe0f7985880] > 3: (gsignal()+0x39) [0x7fe0f620e3a9] > 4: (abort()+0x148) [0x7fe0f62114c8] > 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fe0f6afb5e5] > 6: (()+0x5e746) [0x7fe0f6af9746] > 7: (()+0x5e773) [0x7fe0f6af9773] > 8: (()+0x5e9b2) [0x7fe0f6af99b2] > 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char > const*)+0x40a) [0xb85b6a] > 10: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, > long)+0x2eb) [0xad2cbb] > 11: (ceph::HeartbeatMap::is_healthy()+0xb6) [0xad34c6] > 12: (ceph::HeartbeatMap::check_touch_file()+0x28) [0xad3aa8] > > ... > > [Message clipped]
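
The MDS log at the top of this thread dies on a feature-compatibility mismatch: the mdsmap carries the incompat flag "8=no anchor table" (written while a 0.82 MDS was running), while the restarted 0.80.5 daemon only advertises "7=mds uses inline data", so it kills itself by design ("not writeable with daemon features ... killing myself"). A minimal way to confirm that kind of mismatch, assuming a Firefly-era CLI like the one used in the thread:

ceph-mds --version                # version of the ceph-mds binary that keeps exiting
ceph mds dump | grep -i compat    # incompat features recorded in the cluster's mdsmap

If the two disagree, the way forward is to run an MDS at least as new as whatever last wrote the mdsmap, rather than repeatedly restarting the older binary.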
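
The osd.0 transcript earlier in the thread fails its first start with "authentication error (1) Operation not permitted" and only comes up after a fresh cephx key is written to the OSD's keyring. A condensed sketch of the sequence that eventually worked there (the id, caps, and keyring path are taken from that transcript; repeat per OSD):

ceph osd create                                   # allocates the next free OSD id (0 here)
ceph auth get-or-create osd.0 mon 'allow rwx' osd 'allow *' \
    -o /var/lib/ceph/osd/ceph-0/keyring           # regenerate the cephx key the daemon will present
/etc/init.d/ceph start osd.0                      # sysvinit start, as used in the thread

Starting the daemon before the key exists is what produced the PermissionError; once the key is in place the same init command succeeds.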
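
The broader recovery strategy discussed in the thread is to hold recovery back with cluster-wide flags, bring the rebuilt OSDs up one at a time, and only then let backfill run; the repeated "hit suicide timeout" aborts happen while an OSD is grinding through old maps or a deep scrub and stops answering its heartbeats. Below is a rough sketch of that flag sequence assembled from the commands used above; the filestore_op_thread_suicide_timeout setting (default 180 s, matching the log) is an assumption added here as a temporary relief valve, not something taken from the thread or the linked post:

ceph osd set noout                # don't mark flapping OSDs out while working on them
ceph osd set nobackfill
ceph osd set norecover
ceph osd set noin
ceph osd set nodeep-scrub         # sidestep the deep-scrub assert seen on osd.0

# optional, assumed option name: give FileStore threads longer before suiciding
# (or put "filestore op thread suicide timeout = 600" under [osd] in ceph.conf)
ceph tell osd.0 injectargs '--filestore-op-thread-suicide-timeout 600'

/etc/init.d/ceph start osd.0      # one OSD at a time; watch `ceph -w` and top until it settles
ceph osd unset noin               # then admit it, and repeat for the next OSD

ceph osd unset norecover          # once all stable OSDs are up and in
ceph osd unset nobackfill

Undo the remaining flags (noout, nodeep-scrub) once the cluster reports the PGs clean.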