I can't really help with MDS. Hopefully somebody else will chime in here. On Tue, Aug 12, 2014 at 12:44 PM, hjcho616 <hjcho616 at yahoo.com> wrote: > Craig, > > Thanks. It turns out one of my memory stick went bad after that power > outage. While trying to fix the OSDs I ran in to many kernel crashes. > After removing that bad memory, I was able to fix them. I did remove all > OSD on that machine and rebuilt it as I didn't trust that data anymore. =P > > I was hoping MDS would come up after that. But it didn't. It shows this > and kills itself. Is this related to 0.82 MDS issue? > 2014-08-12 14:35:11.250634 7ff794bd57c0 0 ceph version 0.80.5 > (38b73c67d375a2552d8ed67843c8a65c2c0feba6), process ceph-mds, pid 10244 > 2014-08-12 14:35:11.251092 7ff794bd57c0 1 -- 192.168.1.20:0/0 learned my > addr 192.168.1.20:0/0 > 2014-08-12 14:35:11.251118 7ff794bd57c0 1 accepter.accepter.bind > my_inst.addr is 192.168.1.20:6800/10244 need_addr=0 > 2014-08-12 14:35:11.259207 7ff794bd57c0 1 -- 192.168.1.20:6800/10244 > messenger.start > 2014-08-12 14:35:11.259576 7ff794bd57c0 10 mds.-1.0 168 MDSCacheObject > 2014-08-12 14:35:11.259625 7ff794bd57c0 10 mds.-1.0 2304 CInode > 2014-08-12 14:35:11.259630 7ff794bd57c0 10 mds.-1.0 16 elist<>::item > *7=112 > 2014-08-12 14:35:11.259635 7ff794bd57c0 10 mds.-1.0 480 inode_t > 2014-08-12 14:35:11.259639 7ff794bd57c0 10 mds.-1.0 56 nest_info_t > 2014-08-12 14:35:11.259644 7ff794bd57c0 10 mds.-1.0 32 frag_info_t > 2014-08-12 14:35:11.259648 7ff794bd57c0 10 mds.-1.0 40 SimpleLock > *5=200 > 2014-08-12 14:35:11.259652 7ff794bd57c0 10 mds.-1.0 48 ScatterLock > *3=144 > 2014-08-12 14:35:11.259656 7ff794bd57c0 10 mds.-1.0 488 CDentry > 2014-08-12 14:35:11.259661 7ff794bd57c0 10 mds.-1.0 16 elist<>::item > 2014-08-12 14:35:11.259669 7ff794bd57c0 10 mds.-1.0 40 SimpleLock > 2014-08-12 14:35:11.259674 7ff794bd57c0 10 mds.-1.0 1016 CDir > 2014-08-12 14:35:11.259678 7ff794bd57c0 10 mds.-1.0 16 elist<>::item > *2=32 > 2014-08-12 14:35:11.259682 7ff794bd57c0 10 mds.-1.0 192 fnode_t > 2014-08-12 14:35:11.259687 7ff794bd57c0 10 mds.-1.0 56 nest_info_t *2 > 2014-08-12 14:35:11.259691 7ff794bd57c0 10 mds.-1.0 32 frag_info_t *2 > 2014-08-12 14:35:11.259695 7ff794bd57c0 10 mds.-1.0 176 Capability > 2014-08-12 14:35:11.259699 7ff794bd57c0 10 mds.-1.0 32 xlist<>::item > *2=64 > 2014-08-12 14:35:11.259767 7ff794bd57c0 1 accepter.accepter.start > 2014-08-12 14:35:11.260734 7ff794bd57c0 1 -- 192.168.1.20:6800/10244 --> > 192.168.1.20:6789/0 -- auth(proto 0 31 bytes epoch 0) v1 -- ?+0 0x3684000 > con 0x36ac580 > 2014-08-12 14:35:11.261346 7ff794bcd700 10 mds.-1.0 MDS::ms_get_authorizer > type=mon > 2014-08-12 14:35:11.261696 7ff78fe4f700 5 mds.-1.0 ms_handle_connect on > 192.168.1.20:6789/0 > 2014-08-12 14:35:11.262409 7ff78fe4f700 1 -- 192.168.1.20:6800/10244 <== > mon.0 192.168.1.20:6789/0 1 ==== mon_map v1 ==== 194+0+0 (4155369063 0 0) > 0x36d4000 con 0x36ac580 > 2014-08-12 14:35:11.262572 7ff78fe4f700 1 -- 192.168.1.20:6800/10244 <== > mon.0 192.168.1.20:6789/0 2 ==== auth_reply(proto 2 0 (0) Success) v1 > ==== 33+0+0 (2093056952 0 0) 0x3691400 con 0x36ac580 > 2014-08-12 14:35:11.262925 7ff78fe4f700 1 -- 192.168.1.20:6800/10244 --> > 192.168.1.20:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- ?+0 0x3684240 > con 0x36ac580 > 2014-08-12 14:35:11.263643 7ff78fe4f700 1 -- 192.168.1.20:6800/10244 <== > mon.0 192.168.1.20:6789/0 3 ==== auth_reply(proto 2 0 (0) Success) v1 > ==== 206+0+0 (1371651101 0 0) 0x3691800 con 0x36ac580 > 2014-08-12 14:35:11.263807 7ff78fe4f700 1 -- 
192.168.1.20:6800/10244 --> > 192.168.1.20:6789/0 -- auth(proto 2 165 bytes epoch 0) v1 -- ?+0 > 0x36846c0 con 0x36ac580 > 2014-08-12 14:35:11.264518 7ff78fe4f700 1 -- 192.168.1.20:6800/10244 <== > mon.0 192.168.1.20:6789/0 4 ==== auth_reply(proto 2 0 (0) Success) v1 > ==== 580+0+0 (1904484134 0 0) 0x3691600 con 0x36ac580 > 2014-08-12 14:35:11.264662 7ff78fe4f700 1 -- 192.168.1.20:6800/10244 --> > 192.168.1.20:6789/0 -- mon_subscribe({monmap=0+}) v2 -- ?+0 0x36b4380 con > 0x36ac580 > 2014-08-12 14:35:11.264744 7ff78fe4f700 1 -- 192.168.1.20:6800/10244 --> > 192.168.1.20:6789/0 -- auth(proto 2 2 bytes epoch 0) v1 -- ?+0 0x3684480 > con 0x36ac580 > 2014-08-12 14:35:11.265027 7ff78fe4f700 1 -- 192.168.1.20:6800/10244 <== > mon.0 192.168.1.20:6789/0 5 ==== mon_map v1 ==== 194+0+0 (4155369063 0 0) > 0x36d43c0 con 0x36ac580 > 2014-08-12 14:35:11.265203 7ff78fe4f700 1 -- 192.168.1.20:6800/10244 <== > mon.0 192.168.1.20:6789/0 6 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 > (2253672535 0 0) 0x36b4540 con 0x36ac580 > 2014-08-12 14:35:11.265251 7ff78fe4f700 1 -- 192.168.1.20:6800/10244 <== > mon.0 192.168.1.20:6789/0 7 ==== auth_reply(proto 2 0 (0) Success) v1 > ==== 194+0+0 (1999696020 0 0) 0x3691a00 con 0x36ac580 > 2014-08-12 14:35:11.265506 7ff794bd57c0 1 -- 192.168.1.20:6800/10244 --> > 192.168.1.20:6789/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0 > 0x36b41c0 con 0x36ac580 > 2014-08-12 14:35:11.265580 7ff794bd57c0 1 -- 192.168.1.20:6800/10244 --> > 192.168.1.20:6789/0 -- mon_subscribe({mdsmap=0+,monmap=2+,osdmap=0}) v2 > -- ?+0 0x36b4a80 con 0x36ac580 > 2014-08-12 14:35:11.266159 7ff78fe4f700 1 -- 192.168.1.20:6800/10244 <== > mon.0 192.168.1.20:6789/0 8 ==== osd_map(9687..9687 src has 9090..9687) > v3 ==== 6983+0+0 (1578463925 0 0) 0x3684b40 con 0x36ac580 > 2014-08-12 14:35:11.266453 7ff78fe4f700 1 -- 192.168.1.20:6800/10244 <== > mon.0 192.168.1.20:6789/0 9 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 > (2253672535 0 0) 0x36b41c0 con 0x36ac580 > 2014-08-12 14:35:11.266491 7ff78fe4f700 1 -- 192.168.1.20:6800/10244 <== > mon.0 192.168.1.20:6789/0 10 ==== mdsmap(e 7182) v1 ==== 653+0+0 > (374906493 0 0) 0x3691800 con 0x36ac580 > 2014-08-12 14:35:11.266518 7ff794bd57c0 10 mds.-1.0 beacon_send up:boot > seq 1 (currently up:boot) > 2014-08-12 14:35:11.266585 7ff794bd57c0 1 -- 192.168.1.20:6800/10244 --> > 192.168.1.20:6789/0 -- mdsbeacon(12799/MDS1.1 up:boot seq 1 v0) v2 -- ?+0 > 0x36bc2c0 con 0x36ac580 > 2014-08-12 14:35:11.266626 7ff794bd57c0 10 mds.-1.0 create_logger > 2014-08-12 14:35:11.266677 7ff78fe4f700 5 mds.-1.0 handle_mds_map epoch > 7182 from mon.0 > 2014-08-12 14:35:11.266779 7ff78fe4f700 10 mds.-1.0 my compat > compat={},rocompat={},incompat={1=base v0.20,2=client writeable > ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds > uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data} > 2014-08-12 14:35:11.266793 7ff78fe4f700 10 mds.-1.0 mdsmap compat > compat={},rocompat={},incompat={1=base v0.20,2=client writeable > ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds > uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table} > 2014-08-12 14:35:11.266803 7ff78fe4f700 0 mds.-1.0 handle_mds_map mdsmap > compatset compat={},rocompat={},incompat={1=base v0.20,2=client writeable > ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds > uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table} not > writeable with daemon features compat={},rocompat={},incompat={1=base > 
v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode > in separate object,5=mds uses versioned encoding,6=dirfrag is stored in > omap,7=mds uses inline data}, killing myself > 2014-08-12 14:35:11.266821 7ff78fe4f700 1 mds.-1.0 suicide. wanted > down:dne, now up:boot > 2014-08-12 14:35:11.267081 7ff78fe4f700 1 -- 192.168.1.20:6800/10244 > mark_down 0x36ac580 -- 0x36c8500 > 2014-08-12 14:35:11.267204 7ff78fe4f700 1 -- 192.168.1.20:6800/10244 > mark_down_all > 2014-08-12 14:35:11.267612 7ff794bd57c0 1 -- 192.168.1.20:6800/10244 > shutdown complete. > > Regards, > Hong > > > > On Tuesday, July 22, 2014 4:03 PM, Craig Lewis < > clewis at centraldesktop.com> wrote: > > > The osd lost is useful, but not strictly required. It accelerates the > recovery once things are stable. It tells Ceph to give up trying to > recovery data off those disks. Without it, Ceph will still check, then > give up when it can't find it. > > > I was having problems with the suicide timeout at one point. Basically, > the OSDs fail and restart so many times that they can't apply all of the > map changes before they hit the timeout. Sage gave me some suggestions. > Give this a try: > https://www.mail-archive.com/ceph-devel at vger.kernel.org/msg18862.html > > That process solved suicide timeouts, with one caveat. When I followed > it, I filled up /var/log/ceph/ and the recovery failed. I had to manually > run each OSD in debugging mode until it completed the map update. Aside > from that, I followed your procedure. > > I had to run that procedure on all OSDs. I did all OSDs on a node at the > same time. > > > > > > > On Mon, Jul 21, 2014 at 11:45 PM, hjcho616 <hjcho616 at yahoo.com> wrote: > > Craig, > > osd.2 was down and out. lost wasn't working.. so skipped it. =P > Formatted the drive XFS and got mostly working but couldn't figure out how > to get the journal to point at my SSD, and init script wasn't able to find > the osd.2 for some reason. So just used ceph-deploy. It created new osd.6 > on the disks that were used for osd.2. I removed norecover and nobackfill > and let the system rebuild. It seemed like it was doing well until it hit > that suicide timeout. What should I do in this case? > > -20> 2014-07-22 01:01:26.087707 7f3a90012700 10 monclient: > _check_auth_rotating have uptodate secrets (they expire after 2014-07-22 > 01:00:56.087703) > -19> 2014-07-22 01:01:26.087743 7f3a90012700 10 monclient: renew subs? 
> (now: 2014-07-22 01:01:26.087742; renew after: 2014-07-22 01:01:16.084357) > -- yes > -18> 2014-07-22 01:01:26.087775 7f3a90012700 10 monclient: renew_subs > -17> 2014-07-22 01:01:26.087793 7f3a90012700 10 monclient: > _send_mon_message to mon.MDS1 at 192.168.1.20:6789/0 > -16> 2014-07-22 01:01:26.087822 7f3a90012700 1 -- > 192.168.1.30:6800/6297 --> 192.168.1.20:6789/0 -- > mon_subscribe({monmap=2+,osd_pg_creates=0}) v2 -- ?+0 0x1442c000 con > 0xf73e2c0 > -15> 2014-07-22 01:01:27.916972 7f3a8c80b700 5 osd.6 3252 heartbeat: > osd_stat(66173 MB used, 1797 GB avail, 1862 GB total, peers [3,4,5]/[] op > hist []) > -14> 2014-07-22 01:01:27.917061 7f3a8c80b700 1 -- 192.168.2.30:0/6297 > --> 192.168.2.31:6803/13623 -- osd_ping(ping e3252 stamp 2014-07-22 > 01:01:27.917024) v2 -- ?+0 0x140201c0 con 0xfa68160 > -13> 2014-07-22 01:01:27.917131 7f3a8c80b700 1 -- 192.168.2.30:0/6297 > --> 192.168.1.31:6804/13623 -- osd_ping(ping e3252 stamp 2014-07-22 > 01:01:27.917024) v2 -- ?+0 0x17d0b500 con 0xfa68000 > -12> 2014-07-22 01:01:27.917180 7f3a8c80b700 1 -- 192.168.2.30:0/6297 > --> 192.168.2.31:6805/13991 -- osd_ping(ping e3252 stamp 2014-07-22 > 01:01:27.917024) v2 -- ?+0 0xfa8fdc0 con 0x19208c60 > -11> 2014-07-22 01:01:27.917229 7f3a8c80b700 1 -- 192.168.2.30:0/6297 > --> 192.168.1.31:6807/13991 -- osd_ping(ping e3252 stamp 2014-07-22 > 01:01:27.917024) v2 -- ?+0 0xffcee00 con 0x205c000 > -10> 2014-07-22 01:01:27.917276 7f3a8c80b700 1 -- 192.168.2.30:0/6297 > --> 192.168.2.31:6801/13249 -- osd_ping(ping e3252 stamp 2014-07-22 > 01:01:27.917024) v2 -- ?+0 0x224fdc0 con 0xf9f8dc0 > -9> 2014-07-22 01:01:27.917325 7f3a8c80b700 1 -- 192.168.2.30:0/6297 > --> 192.168.1.31:6801/13249 -- osd_ping(ping e3252 stamp 2014-07-22 > 01:01:27.917024) v2 -- ?+0 0xf8ce000 con 0x19208840 > -8> 2014-07-22 01:01:27.918723 7f3a9581d700 1 -- 192.168.2.30:0/6297 > <== osd.3 192.168.1.31:6804/13623 28 ==== osd_ping(ping_reply e3252 stamp > 2014-07-22 01:01:27.917024) v2 ==== 47+0+0 (2181285829 0 0) 0xffcf500 con > 0xfa68000 > -7> 2014-07-22 01:01:27.918830 7f3a9581d700 1 -- 192.168.2.30:0/6297 > <== osd.5 192.168.1.31:6801/13249 28 ==== osd_ping(ping_reply e3252 stamp > 2014-07-22 01:01:27.917024) v2 ==== 47+0+0 (2181285829 0 0) 0x11a5e700 con > 0x19208840 > -6> 2014-07-22 01:01:27.919218 7f3a9581d700 1 -- 192.168.2.30:0/6297 > <== osd.5 192.168.2.31:6801/13249 28 ==== osd_ping(ping_reply e3252 stamp > 2014-07-22 01:01:27.917024) v2 ==== 47+0+0 (2181285829 0 0) 0xfa8fa40 con > 0xf9f8dc0 > -5> 2014-07-22 01:01:27.919396 7f3a9581d700 1 -- 192.168.2.30:0/6297 > <== osd.3 192.168.2.31:6803/13623 28 ==== osd_ping(ping_reply e3252 stamp > 2014-07-22 01:01:27.917024) v2 ==== 47+0+0 (2181285829 0 0) 0x1dd8bc00 con > 0xfa68160 > -4> 2014-07-22 01:01:27.919521 7f3a9581d700 1 -- 192.168.2.30:0/6297 > <== osd.4 192.168.2.31:6805/13991 28 ==== osd_ping(ping_reply e3252 stamp > 2014-07-22 01:01:27.917024) v2 ==== 47+0+0 (2181285829 0 0) 0x14021a40 con > 0x19208c60 > -3> 2014-07-22 01:01:27.919606 7f3a9581d700 1 -- 192.168.2.30:0/6297 > <== osd.4 192.168.1.31:6807/13991 28 ==== osd_ping(ping_reply e3252 stamp > 2014-07-22 01:01:27.917024) v2 ==== 47+0+0 (2181285829 0 0) 0x10fdd6c0 con > 0x205c000 > -2> 2014-07-22 01:01:29.976382 7f3aa5c22700 1 heartbeat_map > is_healthy 'FileStore::op_tp thread 0x7f3a9d0b0700' had timed out after 60 > -1> 2014-07-22 01:01:29.976416 7f3aa5c22700 1 heartbeat_map > is_healthy 'FileStore::op_tp thread 0x7f3a9d0b0700' had suicide timed out > after 180 > 0> 2014-07-22 01:01:29.985984 7f3aa5c22700 -1 
common/HeartbeatMap.cc: > In function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, > const char*, time_t)' thread 7f3aa5c22700 time 2014-07-22 01:01:29.976450 > common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout") > > ceph version 0.82 (14085f42ddd0fef4e7e1dc99402d07a8df82c04e) > 1: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, > long)+0x2eb) [0xad2cbb] > 2: (ceph::HeartbeatMap::is_healthy()+0xb6) [0xad34c6] > 3: (ceph::HeartbeatMap::check_touch_file()+0x28) [0xad3aa8] > 4: (CephContextServiceThread::entry()+0x13f) [0xb9911f] > 5: (()+0x8062) [0x7f3aa8f89062] > 6: (clone()+0x6d) [0x7f3aa78c9a3d] > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed > to interpret this. > > --- logging levels --- > 0/ 5 none > 0/ 1 lockdep > 0/ 1 context > 1/ 1 crush > 1/ 5 mds > 1/ 5 mds_balancer > 1/ 5 mds_locker > 1/ 5 mds_log > 1/ 5 mds_log_expire > 1/ 5 mds_migrator > 0/ 1 buffer > 0/ 1 timer > 0/ 1 filer > 0/ 1 striper > 0/ 1 objecter > 0/ 5 rados > 0/ 5 rbd > 0/ 5 journaler > 0/ 5 objectcacher > 0/ 5 client > 0/ 5 osd > 0/ 5 optracker > 0/ 5 objclass > 1/ 3 filestore > 1/ 3 keyvaluestore > 1/ 3 journal > 0/ 5 ms > 1/ 5 mon > 0/10 monc > 1/ 5 paxos > 0/ 5 tp > 1/ 5 auth > 1/ 5 crypto > 1/ 1 finisher > 1/ 5 heartbeatmap > 1/ 5 perfcounter > 1/ 5 rgw > 1/ 5 javaclient > 1/ 5 asok > 1/ 1 throttle > -2/-2 (syslog threshold) > -1/-1 (stderr threshold) > max_recent 10000 > max_new 1000 > log_file /var/log/ceph/ceph-osd.6.log > --- end dump of recent events --- > 2014-07-22 01:01:30.352843 7f3aa5c22700 -1 *** Caught signal (Aborted) ** > in thread 7f3aa5c22700 > > ceph version 0.82 (14085f42ddd0fef4e7e1dc99402d07a8df82c04e) > 1: /usr/bin/ceph-osd() [0xaac562] > 2: (()+0xf880) [0x7f3aa8f90880] > 3: (gsignal()+0x39) [0x7f3aa78193a9] > 4: (abort()+0x148) [0x7f3aa781c4c8] > 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f3aa81065e5] > 6: (()+0x5e746) [0x7f3aa8104746] > 7: (()+0x5e773) [0x7f3aa8104773] > 8: (()+0x5e9b2) [0x7f3aa81049b2] > 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char > const*)+0x40a) [0xb85b6a] > 10: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, > long)+0x2eb) [0xad2cbb] > 11: (ceph::HeartbeatMap::is_healthy()+0xb6) [0xad34c6] > 12: (ceph::HeartbeatMap::check_touch_file()+0x28) [0xad3aa8] > 13: (CephContextServiceThread::entry()+0x13f) [0xb9911f] > 14: (()+0x8062) [0x7f3aa8f89062] > 15: (clone()+0x6d) [0x7f3aa78c9a3d] > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed > to interpret this. 
> > --- begin dump of recent events --- > 0> 2014-07-22 01:01:30.352843 7f3aa5c22700 -1 *** Caught signal > (Aborted) ** > in thread 7f3aa5c22700 > > ceph version 0.82 (14085f42ddd0fef4e7e1dc99402d07a8df82c04e) > 1: /usr/bin/ceph-osd() [0xaac562] > 2: (()+0xf880) [0x7f3aa8f90880] > 3: (gsignal()+0x39) [0x7f3aa78193a9] > 4: (abort()+0x148) [0x7f3aa781c4c8] > 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f3aa81065e5] > 6: (()+0x5e746) [0x7f3aa8104746] > 7: (()+0x5e773) [0x7f3aa8104773] > 8: (()+0x5e9b2) [0x7f3aa81049b2] > 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char > const*)+0x40a) [0xb85b6a] > 10: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, > long)+0x2eb) [0xad2cbb] > 11: (ceph::HeartbeatMap::is_healthy()+0xb6) [0xad34c6] > 12: (ceph::HeartbeatMap::check_touch_file()+0x28) [0xad3aa8] > 13: (CephContextServiceThread::entry()+0x13f) [0xb9911f] > 14: (()+0x8062) [0x7f3aa8f89062] > 15: (clone()+0x6d) [0x7f3aa78c9a3d] > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed > to interpret this. > > --- logging levels --- > 0/ 5 none > 0/ 1 lockdep > 0/ 1 context > 1/ 1 crush > 1/ 5 mds > 1/ 5 mds_balancer > 1/ 5 mds_locker > 1/ 5 mds_log > 1/ 5 mds_log_expire > 1/ 5 mds_migrator > 0/ 1 buffer > 0/ 1 timer > 0/ 1 filer > 0/ 1 striper > 0/ 1 objecter > 0/ 5 rados > 0/ 5 rbd > 0/ 5 journaler > 0/ 5 objectcacher > 0/ 5 client > 0/ 5 osd > 0/ 5 optracker > 0/ 5 objclass > 1/ 3 filestore > 1/ 3 keyvaluestore > 1/ 3 journal > 0/ 5 ms > 1/ 5 mon > 0/10 monc > 1/ 5 paxos > 0/ 5 tp > 1/ 5 auth > 1/ 5 crypto > 1/ 1 finisher > 1/ 5 heartbeatmap > 1/ 5 perfcounter > 1/ 5 rgw > 1/ 5 javaclient > 1/ 5 asok > 1/ 1 throttle > -2/-2 (syslog threshold) > -1/-1 (stderr threshold) > max_recent 10000 > max_new 1000 > log_file /var/log/ceph/ceph-osd.6.log > --- end dump of recent events --- > > Regards, > Hong > > > > On Monday, July 21, 2014 9:35 PM, Craig Lewis <clewis at centraldesktop.com> > wrote: > > > I'd like to get rid of those inconsistent PGs. I think fixing those > will get your MDS working again, but I don't actually know anything about > MDS. Still, it's best to work your way up from the bottom. If the OSDs > aren't stable, there's no use building services on top of them. > > > It's strange that osd.0 was up, but crashed during deep-scrubbing. You > might try disabling deep-scrubs (ceph osd set nodeep-scrub), and see if > osd.0 will stay up. If running without deep-scrubbing will get your > cluster consistent, you can reformat the disk later. > > You said osd.2 fails to start, with a corrupt journal error. There's not > much you can do there. You should remove it again, mark it lost, reformat > the disk, and re-add it to the cluster. > > > I'd rebuild osd.2 first, while leaving osd.0 and osd.1 down. > > Do you have enough disk space that osd.2 can take all of the data from > osd.0 and osd.1? If so, you can mark osd.0 and osd.1 as DOWN and OUT. If > not, make sure that osd.0 and osd.1 are marked DOWN and IN. > > Once osd.2 finishes rebuilding, I'd set noin, then bring osd.0 and osd.1 > up. If they're OUT, that will allow Ceph to copy any unique data they > might have, but it won't try to write anything to them. If they're IN, > well, Ceph will try to write to them. Either way, I'm hoping that they > stay up long enough for you to get 100% consistent. > > > > > > > On Sun, Jul 20, 2014 at 7:01 PM, hjcho616 <hjcho616 at yahoo.com> wrote: > > Based on your suggestion here is what I did. 
> > # ceph osd set nobackfill > set nobackfill > # ceph osd set norecovery > Invalid command: norecovery not in > pause|noup|nodown|noout|noin|nobackfill|norecover|noscrub|nodeep-scrub|notieragent > osd set > pause|noup|nodown|noout|noin|nobackfill|norecover|noscrub|nodeep-scrub|notieragent > : set <key> > Error EINVAL: invalid command > # ceph osd set norecover > set norecover > # ceph osd set noin > set noin > # ceph create osd > no valid command found; 10 closest matches: > osd tier remove <poolname> <poolname> > osd tier cache-mode <poolname> none|writeback|forward|readonly > osd thrash <int[0-]> > osd tier add <poolname> <poolname> {--force-nonempty} > osd pool stats {<name>} > osd reweight-by-utilization {<int[100-]>} > osd pool set <poolname> > size|min_size|crash_replay_interval|pg_num|pgp_num|crush_ruleset|hashpspool|hit_set_type|hit_set_period|hit_set_count|hit_set_fpp|debug_fake_ec_pool|target_max_bytes|target_max_objects|cache_target_dirty_ratio|cache_target_full_ratio|cache_min_flush_age|cache_min_evict_age|auid > <val> {--yes-i-really-mean-it} > osd pool set-quota <poolname> max_objects|max_bytes <val> > osd pool rename <poolname> <poolname> > osd pool get <poolname> > size|min_size|crash_replay_interval|pg_num|pgp_num|crush_ruleset|hit_set_type|hit_set_period|hit_set_count|hit_set_fpp|auid > Error EINVAL: invalid command > # ceph osd create > 0 > # ceph osd create > 1 > # ceph osd create > 2 > # start ceph-osd id=0 > bash: start: command not found > # /etc/init.d/ceph start osd.0 > === osd.0 === > 2014-07-18 21:21:37.207159 7ff2c64d7700 0 librados: osd.0 authentication > error (1) Operation not permitted > Error connecting to cluster: PermissionError > failed: 'timeout 10 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.0 > --keyring=/var/lib/ceph/osd/ceph-0/keyring osd crush create-or-move -- 0 > 1.82 host=OSD1 root=default' > # ceph status > cluster 9b2c9bca-112e-48b0-86fc-587ef9a52948 > health HEALTH_ERR 164 pgs degraded; 38 pgs inconsistent; 192 pgs > stuck unclean; recovery 1484224/3513098 objects degraded (42.248%); 1374 > scrub errors; mds cluster is degraded; mds MDS1 is laggy; > noin,nobackfill,norecover flag(s) set > monmap e1: 1 mons at {MDS1=192.168.1.20:6789/0}, election epoch 1, > quorum 0 MDS1 > mdsmap e7182: 1/1/1 up {0=MDS1=up:replay(laggy or crashed)} > osdmap e3133: 6 osds: 3 up, 3 in > flags noin,nobackfill,norecover > pgmap v309437: 192 pgs, 3 pools, 1571 GB data, 1715 kobjects > 1958 GB used, 3627 GB / 5586 GB avail > 1484224/3513098 objects degraded (42.248%) > 131 active+degraded > 23 active+remapped > 33 active+degraded+inconsistent > 5 active+remapped+inconsistent > # ceph osd stat > osdmap e3133: 6 osds: 3 up, 3 in > flags noin,nobackfill,norecover > # ceph auth get-or-create osd.0 mon 'allow rwx' osd 'allow *' -o > /var/lib/ceph/osd/ceph-0/keyring > > # /etc/init.d/ceph start osd.0 > === osd.0 === > create-or-move updating item name 'osd.0' weight 1.82 at location > {host=OSD1,root=default} to crush map > Starting Ceph osd.0 on OSD1... 
> starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 > /var/lib/ceph/osd/ceph-0/journal > root at OSD1:/home/genie# ceph auth get-or-create osd.1 mon 'allow rwx' osd > 'allow *' -o /var/lib/ceph/osd/ceph-1/keyring > root at OSD1:/home/genie# ceph auth get-or-create osd.2 mon 'allow rwx' osd > 'allow *' -o /var/lib/ceph/osd/ceph-2/keyring > root at OSD1:/home/genie# /etc/init.d/ceph start osd.1 > === osd.1 === > failed: 'timeout 10 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.1 > --keyring=/var/lib/ceph/osd/ceph-1/keyring osd crush create-or-move -- 1 > 1.82 host=OSD1 root=default' > # /etc/init.d/ceph start osd.2 > === osd.2 === > create-or-move updating item name 'osd.2' weight 1.82 at location > {host=OSD1,root=default} to crush map > Starting Ceph osd.2 on OSD1... > starting osd.2 at :/0 osd_data /var/lib/ceph/osd/ceph-2 > /var/lib/ceph/osd/ceph-2/journal > # /etc/init.d/ceph start osd.1 > === osd.1 === > failed: 'timeout 10 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.1 > --keyring=/var/lib/ceph/osd/ceph-1/keyring osd crush create-or-move -- 1 > 1.82 host=OSD1 root=default' > # ceph health > Segmentation fault > # ceph health > Bus error > # ceph health > HEALTH_ERR 164 pgs degraded; 38 pgs inconsistent; 192 pgs stuck unclean; > recovery 1484224/3513098 objects degraded (42.248%); 1374 scrub errors; mds > cluster is degraded; mds MDS1 is laggy; noin,nobackfill,norecover flag(s) > set > # /etc/init.d/ceph start osd.1 > === osd.1 === > create-or-move updating item name 'osd.1' weight 1.82 at location > {host=OSD1,root=default} to crush map > Starting Ceph osd.1 on OSD1... > starting osd.1 at :/0 osd_data /var/lib/ceph/osd/ceph-1 > /var/lib/ceph/osd/ceph-1/journal > # ceph -w > cluster 9b2c9bca-112e-48b0-86fc-587ef9a52948 > health HEALTH_ERR 164 pgs degraded; 38 pgs inconsistent; 192 pgs > stuck unclean; recovery 1484224/3513098 objects degraded (42.248%); 1374 > scrub errors; mds cluster is degraded; mds MDS1 is laggy; > noin,nobackfill,norecover flag(s) set > monmap e1: 1 mons at {MDS1=192.168.1.20:6789/0}, election epoch 1, > quorum 0 MDS1 > mdsmap e7182: 1/1/1 up {0=MDS1=up:replay(laggy or crashed)} > osdmap e3137: 6 osds: 4 up, 3 in > flags noin,nobackfill,norecover > pgmap v309463: 192 pgs, 3 pools, 1571 GB data, 1715 kobjects > 1958 GB used, 3627 GB / 5586 GB avail > 1484224/3513098 objects degraded (42.248%) > 131 active+degraded > 23 active+remapped > 33 active+degraded+inconsistent > 5 active+remapped+inconsistent > > 2014-07-19 21:34:59.166709 mon.0 [INF] pgmap v309463: 192 pgs: 131 > active+degraded, 23 active+remapped, 33 active+degraded+inconsistent, 5 > active+remapped+inconsistent; 1571 GB data, 1958 GB used, 3627 GB / 5586 GB > avail; 1484224/3513098 objects degraded (42.248%) > > > osd.2 doesn't come up. osd.1 uses little memory compared to osd.0, but it > stays alive. Killed osd.1 and osd.2 for now. At this point osd.0's CPU > was on and off for a while. But it didn't kill it. So I did ceph osd > unset noin and restarted osd.0. It seemed to be doing something for a long > time. I let it run over night. Found it crashed today. Below is the log > of it. 
> > > -20> 2014-07-20 00:54:10.924602 7fb562528700 5 -- op tracker -- , seq: > 4847, time: 2014-07-20 00:54:10.924244, event: header_read, op: > osd_sub_op(unknown.0.0:0 1.29 0//0//-1 [scrub-reserve] v 0'0 > snapset=0=[]:[] snapc=0=[]) > -19> 2014-07-20 00:54:10.924652 7fb562528700 5 -- op tracker -- , seq: > 4847, time: 2014-07-20 00:54:10.924250, event: throttled, op: > osd_sub_op(unknown.0.0:0 1.29 0//0//-1 [scrub-reserve] v 0'0 > snapset=0=[]:[] snapc=0=[]) > -18> 2014-07-20 00:54:10.924698 7fb562528700 5 -- op tracker -- , seq: > 4847, time: 2014-07-20 00:54:10.924458, event: all_read, op: > osd_sub_op(unknown.0.0:0 1.29 0//0//-1 [scrub-reserve] v 0'0 > snapset=0=[]:[] snapc=0=[]) > -17> 2014-07-20 00:54:10.924743 7fb562528700 5 -- op tracker -- , seq: > 4847, time: 0.000000, event: dispatched, op: osd_sub_op(unknown.0.0:0 1.29 > 0//0//-1 [scrub-reserve] v 0'0 snapset=0=[]:[] snapc=0=[]) > -16> 2014-07-20 00:54:10.924880 7fb54d78f700 5 -- op tracker -- , seq: > 4847, time: 2014-07-20 00:54:10.924861, event: reached_pg, op: > osd_sub_op(unknown.0.0:0 1.29 0//0//-1 [scrub-reserve] v 0'0 > snapset=0=[]:[] snapc=0=[]) > -15> 2014-07-20 00:54:10.924936 7fb54d78f700 5 -- op tracker -- , seq: > 4847, time: 2014-07-20 00:54:10.924915, event: started, op: > osd_sub_op(unknown.0.0:0 1.29 0//0//-1 [scrub-reserve] v 0'0 > snapset=0=[]:[] snapc=0=[]) > -14> 2014-07-20 00:54:10.924974 7fb54d78f700 1 -- > 192.168.2.30:6800/18511 --> 192.168.2.31:6804/13991 -- > osd_sub_op_reply(unknown.0.0:0 1.29 0//0//-1 [scrub-reserve] ack, result = > 0) v2 -- ?+1 0x10503680 con 0xfdca000 > -13> 2014-07-20 00:54:10.925053 7fb54d78f700 5 -- op tracker -- , seq: > 4847, time: 2014-07-20 00:54:10.925034, event: done, op: > osd_sub_op(unknown.0.0:0 1.29 0//0//-1 [scrub-reserve] v 0'0 > snapset=0=[]:[] snapc=0=[]) > -12> 2014-07-20 00:54:10.926801 7fb562528700 1 -- > 192.168.2.30:6800/18511 <== osd.4 192.168.2.31:6804/13991 1742 ==== > osd_sub_op(unknown.0.0:0 1.29 0//0//-1 [scrub-unreserve] v 0'0 > snapset=0=[]:[] snapc=0=[]) v10 ==== 1145+0+0 (2357365982 0 0) 0x1045a100 > con 0xfdca000 > -11> 2014-07-20 00:54:10.926912 7fb562528700 5 -- op tracker -- , seq: > 4848, time: 2014-07-20 00:54:10.926624, event: header_read, op: > osd_sub_op(unknown.0.0:0 1.29 0//0//-1 [scrub-unreserve] v 0'0 > snapset=0=[]:[] snapc=0=[]) > -10> 2014-07-20 00:54:10.926961 7fb562528700 5 -- op tracker -- , seq: > 4848, time: 2014-07-20 00:54:10.926628, event: throttled, op: > osd_sub_op(unknown.0.0:0 1.29 0//0//-1 [scrub-unreserve] v 0'0 > snapset=0=[]:[] snapc=0=[]) > -9> 2014-07-20 00:54:10.927004 7fb562528700 5 -- op tracker -- , seq: > 4848, time: 2014-07-20 00:54:10.926786, event: all_read, op: > osd_sub_op(unknown.0.0:0 1.29 0//0//-1 [scrub-unreserve] v 0'0 > snapset=0=[]:[] snapc=0=[]) > -8> 2014-07-20 00:54:10.927046 7fb562528700 5 -- op tracker -- , seq: > 4848, time: 0.000000, event: dispatched, op: osd_sub_op(unknown.0.0:0 1.29 > 0//0//-1 [scrub-unreserve] v 0'0 snapset=0=[]:[] snapc=0=[]) > -7> 2014-07-20 00:54:10.927179 7fb54df90700 5 -- op tracker -- , seq: > 4848, time: 2014-07-20 00:54:10.927160, event: reached_pg, op: > osd_sub_op(unknown.0.0:0 1.29 0//0//-1 [scrub-unreserve] v 0'0 > snapset=0=[]:[] snapc=0=[]) > -6> 2014-07-20 00:54:10.927237 7fb54df90700 5 -- op tracker -- , seq: > 4848, time: 2014-07-20 00:54:10.927216, event: started, op: > osd_sub_op(unknown.0.0:0 1.29 0//0//-1 [scrub-unreserve] v 0'0 > snapset=0=[]:[] snapc=0=[]) > -5> 2014-07-20 00:54:10.927289 7fb54df90700 5 -- op tracker -- , seq: > 4848, time: 
2014-07-20 00:54:10.927269, event: done, op: > osd_sub_op(unknown.0.0:0 1.29 0//0//-1 [scrub-unreserve] v 0'0 > snapset=0=[]:[] snapc=0=[]) > -4> 2014-07-20 00:54:10.941372 7fb551f98700 1 -- > 192.168.2.30:6801/18511 <== osd.3 192.168.1.31:0/13623 776 ==== > osd_ping(ping e3144 stamp 2014-07-20 00:54:10.942416) v2 ==== 47+0+0 > (216963345 0 0) 0x103c0e00 con 0x1001bce0 > -3> 2014-07-20 00:54:10.941451 7fb551f98700 1 -- > 192.168.2.30:6801/18511 --> 192.168.1.31:0/13623 -- osd_ping(ping_reply > e3144 stamp 2014-07-20 00:54:10.942416) v2 -- ?+0 0x100e2540 con 0x1001bce0 > -2> 2014-07-20 00:54:10.941742 7fb55379b700 1 -- > 192.168.1.30:6801/18511 <== osd.3 192.168.1.31:0/13623 776 ==== > osd_ping(ping e3144 stamp 2014-07-20 00:54:10.942416) v2 ==== 47+0+0 > (216963345 0 0) 0x10547880 con 0xff8db80 > -1> 2014-07-20 00:54:10.941842 7fb55379b700 1 -- > 192.168.1.30:6801/18511 --> 192.168.1.31:0/13623 -- osd_ping(ping_reply > e3144 stamp 2014-07-20 00:54:10.942416) v2 -- ?+0 0x10254a80 con 0xff8db80 > 0> 2014-07-20 00:54:11.646226 7fb54c78d700 -1 os/DBObjectMap.cc: In > function 'virtual bool DBObjectMap::DBObjectMapIteratorImpl::valid()' > thread 7fb54c78d700 time 2014-07-20 00:54:11.640719 > os/DBObjectMap.cc: 399: FAILED assert(!valid || cur_iter->valid()) > > ceph version 0.82 (14085f42ddd0fef4e7e1dc99402d07a8df82c04e) > 1: /usr/bin/ceph-osd() [0xa72172] > 2: (ReplicatedBackend::be_deep_scrub(hobject_t const&, ScrubMap::object&, > ThreadPool::TPHandle&)+0x6c3) [0xa2df03] > 3: (PGBackend::be_scan_list(ScrubMap&, std::vector<hobject_t, > std::allocator<hobject_t> > const&, bool, ThreadPool::TPHandle&)+0x503) > [0x98c523] > 4: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, > ThreadPool::TPHandle&)+0x10b) [0x891d4b] > 5: (PG::replica_scrub(MOSDRepScrub*, ThreadPool::TPHandle&)+0x456) > [0x8925d6] > 6: (OSD::RepScrubWQ::_process(MOSDRepScrub*, > ThreadPool::TPHandle&)+0x10a) [0x7b00fa] > 7: (ThreadPool::worker(ThreadPool::WorkThread*)+0x68a) [0xb7792a] > 8: (ThreadPool::WorkThread::entry()+0x10) [0xb78b80] > 9: (()+0x8062) [0x7fb56b184062] > 10: (clone()+0x6d) [0x7fb569ac4a3d] > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed > to interpret this. 
> > --- logging levels --- > 0/ 5 none > 0/ 1 lockdep > 0/ 1 context > 1/ 1 crush > 1/ 5 mds > 1/ 5 mds_balancer > 1/ 5 mds_locker > 1/ 5 mds_log > 1/ 5 mds_log_expire > 1/ 5 mds_migrator > 0/ 1 buffer > 0/ 1 timer > 0/ 1 filer > 0/ 1 striper > 0/ 1 objecter > 0/ 5 rados > 0/ 5 rbd > 0/ 5 journaler > 0/ 5 objectcacher > 0/ 5 client > 0/ 5 osd > 0/ 5 optracker > 0/ 5 objclass > 1/ 3 filestore > 1/ 3 keyvaluestore > 1/ 3 journal > 0/ 5 ms > 1/ 5 mon > 0/10 monc > 1/ 5 paxos > 0/ 5 tp > 1/ 5 auth > 1/ 5 crypto > 1/ 1 finisher > 1/ 5 heartbeatmap > 1/ 5 perfcounter > 1/ 5 rgw > 1/ 5 javaclient > 1/ 5 asok > 1/ 1 throttle > -2/-2 (syslog threshold) > -1/-1 (stderr threshold) > max_recent 10000 > max_new 1000 > log_file /var/log/ceph/ceph-osd.0.log > --- end dump of recent events --- > 2014-07-20 00:54:11.998700 7fb54c78d700 -1 *** Caught signal (Aborted) ** > in thread 7fb54c78d700 > > ceph version 0.82 (14085f42ddd0fef4e7e1dc99402d07a8df82c04e) > 1: /usr/bin/ceph-osd() [0xaac562] > 2: (()+0xf880) [0x7fb56b18b880] > 3: (gsignal()+0x39) [0x7fb569a143a9] > 4: (abort()+0x148) [0x7fb569a174c8] > 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fb56a3015e5] > 6: (()+0x5e746) [0x7fb56a2ff746] > 7: (()+0x5e773) [0x7fb56a2ff773] > 8: (()+0x5e9b2) [0x7fb56a2ff9b2] > 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char > const*)+0x40a) [0xb85b6a] > 10: /usr/bin/ceph-osd() [0xa72172] > > 11: (ReplicatedBackend::be_deep_scrub(hobject_t const&, > ScrubMap::object&, ThreadPool::TPHandle&)+0x6c3) [0xa2df03] > 12: (PGBackend::be_scan_list(ScrubMap&, std::vector<hobject_t, > std::allocator<hobject_t> > const&, bool, ThreadPool::TPHandle&)+0x503) > [0x98c523] > 13: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, > ThreadPool::TPHandle&)+0x10b) [0x891d4b] > 14: (PG::replica_scrub(MOSDRepScrub*, ThreadPool::TPHandle&)+0x456) > [0x8925d6] > 15: (OSD::RepScrubWQ::_process(MOSDRepScrub*, > ThreadPool::TPHandle&)+0x10a) [0x7b00fa] > 16: (ThreadPool::worker(ThreadPool::WorkThread*)+0x68a) [0xb7792a] > 17: (ThreadPool::WorkThread::entry()+0x10) [0xb78b80] > 18: (()+0x8062) [0x7fb56b184062] > 19: (clone()+0x6d) [0x7fb569ac4a3d] > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed > to interpret this. 
> > --- begin dump of recent events --- > -1> 2014-07-20 00:54:11.755763 7fb565618700 5 osd.0 3144 tick > 0> 2014-07-20 00:54:11.998700 7fb54c78d700 -1 *** Caught signal > (Aborted) ** > in thread 7fb54c78d700 > > ceph version 0.82 (14085f42ddd0fef4e7e1dc99402d07a8df82c04e) > 1: /usr/bin/ceph-osd() [0xaac562] > 2: (()+0xf880) [0x7fb56b18b880] > 3: (gsignal()+0x39) [0x7fb569a143a9] > 4: (abort()+0x148) [0x7fb569a174c8] > 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fb56a3015e5] > 6: (()+0x5e746) [0x7fb56a2ff746] > 7: (()+0x5e773) [0x7fb56a2ff773] > 8: (()+0x5e9b2) [0x7fb56a2ff9b2] > 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char > const*)+0x40a) [0xb85b6a] > 10: /usr/bin/ceph-osd() [0xa72172] > 11: (ReplicatedBackend::be_deep_scrub(hobject_t const&, > ScrubMap::object&, ThreadPool::TPHandle&)+0x6c3) [0xa2df03] > 12: (PGBackend::be_scan_list(ScrubMap&, std::vector<hobject_t, > std::allocator<hobject_t> > const&, bool, ThreadPool::TPHandle&)+0x503) > [0x98c523] > 13: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, > ThreadPool::TPHandle&)+0x10b) [0x891d4b] > 14: (PG::replica_scrub(MOSDRepScrub*, ThreadPool::TPHandle&)+0x456) > [0x8925d6] > 15: (OSD::RepScrubWQ::_process(MOSDRepScrub*, > ThreadPool::TPHandle&)+0x10a) [0x7b00fa] > 16: (ThreadPool::worker(ThreadPool::WorkThread*)+0x68a) [0xb7792a] > 17: (ThreadPool::WorkThread::entry()+0x10) [0xb78b80] > 18: (()+0x8062) [0x7fb56b184062] > 19: (clone()+0x6d) [0x7fb569ac4a3d] > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed > to interpret this. > > --- logging levels --- > 0/ 5 none > 0/ 1 lockdep > 0/ 1 context > 1/ 1 crush > 1/ 5 mds > 1/ 5 mds_balancer > 1/ 5 mds_locker > 1/ 5 mds_log > 1/ 5 mds_log_expire > 1/ 5 mds_migrator > 0/ 1 buffer > 0/ 1 timer > 0/ 1 filer > 0/ 1 striper > 0/ 1 objecter > 0/ 5 rados > 0/ 5 rbd > 0/ 5 journaler > 0/ 5 objectcacher > 0/ 5 client > 0/ 5 osd > 0/ 5 optracker > 0/ 5 objclass > 1/ 3 filestore > 1/ 3 keyvaluestore > 1/ 3 journal > 0/ 5 ms > 1/ 5 mon > 0/10 monc > 1/ 5 paxos > 0/ 5 tp > 1/ 5 auth > 1/ 5 crypto > 1/ 1 finisher > 1/ 5 heartbeatmap > 1/ 5 perfcounter > 1/ 5 rgw > 1/ 5 javaclient > 1/ 5 asok > 1/ 1 throttle > -2/-2 (syslog threshold) > -1/-1 (stderr threshold) > max_recent 10000 > max_new 1000 > log_file /var/log/ceph/ceph-osd.0.log > --- end dump of recent events --- > > > What can I do about this one? > > Regards, > Hong > > > > > On Friday, July 18, 2014 5:16 PM, Craig Lewis < > clewis at centraldesktop.com> wrote: > > > That I can't help you with. I'm a pure RadosGW user. But OSD stability > affects everybody. :-P > > > On Fri, Jul 18, 2014 at 2:34 PM, hjcho616 <hjcho616 at yahoo.com> wrote: > > Thanks Craig. I will try this soon. BTW should I upgrade to 0.80.4 > first? The MDS journal issue seems to be one of the issue I am running > into. > > Regards, > Hong > > > On Friday, July 18, 2014 4:14 PM, Craig Lewis <clewis at centraldesktop.com> > wrote: > > > If osd.3, osd.4, and osd.5 are stable, your cluster should be working > again. What does ceph status say? > > > I was able to re-add removed osd. > Here's what I did on my dev cluster: > stop ceph-osd id=0 > ceph osd down 0 > ceph osd out 0 > ceph osd rm 0 > ceph osd crush rm osd.0 > > Now my osd tree and osd dump do not show osd.0. The cluster was degraded, > but did not do any backfilling because I require 3x replication on 3 > different hosts, and Ceph can't satisfy that with 2 osds. 
> > On the same host, I ran: > ceph osd create # Returned ID 0 > start ceph-osd id=0 > > > osd.0 started up and joined the cluster. Once peering completed, all of > the PGs recovered quickly. I didn't have any writes on the cluster while I > was doing this. > > So it looks like you can just re-create and start those deleted osds. > > > > In your situation, I would do the following. Before you start, go through > this, and make sure you understand all the steps. Worst case, you can > always undo this by removing the osds again, and you'll be back to where > you are now. > > ceph osd set nobackfill > ceph osd set norecovery > ceph osd set noin > ceph create osd # Should return 0. Abort if it doesn't. > ceph create osd # Should return 1. Abort if it doesn't. > ceph create osd # Should return 2. Abort if it doesn't. > start ceph-osd id=0 > > Watch ceph -w and top. Hopefully ceph-osd id=0 will use some CPU, then > go UP, and drop to 0% cpu. If so, > ceph osd unset noin > restart ceph-osd id=0 > > Now osd.0 should go UP and IN, use some CPU for a while, then drop to 0% > cpu. If osd.0 drops out now, set noout, and shut it down. > > set noin again, and start osd.1. When it's stable, do it again for osd.2. > > Once as many as possible are up and stable: > ceph osd unset nobackfill > ceph osd unset norecovery > > Now it should start recovering. If your osds start dropping out now, set > noout, and shut down the ones that are having problems. > > > The goal is to get all the stable osds up, in, and recovered. Once that's > done, we can figure out what to do with the unstable osds. > > > > > > > > > > On Thu, Jul 17, 2014 at 9:29 PM, hjcho616 <hjcho616 at yahoo.com> wrote: > > Sorry Craig. I thought I sent both but second part didn't copy right. > For some reason over night MDS and MON decided to stop so I started it > when I was running those commands. Interestingly MDS didn't fail at the > time like it used to. So I thought something was being fixed? Then I now > realize MDS probably couldn't get to the data because OSD were down. Now > that I brought up the OSDs MDS crashed again. 
=P > > $ ceph osd tree > # id weight type name up/down reweight > -1 5.46 root default > -2 0 host OSD1 > -3 5.46 host OSD2 > 3 1.82 osd.3 up 1 > 4 1.82 osd.4 up 1 > 5 1.82 osd.5 up 1 > > $ ceph osd dump > epoch 3125 > fsid 9b2c9bca-112e-48b0-86fc-587ef9a52948 > created 2014-02-08 01:57:34.086532 > modified 2014-07-17 23:24:10.823596 > flags > pool 0 'data' replicated size 2 min_size 1 crush_ruleset 0 object_hash > rjenkins pg_num 64 pgp_num 64 last_change 1 crash_replay_interval 45 > stripe_width 0 > pool 1 'metadata' replicated size 2 min_size 1 crush_ruleset 1 object_hash > rjenkins pg_num 64 pgp_num 64 last_change 1 stripe_width 0 > pool 2 'rbd' replicated size 2 min_size 1 crush_ruleset 2 object_hash > rjenkins pg_num 64 pgp_num 64 last_change 1 stripe_width 0 > max_osd 6 > osd.3 up in weight 1 up_from 3120 up_thru 3122 down_at 3116 > last_clean_interval [2858,3113) 192.168.1.31:6803/13623 > 192.168.2.31:6802/13623 192.168.2.31:6803/13623 192.168.1.31:6804/13623 > exists,up 4f86a418-6c67-4cb4-83a1-6c123c890036 > osd.4 up in weight 1 up_from 3121 up_thru 3122 down_at 3116 > last_clean_interval [2859,3113) 192.168.1.31:6806/13991 > 192.168.2.31:6804/13991 192.168.2.31:6805/13991 192.168.1.31:6807/13991 > exists,up 3d5e3843-7a47-44b0-b276-61c4b1d62900 > osd.5 up in weight 1 up_from 3118 up_thru 3118 down_at 3116 > last_clean_interval [2856,3113) 192.168.1.31:6800/13249 > 192.168.2.31:6800/13249 192.168.2.31:6801/13249 192.168.1.31:6801/13249 > exists,up eec86483-2f35-48a4-a154-2eaf26be06b9 > pg_temp 0.2 [4,3] > pg_temp 0.a [4,5] > pg_temp 0.c [3,4] > pg_temp 0.10 [3,4] > pg_temp 0.15 [3,5] > pg_temp 0.17 [3,5] > pg_temp 0.2f [4,5] > pg_temp 0.3b [4,3] > pg_temp 0.3c [3,5] > pg_temp 0.3d [4,5] > pg_temp 1.1 [4,3] > pg_temp 1.9 [4,5] > pg_temp 1.b [3,4] > pg_temp 1.14 [3,5] > pg_temp 1.16 [3,5] > pg_temp 1.2e [4,5] > pg_temp 1.3a [4,3] > pg_temp 1.3b [3,5] > pg_temp 1.3c [4,5] > pg_temp 2.0 [4,3] > pg_temp 2.8 [4,5] > pg_temp 2.a [3,4] > pg_temp 2.13 [3,5] > pg_temp 2.15 [3,5] > pg_temp 2.2d [4,5] > pg_temp 2.39 [4,3] > pg_temp 2.3a [3,5] > pg_temp 2.3b [4,5] > blacklist 192.168.1.20:6802/30894 expires 2014-07-17 23:48:10.823576 > blacklist 192.168.1.20:6801/30651 expires 2014-07-17 23:47:55.562984 > > Regards, > Hong > > > On Thursday, July 17, 2014 3:30 PM, Craig Lewis < > clewis at centraldesktop.com> wrote: > > > You gave me 'ceph osd dump' twice.... can I see 'ceph osd tree' too? > > Why are osd.3, osd.4, and osd.5 down? > > > On Thu, Jul 17, 2014 at 11:45 AM, hjcho616 <hjcho616 at yahoo.com> wrote: > > Thank you for looking at this. Below are the outputs you requested. 
> > # ceph osd dump > epoch 3117 > fsid 9b2c9bca-112e-48b0-86fc-587ef9a52948 > created 2014-02-08 01:57:34.086532 > modified 2014-07-16 22:13:04.385914 > flags > pool 0 'data' replicated size 2 min_size 1 crush_ruleset 0 object_hash > rjenkins pg_num 64 pgp_num 64 last_change 1 crash_replay_interval 45 > stripe_width 0 > pool 1 'metadata' replicated size 2 min_size 1 crush_ruleset 1 object_hash > rjenkins pg_num 64 pgp_num 64 last_change 1 stripe_width 0 > pool 2 'rbd' replicated size 2 min_size 1 crush_ruleset 2 object_hash > rjenkins pg_num 64 pgp_num 64 last_change 1 stripe_width 0 > max_osd 6 > osd.3 down in weight 1 up_from 2858 up_thru 3040 down_at 3116 > last_clean_interval [2830,2851) 192.168.1.31:6803/5127 > 192.168.2.31:6805/5127 192.168.2.31:6806/5127 192.168.1.31:6805/5127 > exists 4f86a418-6c67-4cb4-83a1-6c123c890036 > osd.4 down in weight 1 up_from 2859 up_thru 3043 down_at 3116 > last_clean_interval [2835,2849) 192.168.1.31:6807/5310 > 192.168.2.31:6807/5310 192.168.2.31:6808/5310 192.168.1.31:6808/5310 > exists 3d5e3843-7a47-44b0-b276-61c4b1d62900 > osd.5 down in weight 1 up_from 2856 up_thru 3042 down_at 3116 > last_clean_interval [2837,2853) 192.168.1.31:6800/4969 > 192.168.2.31:6801/4969 192.168.2.31:6804/4969 192.168.1.31:6801/4969 > exists eec86483-2f35-48a4-a154-2eaf26be06b9 > > # ceph osd dump > epoch 3117 > fsid 9b2c9bca-112e-48b0-86fc-587ef9a52948 > created 2014-02-08 01:57:34.086532 > modified 2014-07-16 22:13:04.385914 > flags > pool 0 'data' replicated size 2 min_size 1 crush_ruleset 0 object_hash > rjenkins pg_num 64 pgp_num 64 last_change 1 crash_replay_interval 45 > stripe_width 0 > pool 1 'metadata' replicated size 2 min_size 1 crush_ruleset 1 object_hash > rjenkins pg_num 64 pgp_num 64 last_change 1 stripe_width 0 > pool 2 'rbd' replicated size 2 min_size 1 crush_ruleset 2 object_hash > rjenkins pg_num 64 pgp_num 64 last_change 1 stripe_width 0 > max_osd 6 > osd.3 down in weight 1 up_from 2858 up_thru 3040 down_at 3116 > last_clean_interval [2830,2851) 192.168.1.31:6803/5127 > 192.168.2.31:6805/5127 192.168.2.31:6806/5127 192.168.1.31:6805/5127 > exists 4f86a418-6c67-4cb4-83a1-6c123c890036 > osd.4 down in weight 1 up_from 2859 up_thru 3043 down_at 3116 > last_clean_interval [2835,2849) 192.168.1.31:6807/5310 > 192.168.2.31:6807/5310 192.168.2.31:6808/5310 192.168.1.31:6808/5310 > exists 3d5e3843-7a47-44b0-b276-61c4b1d62900 > osd.5 down in weight 1 up_from 2856 up_thru 3042 down_at 3116 > last_clean_interval [2837,2853) 192.168.1.31:6800/4969 > 192.168.2.31:6801/4969 192.168.2.31:6804/4969 192.168.1.31:6801/4969 > exists eec86483-2f35-48a4-a154-2eaf26be06b9 > > Regards, > Hong > > > > On Thursday, July 17, 2014 12:02 PM, Craig Lewis < > clewis at centraldesktop.com> wrote: > > > I don't believe you can re-add an OSD after `ceph osd rm`, but it's worth > a shot. Let me see what I can do on my dev cluster. > > What does `ceph osd dump` and `ceph osd tree` say? I want to make sure > I'm starting from the same point you are. > > > > On Wed, Jul 16, 2014 at 7:39 PM, hjcho616 <hjcho616 at yahoo.com> wrote: > > I did a "ceph osd rm" for all three but I didn't do anything else to it > afterwards. Can this be added back? > > Regards, > Hong > > > On Wednesday, July 16, 2014 6:54 PM, Craig Lewis < > clewis at centraldesktop.com> wrote: > > > For some reason you ended up in my spam folder. That might be why you > didn't get any responses. > > > Have you destroyed osd.0, osd.1, and osd.2? If not, try bringing them up > one a time. 
You might have just one bad disk, which is much better than > 50% of your disks. > > How is the ceph-osd process behaving when it hits the suicide timeout? I > had some problems a while back where the ceph-osd process would startup, > start consuming ~200% CPU for a while, then get stuck using almost exactly > 100% CPU. It would get kicked out of the cluster for being unresponsive, > then suicide. Repeat. If that's happening here, I can suggest some things > to try. > > > > > > On Fri, Jul 11, 2014 at 9:12 PM, hjcho616 <hjcho616 at yahoo.com> wrote: > > I have 2 OSD machines with 3 OSD running on each. One MDS server with 3 > daemons running. Ran cephfs mostly on 0.78. One night we lost power for > split second. MDS1 and OSD2 went down, OSD1 seemed OK, well turns out OSD1 > suffered most. Those two machines rebooted and seemed ok except it had > some inconsistencies. I waited for a while, didn't fix itself. So I > issued 'ceph pg repair pgnum'. It would try some and some OSD would crash. > Tried this for multiple days. Got some PGs fixed... but mostly it would > crash an OSD and stop recovering. dmesg shows something like below. > > > > [ 740.059498] traps: ceph-osd[5279] general protection ip:7f84e75ec75e > sp:7fff00045bc0 error:0 in libtcmalloc.so.4.1.0[7f84e75b3000+4a000] > > and ceph osd log shows something like this. > > -2> 2014-07-09 20:51:01.163571 7fe0f4617700 1 heartbeat_map > is_healthy 'FileStore::op_tp thread 0x7fe0e8e91700' had timed out after 60 > -1> 2014-07-09 20:51:01.163609 7fe0f4617700 1 heartbeat_map > is_healthy 'FileStore::op_tp thread 0x7fe0e8e91700' had suicide timed out > after 180 > 0> 2014-07-09 20:51:01.169542 7fe0f4617700 -1 common/HeartbeatMap.cc: > In function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, > const char*, time_t)' thread 7fe0f4617700 time 2014-07-09 20:51:01.163642 > common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout") > > ceph version 0.82 (14085f42ddd0fef4e7e1dc99402d07a8df82c04e) > 1: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, > long)+0x2eb) [0xad2cbb] > 2: (ceph::HeartbeatMap::is_healthy()+0xb6) [0xad34c6] > 3: (ceph::HeartbeatMap::check_touch_file()+0x28) [0xad3aa8] > 4: (CephContextServiceThread::entry()+0x13f) [0xb9911f] > 5: (()+0x8062) [0x7fe0f797e062] > 6: (clone()+0x6d) [0x7fe0f62bea3d] > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed > to interpret this. 
> > --- logging levels --- > 0/ 5 none > 0/ 1 lockdep > 0/ 1 context > 1/ 1 crush > 1/ 5 mds > 1/ 5 mds_balancer > 1/ 5 mds_locker > 1/ 5 mds_log > 1/ 5 mds_log_expire > 1/ 5 mds_migrator > 0/ 1 buffer > 0/ 1 timer > 0/ 1 filer > 0/ 1 striper > 0/ 1 objecter > 0/ 5 rados > 0/ 5 rbd > 0/ 5 journaler > 0/ 5 objectcacher > 0/ 5 client > 0/ 5 osd > 0/ 5 optracker > 0/ 5 objclass > 1/ 3 filestore > 1/ 3 keyvaluestore > 1/ 3 journal > 0/ 5 ms > 1/ 5 mon > 0/10 monc > 1/ 5 paxos > 0/ 5 tp > 1/ 5 auth > 1/ 5 crypto > 1/ 1 finisher > 1/ 5 heartbeatmap > 1/ 5 perfcounter > 1/ 5 rgw > 1/ 5 javaclient > 1/ 5 asok > 1/ 1 throttle > -2/-2 (syslog threshold) > -1/-1 (stderr threshold) > max_recent 10000 > max_new 1000 > log_file /var/log/ceph/ceph-osd.0.log > --- end dump of recent events --- > 2014-07-09 20:51:01.534706 7fe0f4617700 -1 *** Caught signal (Aborted) ** > in thread 7fe0f4617700 > > > ceph version 0.82 (14085f42ddd0fef4e7e1dc99402d07a8df82c04e) > 1: /usr/bin/ceph-osd() [0xaac562] > 2: (()+0xf880) [0x7fe0f7985880] > 3: (gsignal()+0x39) [0x7fe0f620e3a9] > 4: (abort()+0x148) [0x7fe0f62114c8] > 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fe0f6afb5e5] > 6: (()+0x5e746) [0x7fe0f6af9746] > 7: (()+0x5e773) [0x7fe0f6af9773] > 8: (()+0x5e9b2) [0x7fe0f6af99b2] > 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char > const*)+0x40a) [0xb85b6a] > 10: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, > long)+0x2eb) [0xad2cbb] > 11: (ceph::HeartbeatMap::is_healthy()+0xb6) [0xad34c6] > 12: (ceph::HeartbeatMap::check_touch_file()+0x28) [0xad3aa8] > 13: (CephContextServiceThread::entry()+0x13f) [0xb9911f] > 14: (()+0x8062) [0x7fe0f797e062] > 15: (clone()+0x6d) [0x7fe0f62bea3d] > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed > to interpret this. > > --- begin dump of recent events --- > 0> 2014-07-09 20:51:01.534706 7fe0f4617700 -1 *** Caught signal > (Aborted) ** > in thread 7fe0f4617700 > > ceph version 0.82 (14085f42ddd0fef4e7e1dc99402d07a8df82c04e) > 1: /usr/bin/ceph-osd() [0xaac562] > 2: (()+0xf880) [0x7fe0f7985880] > 3: (gsignal()+0x39) [0x7fe0f620e3a9] > 4: (abort()+0x148) [0x7fe0f62114c8] > 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fe0f6afb5e5] > 6: (()+0x5e746) [0x7fe0f6af9746] > 7: (()+0x5e773) [0x7fe0f6af9773] > 8: (()+0x5e9b2) [0x7fe0f6af99b2] > 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char > const*)+0x40a) [0xb85b6a] > 10: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, > long)+0x2eb) [0xad2cbb] > 11: (ceph::HeartbeatMap::is_healthy()+0xb6) [0xad34c6] > 12: (ceph::HeartbeatMap::check_touch_file()+0x28) [0xad3aa8] > > ... > > [Message clipped]
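
The MDS log at the top of this thread dies on a feature-compatibility mismatch: the mdsmap carries the incompat flag "8=no anchor table" (written while a 0.82 MDS was running), while the restarted 0.80.5 daemon only advertises "7=mds uses inline data", so it kills itself by design ("not writeable with daemon features ... killing myself"). A minimal way to confirm that kind of mismatch, assuming a Firefly-era CLI like the one used in the thread:

ceph-mds --version                # version of the ceph-mds binary that keeps exiting
ceph mds dump | grep -i compat    # incompat features recorded in the cluster's mdsmap

If the two disagree, the way forward is to run an MDS at least as new as whatever last wrote the mdsmap, rather than repeatedly restarting the older binary.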
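
The osd.0 transcript earlier in the thread fails its first start with "authentication error (1) Operation not permitted" and only comes up after a fresh cephx key is written to the OSD's keyring. A condensed sketch of the sequence that eventually worked there (the id, caps, and keyring path are taken from that transcript; repeat per OSD):

ceph osd create                                   # allocates the next free OSD id (0 here)
ceph auth get-or-create osd.0 mon 'allow rwx' osd 'allow *' \
    -o /var/lib/ceph/osd/ceph-0/keyring           # regenerate the cephx key the daemon will present
/etc/init.d/ceph start osd.0                      # sysvinit start, as used in the thread

Starting the daemon before the key exists is what produced the PermissionError; once the key is in place the same init command succeeds.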
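
The broader recovery strategy discussed in the thread is to hold recovery back with cluster-wide flags, bring the rebuilt OSDs up one at a time, and only then let backfill run; the repeated "hit suicide timeout" aborts happen while an OSD is grinding through old maps or a deep scrub and stops answering its heartbeats. Below is a rough sketch of that flag sequence assembled from the commands used above; the filestore_op_thread_suicide_timeout setting (default 180 s, matching the log) is an assumption added here as a temporary relief valve, not something taken from the thread or the linked post:

ceph osd set noout                # don't mark flapping OSDs out while working on them
ceph osd set nobackfill
ceph osd set norecover
ceph osd set noin
ceph osd set nodeep-scrub         # sidestep the deep-scrub assert seen on osd.0

# optional, assumed option name: give FileStore threads longer before suiciding
# (or put "filestore op thread suicide timeout = 600" under [osd] in ceph.conf)
ceph tell osd.0 injectargs '--filestore-op-thread-suicide-timeout 600'

/etc/init.d/ceph start osd.0      # one OSD at a time; watch `ceph -w` and top until it settles
ceph osd unset noin               # then admit it, and repeat for the next OSD

ceph osd unset norecover          # once all stable OSDs are up and in
ceph osd unset nobackfill

Undo the remaining flags (noout, nodeep-scrub) once the cluster reports the PGs clean.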