Hello,
Things are getting worse every day.
ceph -w
cluster 9028f4da-0d77-462b-be9b-dbdf7fa57771
health HEALTH_ERR
1 pgs are stuck inactive for more than 300 seconds
8 pgs inconsistent
1 pgs repair
1 pgs stale
1 pgs stuck stale
recovery 20266198323167232/288980 objects degraded (7013010700798.405%)
37154696925806624 scrub errors
no legacy OSD present but 'sortbitwise' flag is not set
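For the inconsistent PGs, my plan once the OSDs are back is to list the scrub errors first and only then attempt a repair. Something along these lines, taken from the Jewel docs, so treat it as a sketch (<pool> and <pgid> are placeholders):
# see which PGs are inconsistent and why
ceph health detail | grep inconsistent
rados list-inconsistent-pg <pool>
rados list-inconsistent-obj <pgid> --format=json-pretty
# only if the errors look repairable:
ceph pg repair <pgid>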
But I'm finally finding time to try a recovery. The disk seems to be fine, no SMART errors, and everything looks correct; it's just Ceph that won't start. Today I started looking into ceph-objectstore-tool, which I don't know much about.
It just works. It doesn't crash the way the OSD does.
So I'm lost. Since both the OSD and ceph-objectstore-tool use the same backend, how is this possible?
Can someone help me fix this, please?
----------------------------------------------------------------------------------
ceph-objectstore-tool
--debug --op list-pgs --data-path /var/lib/ceph/osd/ceph-4
--journal-path /dev/sdf3
2017-12-03 13:27:58.206069 7f02c203aa40 0
filestore(/var/lib/ceph/osd/ceph-4) backend xfs (magic 0x58465342)
2017-12-03 13:27:58.206528 7f02c203aa40 0
genericfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_features:
FIEMAP ioctl is disabled via 'filestore fiemap' config option
2017-12-03 13:27:58.206546 7f02c203aa40 0
genericfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_features:
SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole'
config option
2017-12-03 13:27:58.206569 7f02c203aa40 0
genericfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_features:
splice is supported
2017-12-03 13:27:58.251393 7f02c203aa40 0
genericfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_features:
syncfs(2) syscall fully supported (by glibc and kernel)
2017-12-03 13:27:58.251459 7f02c203aa40 0
xfsfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_feature:
extsize is disabled by conf
2017-12-03 13:27:58.978809 7f02c203aa40 0
filestore(/var/lib/ceph/osd/ceph-4) mount: enabling WRITEAHEAD
journal mode: checkpoint is not enabled
2017-12-03 13:27:58.990051 7f02c203aa40 1 journal _open /dev/sdf3
fd 11: 5368709120 bytes, block size 4096 bytes, directio = 1, aio
= 1
2017-12-03 13:27:59.002345 7f02c203aa40 1 journal _open /dev/sdf3
fd 11: 5368709120 bytes, block size 4096 bytes, directio = 1, aio
= 1
2017-12-03 13:27:59.004846 7f02c203aa40 1
filestore(/var/lib/ceph/osd/ceph-4) upgrade
Cluster fsid=9028f4da-0d77-462b-be9b-dbdf7fa57771
Supported features: compat={},rocompat={},incompat={1=initial
feature set(~v.18),2=pginfo object,3=object
locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,11=sharded
objects,12=transaction hints,13=pg meta object}
On-disk features: compat={},rocompat={},incompat={1=initial
feature set(~v.18),2=pginfo object,3=object
locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,11=sharded
objects,12=transaction hints,13=pg meta object}
Performing list-pgs operation
11.7f
10.4b
....
10.8d
2017-12-03 13:27:59.009327 7f02c203aa40 1 journal close /dev/sdf3
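Since the tool can at least read the store, my idea is to take a copy of each PG with the export op before trying anything else on this OSD. Roughly like this; I haven't actually run the export yet, and the target path is just an example:
# back up one of the PGs printed by --op list-pgs (repeat per pgid)
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-4 \
  --journal-path /dev/sdf3 \
  --pgid <pgid> --op export --file /root/pg-backups/<pgid>.export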
It looks like the problem has something to do with the map, because there's an assertion failing on a size check. Could it be related to what I'm seeing in the pgmap?
pgmap v71223952: 764 pgs, 6 pools, 561 GB data, 141 kobjects
1124 GB used, 1514 GB / 2639 GB avail
20266198323167232/288980 objects degraded (7013010700798.405%)
This is the current crash I get when starting the OSD by hand from the command line.
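I run it in the foreground more or less like this to get the trace on the console (the extra debug levels are just my guess):
# run osd.4 in the foreground, logging to stderr, with extra osd/filestore debug
ceph-osd -i 4 -d --debug-osd 20 --debug-filestore 20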
starting osd.4 at :/0
osd_data /var/lib/ceph/osd/ceph-4 /var/lib/ceph/osd/ceph-4/journal
osd/PG.cc: In function 'static int
PG::peek_map_epoch(ObjectStore*, spg_t, epoch_t*,
ceph::bufferlist*)' thread 7f467ba0b8c0 time 2017-12-03
13:39:29.495311
osd/PG.cc: 3025: FAILED assert(values.size() == 2)
ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x80) [0x5556eab28790]
2: (PG::peek_map_epoch(ObjectStore*, spg_t, unsigned int*,
ceph::buffer::list*)+0x661) [0x5556ea4e6601]
3: (OSD::load_pgs()+0x75a) [0x5556ea43a8aa]
4: (OSD::init()+0x2026) [0x5556ea445ca6]
5: (main()+0x2ef1) [0x5556ea3b7301]
6: (__libc_start_main()+0xf0) [0x7f467886b830]
7: (_start()+0x29) [0x5556ea3f8b09]
NOTE: a copy of the executable, or `objdump -rdS
<executable>` is needed to interpret this.
2017-12-03 13:39:29.497091 7f467ba0b8c0 -1 osd/PG.cc: In function
'static int PG::peek_map_epoch(ObjectStore*, spg_t, epoch_t*,
ceph::bufferlist*)' thread 7f467ba0b8c0 time 2017-12-03
13:39:29.495311
osd/PG.cc: 3025: FAILED assert(values.size() == 2)
So it looks like the offending code is this one:
  int r = store->omap_get_values(coll, pgmeta_oid, keys, &values);
  if (r == 0) {
    assert(values.size() == 2);   <------ Here
    // sanity check version
How can values be anything other than 2? Could it have something to do with the degraded object counts in the pgmap output I pasted above?
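If I'm reading the Jewel source right, the two keys it asks for are "_infover" and "_epoch" on the PG's pgmeta object, and r == 0 only means the omap read itself worked, so the assert fires when one of those keys is missing. I want to look at that object directly with the tool, more or less like this (I'm not even sure the pgmeta object shows up in --op list, and the object spec has to be copied from that output, so this is only a sketch):
# find the pgmeta object of the PG being loaded and dump its omap keys
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-4 \
  --journal-path /dev/sdf3 --op list --pgid <pgid>
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-4 \
  --journal-path /dev/sdf3 '<object-spec-from-list-output>' list-omap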
Best regards
On 03/12/17 13:31, Gonzalo Aguilar Delgado wrote:
Hi,
Yes, nice. Until all your OSDs fail and you don't know what else to try. Looking at the failure rate, it will happen very soon.
I want to recover them. I'm describing what I've tried in another mail. Let's see if someone can help me.
I'm not doing anything special, just looking at my cluster from time to time and finding that something else has failed. I will try hard to recover from this situation.
Thank you.
On 26/11/17 16:13, Marc Roos wrote:
If I am not mistaken, the whole idea with the 3 replicas is that you have enough copies to recover from a failed OSD. In my tests this seems to go fine automatically. Are you doing something that is not advised?
-----Original Message-----
From: Gonzalo Aguilar Delgado [mailto:gaguilar@xxxxxxxxxxxxxxxxxx]
Sent: Saturday, 25 November 2017 20:44
To: 'ceph-users'
Subject: Another OSD broken today. How can I recover it?
Hello,
I had another blackout with Ceph today. It seems that Ceph OSDs fail from time to time and are unable to recover. I have 3 OSDs down now: 1 removed from the cluster and 2 down because I'm unable to recover them.
We really need a recovery tool. It's not normal that an OSD breaks and there is no way to recover it. Is there any way to do it?
The last one shows this:
] enter Reset
-12> 2017-11-25 20:34:19.548891 7f6e5dc158c0 5 osd.4 pg_epoch: 9686
pg[0.34(unlocked)] enter Initial
-11> 2017-11-25 20:34:19.548983 7f6e5dc158c0 5 osd.4 pg_epoch: 9686
pg[0.34( empty local-les=9685 n=0 ec=404 les/c/f 9685/9685/0
9684/9684/9684) [4,0] r=0 lpr=0 crt=0'0 mlcod 0'0 inactive NIBBLEWISE]
exit Initial 0.000091 0 0.000000
-10> 2017-11-25 20:34:19.548994 7f6e5dc158c0 5 osd.4 pg_epoch: 9686
pg[0.34( empty local-les=9685 n=0 ec=404 les/c/f 9685/9685/0
9684/9684/9684) [4,0] r=0 lpr=0 crt=0'0 mlcod 0'0 inactive NIBBLEWISE]
enter Reset
-9> 2017-11-25 20:34:19.549166 7f6e5dc158c0 5 osd.4 pg_epoch: 9686
pg[10.36(unlocked)] enter Initial
-8> 2017-11-25 20:34:19.566781 7f6e5dc158c0 5 osd.4 pg_epoch: 9686
pg[10.36( v 9686'7301894 (9686'7298879,9686'7301894] local-les=9685
n=534 ec=419 les/c/f 9685/9686/0 9684/9684/9684) [4,0] r=0 lpr=0
crt=9686'7301894 lcod 0'0 mlcod 0'0 inactive NIBBLEWISE] exit Initial
0.017614 0 0.000000
-7> 2017-11-25 20:34:19.566811 7f6e5dc158c0 5 osd.4 pg_epoch: 9686
pg[10.36( v 9686'7301894 (9686'7298879,9686'7301894] local-les=9685
n=534 ec=419 les/c/f 9685/9686/0 9684/9684/9684) [4,0] r=0 lpr=0
crt=9686'7301894 lcod 0'0 mlcod 0'0 inactive NIBBLEWISE] enter Reset
-6> 2017-11-25 20:34:19.585411 7f6e5dc158c0 5 osd.4 pg_epoch: 9686
pg[8.5c(unlocked)] enter Initial
-5> 2017-11-25 20:34:19.602888 7f6e5dc158c0 5 osd.4 pg_epoch: 9686
pg[8.5c( empty local-les=9685 n=0 ec=348 les/c/f 9685/9685/0
9684/9684/9684) [4,0] r=0 lpr=0 crt=0'0 mlcod 0'0 inactive NIBBLEWISE]
exit Initial 0.017478 0 0.000000
-4> 2017-11-25 20:34:19.602912 7f6e5dc158c0 5 osd.4 pg_epoch: 9686
pg[8.5c( empty local-les=9685 n=0 ec=348 les/c/f 9685/9685/0
9684/9684/9684) [4,0] r=0 lpr=0 crt=0'0 mlcod 0'0 inactive NIBBLEWISE]
enter Reset
-3> 2017-11-25 20:34:19.603082 7f6e5dc158c0 5 osd.4 pg_epoch: 9686
pg[9.10(unlocked)] enter Initial
-2> 2017-11-25 20:34:19.615456 7f6e5dc158c0 5 osd.4 pg_epoch: 9686
pg[9.10( v 9686'2322547 (9031'2319518,9686'2322547] local-les=9685 n=261
ec=417 les/c/f 9685/9685/0 9684/9684/9684) [4,0] r=0 lpr=0
crt=9686'2322547 lcod 0'0 mlcod 0'0 inactive NIBBLEWISE] exit Initial
0.012373 0 0.000000
-1> 2017-11-25 20:34:19.615481 7f6e5dc158c0 5 osd.4 pg_epoch: 9686
pg[9.10( v 9686'2322547 (9031'2319518,9686'2322547] local-les=9685 n=261
ec=417 les/c/f 9685/9685/0 9684/9684/9684) [4,0] r=0 lpr=0
crt=9686'2322547 lcod 0'0 mlcod 0'0 inactive NIBBLEWISE] enter Reset
0> 2017-11-25 20:34:19.617400 7f6e5dc158c0 -1 osd/PG.cc: In
function 'static int PG::peek_map_epoch(ObjectStore*, spg_t, epoch_t*,
ceph::bufferlist*)' thread 7f6e5dc158c0 time 2017-11-25 20:34:19.615633
osd/PG.cc: 3025: FAILED assert(values.size() == 2)
ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x80) [0x5562d318d790]
2: (PG::peek_map_epoch(ObjectStore*, spg_t, unsigned int*,
ceph::buffer::list*)+0x661) [0x5562d2b4b601]
3: (OSD::load_pgs()+0x75a) [0x5562d2a9f8aa]
4: (OSD::init()+0x2026) [0x5562d2aaaca6]
5: (main()+0x2ef1) [0x5562d2a1c301]
6: (__libc_start_main()+0xf0) [0x7f6e5aa75830]
7: (_start()+0x29) [0x5562d2a5db09]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 rbd_mirror
0/ 5 rbd_replay
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
0/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 journal
0/ 5 ms
1/ 5 mon
0/10 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/10 civetweb
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
0/ 0 refs
1/ 5 xio
1/ 5 compressor
1/ 5 newstore
1/ 5 bluestore
1/ 5 bluefs
1/ 3 bdev
1/ 5 kstore
4/ 5 rocksdb
4/ 5 leveldb
1/ 5 kinetic
1/ 5 fuse
-2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 10000
max_new 1000
log_file /var/log/ceph/ceph-osd.4.log
--- end dump of recent events ---
2017-11-25 20:34:19.622559 7f6e5dc158c0 -1 *** Caught signal (Aborted)
** in thread 7f6e5dc158c0 thread_name:ceph-osd
ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)
1: (()+0x98653e) [0x5562d308d53e]
2: (()+0x11390) [0x7f6e5caee390]
3: (gsignal()+0x38) [0x7f6e5aa8a428]
4: (abort()+0x16a) [0x7f6e5aa8c02a]
5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x26b) [0x5562d318d97b]
6: (PG::peek_map_epoch(ObjectStore*, spg_t, unsigned int*,
ceph::buffer::list*)+0x661) [0x5562d2b4b601]
7: (OSD::load_pgs()+0x75a) [0x5562d2a9f8aa]
8: (OSD::init()+0x2026) [0x5562d2aaaca6]
9: (main()+0x2ef1) [0x5562d2a1c301]
10: (__libc_start_main()+0xf0) [0x7f6e5aa75830]
11: (_start()+0x29) [0x5562d2a5db09]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
--- begin dump of recent events ---
0> 2017-11-25 20:34:19.622559 7f6e5dc158c0 -1 *** Caught signal
(Aborted) ** in thread 7f6e5dc158c0 thread_name:ceph-osd
ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)
1: (()+0x98653e) [0x5562d308d53e]
2: (()+0x11390) [0x7f6e5caee390]
3: (gsignal()+0x38) [0x7f6e5aa8a428]
4: (abort()+0x16a) [0x7f6e5aa8c02a]
5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x26b) [0x5562d318d97b]
6: (PG::peek_map_epoch(ObjectStore*, spg_t, unsigned int*,
ceph::buffer::list*)+0x661) [0x5562d2b4b601]
7: (OSD::load_pgs()+0x75a) [0x5562d2a9f8aa]
8: (OSD::init()+0x2026) [0x5562d2aaaca6]
9: (main()+0x2ef1) [0x5562d2a1c301]
10: (__libc_start_main()+0xf0) [0x7f6e5aa75830]
11: (_start()+0x29) [0x5562d2a5db09]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 rbd_mirror
0/ 5 rbd_replay
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
0/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 journal
0/ 5 ms
1/ 5 mon
0/10 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/10 civetweb
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
0/ 0 refs
1/ 5 xio
1/ 5 compressor
1/ 5 newstore
1/ 5 bluestore
1/ 5 bluefs
1/ 3 bdev
1/ 5 kstore
4/ 5 rocksdb
4/ 5 leveldb
1/ 5 kinetic
1/ 5 fuse
-2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 10000
max_new 1000
log_file /var/log/ceph/ceph-osd.4.log
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com