Hi Mart, Wido...
A disclaimer: I'm not really an expert, just a regular site admin
sharing my experience.
At the beginning of the thread you gave the impression that only osd.68
has problems dealing with the problematic PG 3.117. If that is
indeed the case, you could simply mark osd.68 down and remove
it from the cluster. This will trigger Ceph to replicate all PGs hosted
on osd.68 to other OSDs, using the remaining PG replicas.
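Something along these lines is the usual removal sequence (just a
sketch, untested here; double-check the ID and adapt the service
commands to your init system):

$ ceph osd out 68                  # mark it out so data starts remapping away from it
$ stop ceph-osd id=68              # or 'service ceph stop osd.68', depending on your distro
$ ceph osd crush remove osd.68     # remove it from the CRUSH map so backfill kicks in
$ ceph auth del osd.68             # delete its cephx key
$ ceph osd rm 68                   # remove the OSD id from the cluster map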
However, in your last email you seem to suggest that it is PG 3.117
itself which has problems, which makes all OSDs sharing that PG
problematic as well. Because of that, you marked all OSDs sharing
that PG as down.
Before actually trying something more drastic, I would go for a more
classic approach. For example, what happens if you bring only one OSD
up? I would start with osd.74, since you suspect problems in
osd.68 and osd.55 was the source of the dump message below. If it
still aborts, then it means that the PG might have been replicated
everywhere with 'bad' data.
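For instance, to test with only osd.74 while keeping the other two
down (a rough sketch; adapt the start command to your init system):

$ ceph osd set noout                        # avoid extra rebalancing while testing
$ start ceph-osd id=74                      # or 'service ceph start osd.74'
$ tail -f /var/log/ceph/ceph-osd.74.log     # watch whether it hits the same assert
$ ceph osd unset noout                      # once you are done testing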
The drastic approach (if you do not care about the data in that PG) is
to mark those OSDs as down and force the PG to be recreated using
'ceph pg force_create_pg 3.117'. Based on my previous experience,
once I had recreated a PG, 'ceph pg dump_stuck stale' showed the PG
stuck in the creating state forever. To make it right, I had to restart
the proper OSDs. But, as you stated, you then have to deal with data
corruption at the VM level... Maybe that is a problem, maybe it
isn't...
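From memory, the sequence was roughly the following (a sketch only,
and again: this throws away whatever is left in the PG):

$ ceph osd down 68
$ ceph osd down 55
$ ceph osd down 74
$ ceph pg force_create_pg 3.117        # ask the monitors to recreate the PG empty
$ ceph pg dump_stuck stale             # in my case the PG then sat in 'creating' state
$ restart ceph-osd id=68               # restarting the OSDs that should host the PG
                                       # cleared the 'creating' state; repeat for the others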
Hope that helps
Cheers
Goncalo
On 03/31/2016 12:26 PM, Mart van Santen wrote:
Hello,
Well, unfortunately the problem is not really solved. Yes, we
managed to get to a good health state at some point, but when a client
hits some specific data, the OSD process crashes with the errors
below. The 3 OSDs which handle 3.117, the PG with problems, are
currently down and we have reweighted them to 0, so the non-affected
PGs are currently being rebuilt on other OSDs.
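(Roughly, from memory, what we ran was along these lines; the exact
commands may have differed slightly:)

$ ceph osd reweight 68 0
$ ceph osd reweight 55 0
$ ceph osd reweight 74 0
$ stop ceph-osd id=68      # and likewise for the other two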
If I bring the crashed OSDs up, they crash again within a few
minutes.
As I'm a bit afraid for the data in this PG, I think we want to
recreate the PG with empty data and discard the old disks. I
understand I will get data corruption on several RBDs in this
case, but we will try to solve that and rebuild the affected VMs.
Does this make sense, and what are the best next steps?
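(To figure out which RBD images are affected, we were thinking of
matching the object prefixes seen in the crash logs against the
images, something like this untested loop, with <pool> as a
placeholder for the pool name:)

for img in $(rbd ls <pool>); do
    prefix=$(rbd info <pool>/$img | awk '/block_name_prefix/ {print $2}')
    echo "$img $prefix"
done
# then grep the crashing OSD logs for objects named rbd_data.<prefix>
# that map to PG 3.117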
Regards,
Mart
-34> 2016-03-31 03:07:56.932800 7f8e43829700 3 osd.55
122203 handle_osd_map epochs [122203,122203], i have 122203, src
has [120245,122203]
-33> 2016-03-31 03:07:56.932837 7f8e43829700 1 --
[2a00:c6c0:0:122::105]:6822/11703 <== osd.45
[2a00:c6c0:0:122::103]:6800/1852 7 ==== pg_info(1 pgs
e122202:3.117) v4 ==== 919+0+0 (3389909573 0 0) 0x528bc00 con
0x1200a840
-32> 2016-03-31 03:07:56.932855 7f8e43829700 5 -- op
tracker -- seq: 22, time: 2016-03-31 03:07:56.932770, event:
header_read, op: pg_info(1 pgs e122202:3.117)
-31> 2016-03-31 03:07:56.932869 7f8e43829700 5 -- op
tracker -- seq: 22, time: 2016-03-31 03:07:56.932771, event:
throttled, op: pg_info(1 pgs e122202:3.117)
-30> 2016-03-31 03:07:56.932878 7f8e43829700 5 -- op
tracker -- seq: 22, time: 2016-03-31 03:07:56.932822, event:
all_read, op: pg_info(1 pgs e122202:3.117)
-29> 2016-03-31 03:07:56.932886 7f8e43829700 5 -- op
tracker -- seq: 22, time: 2016-03-31 03:07:56.932851, event:
dispatched, op: pg_info(1 pgs e122202:3.117)
-28> 2016-03-31 03:07:56.932895 7f8e43829700 5 -- op
tracker -- seq: 22, time: 2016-03-31 03:07:56.932895, event:
waiting_for_osdmap, op: pg_info(1 pgs e122202:3.117)
-27> 2016-03-31 03:07:56.932912 7f8e43829700 5 -- op
tracker -- seq: 22, time: 2016-03-31 03:07:56.932912, event:
started, op: pg_info(1 pgs e122202:3.117)
-26> 2016-03-31 03:07:56.932947 7f8e43829700 5 -- op
tracker -- seq: 22, time: 2016-03-31 03:07:56.932947, event: done,
op: pg_info(1 pgs e122202:3.117)
-25> 2016-03-31 03:07:56.933022 7f8e3c01a700 1 --
[2a00:c6c0:0:122::105]:6822/11703 -->
[2a00:c6c0:0:122::103]:6800/1852 -- osd_map(122203..122203 src has
121489..122203) v3 -- ?+0 0x11c7fd40 con 0x1200a840
-24> 2016-03-31 03:07:56.933041 7f8e3c01a700 1 --
[2a00:c6c0:0:122::105]:6822/11703 -->
[2a00:c6c0:0:122::103]:6800/1852 -- pg_info(1 pgs e122203:3.117)
v4 -- ?+0 0x528bde0 con 0x1200a840
-23> 2016-03-31 03:07:56.933111 7f8e3c01a700 1 --
[2a00:c6c0:0:122::105]:6822/11703 -->
[2a00:c6c0:0:122::105]:6810/3568 -- osd_map(122203..122203 src has
121489..122203) v3 -- ?+0 0x12200d00 con 0x1209d4a0
-22> 2016-03-31 03:07:56.933125 7f8e3c01a700 1 --
[2a00:c6c0:0:122::105]:6822/11703 -->
[2a00:c6c0:0:122::105]:6810/3568 -- pg_info(1 pgs e122203:3.117)
v4 -- ?+0 0x5288960 con 0x1209d4a0
-21> 2016-03-31 03:07:56.933154 7f8e3c01a700 1 --
[2a00:c6c0:0:122::105]:6822/11703 -->
[2a00:c6c0:0:122::108]:6816/1002847 -- pg_info(1 pgs
e122203:3.117) v4 -- ?+0 0x5288d20 con 0x101a19c0
-20> 2016-03-31 03:07:56.933212 7f8e3c01a700 5 osd.55
pg_epoch: 122203 pg[3.117( v 122193'1898519
(108032'1895437,122193'1898519] local-les=122202 n=2789 ec=23736
les/c 122202/122047 122062/122201/122201) [72,54,45]/[55] r=0
lpr=122201 pi=122046-122200/51 bft=45,54,72 crt=122133'1898514
lcod 0'0 mlcod 0'0 active+undersized+degraded+remapped] on
activate: bft=45,54,72 from 0//0//-1
-19> 2016-03-31 03:07:56.933232 7f8e3c01a700 5 osd.55
pg_epoch: 122203 pg[3.117( v 122193'1898519
(108032'1895437,122193'1898519] local-les=122202 n=2789 ec=23736
les/c 122202/122047 122062/122201/122201) [72,54,45]/[55] r=0
lpr=122201 pi=122046-122200/51 bft=45,54,72 crt=122133'1898514
lcod 0'0 mlcod 0'0 active+undersized+degraded+remapped] target
shard 45 from 0//0//-1
-18> 2016-03-31 03:07:56.933244 7f8e3c01a700 5 osd.55
pg_epoch: 122203 pg[3.117( v 122193'1898519
(108032'1895437,122193'1898519] local-les=122202 n=2789 ec=23736
les/c 122202/122047 122062/122201/122201) [72,54,45]/[55] r=0
lpr=122201 pi=122046-122200/51 bft=45,54,72 crt=122133'1898514
lcod 0'0 mlcod 0'0 active+undersized+degraded+remapped] target
shard 54 from 0//0//-1
-17> 2016-03-31 03:07:56.933255 7f8e3c01a700 5 osd.55
pg_epoch: 122203 pg[3.117( v 122193'1898519
(108032'1895437,122193'1898519] local-les=122202 n=2789 ec=23736
les/c 122202/122047 122062/122201/122201) [72,54,45]/[55] r=0
lpr=122201 pi=122046-122200/51 bft=45,54,72 crt=122133'1898514
lcod 0'0 mlcod 0'0 active+undersized+degraded+remapped] target
shard 72 from 0//0//-1
-16> 2016-03-31 03:07:56.933283 7f8e3680f700 5 -- op
tracker -- seq: 20, time: 2016-03-31 03:07:56.933283, event:
reached_pg, op: osd_op(client.776466.1:190178605
rbd_data.900a62ae8944a.0000000000000829 [set-alloc-hint
object_size 4194304 write_size 4194304,write 8192~8192] 3.b1492517
RETRY=1 snapc 8b3=[8b3] ondisk+retry+write e122203)
-15> 2016-03-31 03:07:56.933507 7f8e3680f700 5 -- op
tracker -- seq: 20, time: 2016-03-31 03:07:56.933507, event:
started, op: osd_op(client.776466.1:190178605
rbd_data.900a62ae8944a.0000000000000829 [set-alloc-hint
object_size 4194304 write_size 4194304,write 8192~8192] 3.b1492517
RETRY=1 snapc 8b3=[8b3] ondisk+retry+write e122203)
-14> 2016-03-31 03:07:56.933648 7f8e3680f700 5 -- op
tracker -- seq: 20, time: 2016-03-31 03:07:56.933648, event:
waiting for subops from 45,54,72, op:
osd_op(client.776466.1:190178605
rbd_data.900a62ae8944a.0000000000000829 [set-alloc-hint
object_size 4194304 write_size 4194304,write 8192~8192] 3.b1492517
RETRY=1 snapc 8b3=[8b3] ondisk+retry+write e122203)
-13> 2016-03-31 03:07:56.933682 7f8e3680f700 1 --
[2a00:c6c0:0:122::105]:6822/11703 -->
[2a00:c6c0:0:122::103]:6800/1852 --
osd_repop(client.776466.1:190178605 3.117
b1492517/rbd_data.900a62ae8944a.0000000000000829/head//3 v
122203'1898521) v1 -- ?+46 0x11e96400 con 0x1200a840
-12> 2016-03-31 03:07:56.933712 7f8e3680f700 1 --
[2a00:c6c0:0:122::105]:6822/11703 -->
[2a00:c6c0:0:122::105]:6810/3568 --
osd_repop(client.776466.1:190178605 3.117
b1492517/rbd_data.900a62ae8944a.0000000000000829/head//3 v
122203'1898521) v1 -- ?+46 0x11e96a00 con 0x1209d4a0
-11> 2016-03-31 03:07:56.933735 7f8e3680f700 1 --
[2a00:c6c0:0:122::105]:6822/11703 -->
[2a00:c6c0:0:122::108]:6816/1002847 --
osd_repop(client.776466.1:190178605 3.117
b1492517/rbd_data.900a62ae8944a.0000000000000829/head//3 v
122203'1898521) v1 -- ?+46 0x11e97600 con 0x101a19c0
-10> 2016-03-31 03:07:56.935173 7f8e30ef5700 1 --
[2a00:c6c0:0:122::105]:6822/11703 <== osd.72
[2a00:c6c0:0:122::108]:6816/1002847 9 ====
osd_repop_reply(client.776466.1:190178605 3.117 ondisk, result =
0) v1 ==== 83+0+0 (405786713 0 0) 0x11e66d00 con 0x101a19c0
-9> 2016-03-31 03:07:56.935212 7f8e30ef5700 5 -- op
tracker -- seq: 23, time: 2016-03-31 03:07:56.935087, event:
header_read, op: osd_repop_reply(client.776466.1:190178605 3.117
ondisk, result = 0)
-8> 2016-03-31 03:07:56.935224 7f8e30ef5700 5 -- op
tracker -- seq: 23, time: 2016-03-31 03:07:56.935090, event:
throttled, op: osd_repop_reply(client.776466.1:190178605 3.117
ondisk, result = 0)
-7> 2016-03-31 03:07:56.935234 7f8e30ef5700 5 -- op
tracker -- seq: 23, time: 2016-03-31 03:07:56.935162, event:
all_read, op: osd_repop_reply(client.776466.1:190178605 3.117
ondisk, result = 0)
-6> 2016-03-31 03:07:56.935245 7f8e30ef5700 5 -- op
tracker -- seq: 23, time: 0.000000, event: dispatched, op:
osd_repop_reply(client.776466.1:190178605 3.117 ondisk, result =
0)
-5> 2016-03-31 03:07:56.936129 7f8e2dfc6700 1 --
[2a00:c6c0:0:122::105]:6822/11703 <== osd.45
[2a00:c6c0:0:122::103]:6800/1852 8 ====
osd_repop_reply(client.776466.1:190178605 3.117 ondisk, result =
0) v1 ==== 83+0+0 (3967999676 0 0) 0x11c7fd40 con 0x1200a840
-4> 2016-03-31 03:07:56.936150 7f8e2dfc6700 5 -- op
tracker -- seq: 24, time: 2016-03-31 03:07:56.936086, event:
header_read, op: osd_repop_reply(client.776466.1:190178605 3.117
ondisk, result = 0)
-3> 2016-03-31 03:07:56.936159 7f8e2dfc6700 5 -- op
tracker -- seq: 24, time: 2016-03-31 03:07:56.936087, event:
throttled, op: osd_repop_reply(client.776466.1:190178605 3.117
ondisk, result = 0)
-2> 2016-03-31 03:07:56.936166 7f8e2dfc6700 5 -- op
tracker -- seq: 24, time: 2016-03-31 03:07:56.936124, event:
all_read, op: osd_repop_reply(client.776466.1:190178605 3.117
ondisk, result = 0)
-1> 2016-03-31 03:07:56.936172 7f8e2dfc6700 5 -- op
tracker -- seq: 24, time: 0.000000, event: dispatched, op:
osd_repop_reply(client.776466.1:190178605 3.117 ondisk, result =
0)
0> 2016-03-31 03:07:56.940165 7f8e3680f700 -1
osd/SnapMapper.cc: In function 'void SnapMapper::add_oid(const
hobject_t&, const std::set<snapid_t>&,
MapCacher::Transaction<std::basic_string<char>,
ceph::buffer::list>*)' thread 7f8e3680f700 time 2016-03-31
03:07:56.933983
osd/SnapMapper.cc: 228: FAILED assert(r == -2)
ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x8b) [0xba8b8b]
2: (SnapMapper::add_oid(hobject_t const&,
std::set<snapid_t, std::less<snapid_t>,
std::allocator<snapid_t> > const&,
MapCacher::Transaction<std::string,
ceph::buffer::list>*)+0x61e) [0x72137e]
3: (PG::update_snap_map(std::vector<pg_log_entry_t,
std::allocator<pg_log_entry_t> > const&,
ObjectStore::Transaction&)+0x402) [0x7d25c2]
4: (PG::append_log(std::vector<pg_log_entry_t,
std::allocator<pg_log_entry_t> > const&, eversion_t,
eversion_t, ObjectStore::Transaction&, bool)+0x4e8) [0x7d2c68]
5: (ReplicatedPG::log_operation(std::vector<pg_log_entry_t,
std::allocator<pg_log_entry_t> > const&,
boost::optional<pg_hit_set_history_t>&, eversion_t
const&, eversion_t const&, bool,
ObjectStore::Transaction*)+0xba) [0x899eca]
6: (ReplicatedBackend::submit_transaction(hobject_t const&,
eversion_t const&, PGBackend::PGTransaction*, eversion_t
const&, eversion_t const&, std::vector<pg_log_entry_t,
std::allocator<pg_log_entry_t> > const&,
boost::optional<pg_hit_set_history_t>&, Context*,
Context*, Context*, unsigned long, osd_reqid_t,
std::tr1::shared_ptr<OpRequest>)+0x77c) [0x9f06cc]
7: (ReplicatedPG::issue_repop(ReplicatedPG::RepGather*)+0x7aa)
[0x8391aa]
8: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0xbdd)
[0x88792d]
9:
(ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>&)+0x4559)
[0x88cee9]
10:
(ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>&,
ThreadPool::TPHandle&)+0x66a) [0x82702a]
11: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
std::tr1::shared_ptr<OpRequest>,
ThreadPool::TPHandle&)+0x3bd) [0x6961dd]
12: (OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0x338) [0x696708]
13: (ShardedThreadPool::shardedthreadpool_worker(unsigned
int)+0x875) [0xb98555]
14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10)
[0xb9a670]
15: (()+0x8182) [0x7f8e57f6c182]
16: (clone()+0x6d) [0x7f8e564d747d]
NOTE: a copy of the executable, or `objdump -rdS
<executable>` is needed to interpret this.
--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 rbd_replay
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
0/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 keyvaluestore
1/ 3 journal
0/ 5 ms
1/ 5 mon
0/10 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/10 civetweb
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
0/ 0 refs
1/ 5 xio
-2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 10000
max_new 1000
log_file /var/log/ceph/ceph-osd.55.log
--- end dump of recent events ---
2016-03-31 03:07:56.960104 7f8e3680f700 -1 *** Caught signal
(Aborted) **
in thread 7f8e3680f700
ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
1: /usr/bin/ceph-osd() [0xaaff6a]
2: (()+0x10340) [0x7f8e57f74340]
3: (gsignal()+0x39) [0x7f8e56413cc9]
4: (abort()+0x148) [0x7f8e564170d8]
5: (__gnu_cxx::__verbose_terminate_handler()+0x155)
[0x7f8e56d1e535]
6: (()+0x5e6d6) [0x7f8e56d1c6d6]
7: (()+0x5e703) [0x7f8e56d1c703]
8: (()+0x5e922) [0x7f8e56d1c922]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x278) [0xba8d78]
10: (SnapMapper::add_oid(hobject_t const&,
std::set<snapid_t, std::less<snapid_t>,
std::allocator<snapid_t> > const&,
MapCacher::Transaction<std::string,
ceph::buffer::list>*)+0x61e) [0x72137e]
11: (PG::update_snap_map(std::vector<pg_log_entry_t,
std::allocator<pg_log_entry_t> > const&,
ObjectStore::Transaction&)+0x402) [0x7d25c2]
12: (PG::append_log(std::vector<pg_log_entry_t,
std::allocator<pg_log_entry_t> > const&, eversion_t,
eversion_t, ObjectStore::Transaction&, bool)+0x4e8) [0x7d2c68]
13: (ReplicatedPG::log_operation(std::vector<pg_log_entry_t,
std::allocator<pg_log_entry_t> > const&,
boost::optional<pg_hit_set_history_t>&, eversion_t
const&, eversion_t const&, bool,
ObjectStore::Transaction*)+0xba) [0x899eca]
14: (ReplicatedBackend::submit_transaction(hobject_t const&,
eversion_t const&, PGBackend::PGTransaction*, eversion_t
const&, eversion_t const&, std::vector<pg_log_entry_t,
std::allocator<pg_log_entry_t> > const&,
boost::optional<pg_hit_set_history_t>&, Context*,
Context*, Context*, unsigned long, osd_reqid_t,
std::tr1::shared_ptr<OpRequest>)+0x77c) [0x9f06cc]
15: (ReplicatedPG::issue_repop(ReplicatedPG::RepGather*)+0x7aa)
[0x8391aa]
16: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0xbdd)
[0x88792d]
17:
(ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>&)+0x4559)
[0x88cee9]
18:
(ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>&,
ThreadPool::TPHandle&)+0x66a) [0x82702a]
19: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
std::tr1::shared_ptr<OpRequest>,
ThreadPool::TPHandle&)+0x3bd) [0x6961dd]
20: (OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0x338) [0x696708]
21: (ShardedThreadPool::shardedthreadpool_worker(unsigned
int)+0x875) [0xb98555]
22: (ShardedThreadPool::WorkThreadSharded::entry()+0x10)
[0xb9a670]
23: (()+0x8182) [0x7f8e57f6c182]
24: (clone()+0x6d) [0x7f8e564d747d]
NOTE: a copy of the executable, or `objdump -rdS
<executable>` is needed to interpret this.
--- begin dump of recent events ---
-7> 2016-03-31 03:07:56.945486 7f8e43829700 1 --
[2a00:c6c0:0:122::105]:6822/11703 <== osd.54
[2a00:c6c0:0:122::105]:6810/3568 7 ==== osd_map(122203..122203 src
has 121489..122203) v3 ==== 222+0+0 (2966331141 0 0) 0x12200d00
con 0x1209d4a0
-6> 2016-03-31 03:07:56.945514 7f8e43829700 3 osd.55
122203 handle_osd_map epochs [122203,122203], i have 122203, src
has [121489,122203]
-5> 2016-03-31 03:07:56.945517 7f8e2d6bd700 1 --
[2a00:c6c0:0:122::105]:6822/11703 <== osd.54
[2a00:c6c0:0:122::105]:6810/3568 8 ====
osd_repop_reply(client.776466.1:190178605 3.117 ondisk, result =
0) v1 ==== 83+0+0 (4008969226 0 0) 0x11e661c0 con 0x1209d4a0
-4> 2016-03-31 03:07:56.945538 7f8e2d6bd700 5 -- op
tracker -- seq: 25, time: 2016-03-31 03:07:56.945488, event:
header_read, op: osd_repop_reply(client.776466.1:190178605 3.117
ondisk, result = 0)
-3> 2016-03-31 03:07:56.945545 7f8e2d6bd700 5 -- op
tracker -- seq: 25, time: 2016-03-31 03:07:56.945489, event:
throttled, op: osd_repop_reply(client.776466.1:190178605 3.117
ondisk, result = 0)
-2> 2016-03-31 03:07:56.945549 7f8e2d6bd700 5 -- op
tracker -- seq: 25, time: 2016-03-31 03:07:56.945512, event:
all_read, op: osd_repop_reply(client.776466.1:190178605 3.117
ondisk, result = 0)
-1> 2016-03-31 03:07:56.945552 7f8e2d6bd700 5 -- op
tracker -- seq: 25, time: 0.000000, event: dispatched, op:
osd_repop_reply(client.776466.1:190178605 3.117 ondisk, result =
0)
0> 2016-03-31 03:07:56.960104 7f8e3680f700 -1 *** Caught
signal (Aborted) **
in thread 7f8e3680f700
ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
1: /usr/bin/ceph-osd() [0xaaff6a]
2: (()+0x10340) [0x7f8e57f74340]
3: (gsignal()+0x39) [0x7f8e56413cc9]
4: (abort()+0x148) [0x7f8e564170d8]
5: (__gnu_cxx::__verbose_terminate_handler()+0x155)
[0x7f8e56d1e535]
6: (()+0x5e6d6) [0x7f8e56d1c6d6]
7: (()+0x5e703) [0x7f8e56d1c703]
8: (()+0x5e922) [0x7f8e56d1c922]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x278) [0xba8d78]
10: (SnapMapper::add_oid(hobject_t const&,
std::set<snapid_t, std::less<snapid_t>,
std::allocator<snapid_t> > const&,
MapCacher::Transaction<std::string,
ceph::buffer::list>*)+0x61e) [0x72137e]
11: (PG::update_snap_map(std::vector<pg_log_entry_t,
std::allocator<pg_log_entry_t> > const&,
ObjectStore::Transaction&)+0x402) [0x7d25c2]
12: (PG::append_log(std::vector<pg_log_entry_t,
std::allocator<pg_log_entry_t> > const&, eversion_t,
eversion_t, ObjectStore::Transaction&, bool)+0x4e8) [0x7d2c68]
13: (ReplicatedPG::log_operation(std::vector<pg_log_entry_t,
std::allocator<pg_log_entry_t> > const&,
boost::optional<pg_hit_set_history_t>&, eversion_t
const&, eversion_t const&, bool,
ObjectStore::Transaction*)+0xba) [0x899eca]
14: (ReplicatedBackend::submit_transaction(hobject_t const&,
eversion_t const&, PGBackend::PGTransaction*, eversion_t
const&, eversion_t const&, std::vector<pg_log_entry_t,
std::allocator<pg_log_entry_t> > const&,
boost::optional<pg_hit_set_history_t>&, Context*,
Context*, Context*, unsigned long, osd_reqid_t,
std::tr1::shared_ptr<OpRequest>)+0x77c) [0x9f06cc]
15: (ReplicatedPG::issue_repop(ReplicatedPG::RepGather*)+0x7aa)
[0x8391aa]
16: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0xbdd)
[0x88792d]
17:
(ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>&)+0x4559)
[0x88cee9]
18:
(ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>&,
ThreadPool::TPHandle&)+0x66a) [0x82702a]
19: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
std::tr1::shared_ptr<OpRequest>,
ThreadPool::TPHandle&)+0x3bd) [0x6961dd]
20: (OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0x338) [0x696708]
21: (ShardedThreadPool::shardedthreadpool_worker(unsigned
int)+0x875) [0xb98555]
22: (ShardedThreadPool::WorkThreadSharded::entry()+0x10)
[0xb9a670]
23: (()+0x8182) [0x7f8e57f6c182]
24: (clone()+0x6d) [0x7f8e564d747d]
NOTE: a copy of the executable, or `objdump -rdS
<executable>` is needed to interpret this.
--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 rbd_replay
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
0/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 keyvaluestore
1/ 3 journal
0/ 5 ms
1/ 5 mon
0/10 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/10 civetweb
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
0/ 0 refs
1/ 5 xio
-2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 10000
max_new 1000
log_file /var/log/ceph/ceph-osd.55.log
--- end dump of recent events ---
On 03/30/2016 11:36 PM, Mart van Santen wrote:
Hi there,
With the help of a lot of people we were able to repair the PG and
restore service. We will get back on this later with a full report for
future reference.
Regards,
Mart
On 03/30/2016 08:30 PM, Wido den Hollander wrote:
Hi,
I have an issue with a Ceph cluster which I can't resolve.
Due to an OSD failure a PG is incomplete, but I can't query the PG to see what I
can do to fix it.
health HEALTH_WARN
1 pgs incomplete
1 pgs stuck inactive
1 pgs stuck unclean
98 requests are blocked > 32 sec
$ ceph pg 3.117 query
That will hang for ever.
$ ceph pg dump_stuck
pg_stat state up up_primary acting acting_primary
3.117 incomplete [68,55,74] 68 [68,55,74] 68
The primary OSD for this PG is osd.68. If I stop that OSD the PG query works,
but it says that bringing osd.68 back online will probably help.
The 98 requests which are blocked are also on osd.68, and they all say:
- initiated
- reached_pg
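(For reference, the blocked ops and their events can be inspected via
the OSD's admin socket, e.g.:)

$ ceph daemon osd.68 dump_ops_in_flight
$ ceph daemon osd.68 dump_historic_ops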
The cluster is running Hammer 0.94.5 in this case.
From what I know, an OSD had a failing disk and was restarted a couple of times
while the disk gave errors. This caused the PG to become incomplete.
I've set debug osd to 20, but I can't really tell what is going wrong on osd.68
which causes it to stall this long.
Any idea what to do here to get this PG up and running again?
Wido
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
--
Goncalo Borges
Research Computing
ARC Centre of Excellence for Particle Physics at the Terascale
School of Physics A28 | University of Sydney, NSW 2006
T: +61 2 93511937