Re: Infinite degraded objects

Hello,

I cannot tell which version I was on before, since I used the packages that ship with Ubuntu 15.04; the machines are now on 16.04.

But what I can tell is that I get errors from the ceph OSDs and monitors from time to time. The monitor problems are scary, because each time I have to wipe the monitor and reinstall a new one. I cannot really understand what's going on; I have never had so many problems as since the update.

Should I open a bug report?
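For reference, this is roughly what I do each time to rebuild the wiped
monitor (a sketch from memory; the mon id red-compute and the default
paths are from my setup):

    systemctl stop ceph-mon@red-compute
    ceph mon remove red-compute                      # drop it from the monmap
    rm -rf /var/lib/ceph/mon/ceph-red-compute        # wipe the broken store
    mkdir /var/lib/ceph/mon/ceph-red-compute
    ceph auth get mon. -o /tmp/mon.keyring           # fetch the current mon keyring
    ceph mon getmap -o /tmp/monmap                   # fetch the current monmap
    ceph-mon --mkfs -i red-compute --monmap /tmp/monmap --keyring /tmp/mon.keyring
    chown -R ceph:ceph /var/lib/ceph/mon/ceph-red-compute
    systemctl start ceph-mon@red-compute

And this is the crash I keep getting from one of the OSDs: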

 ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x80) [0x55d5d510b250]
 2: (PG::peek_map_epoch(ObjectStore*, spg_t, unsigned int*, ceph::buffer::list*)+0x642) [0x55d5d4ade2b2]
 3: (OSD::load_pgs()+0x75a) [0x55d5d4a3383a]
 4: (OSD::init()+0x2026) [0x55d5d4a3ec46]
 5: (main()+0x2d6b) [0x55d5d49b193b]
 6: (__libc_start_main()+0xf0) [0x7f49d02e5830]
 7: (_start()+0x29) [0x55d5d49f28c9]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 newstore
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   1/ 5 kinetic
   1/ 5 fuse
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-osd.3.log
--- end dump of recent events ---
2017-10-25 22:09:58.778107 7f49d36958c0 -1 *** Caught signal (Aborted) **
 in thread 7f49d36958c0 thread_name:ceph-osd

 ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
 1: (()+0x9616ee) [0x55d5d500b6ee]
 2: (()+0x11390) [0x7f49d235e390]
 3: (gsignal()+0x38) [0x7f49d02fa428]
 4: (abort()+0x16a) [0x7f49d02fc02a]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x26b) [0x55d5d510b43b]
 6: (PG::peek_map_epoch(ObjectStore*, spg_t, unsigned int*, ceph::buffer::list*)+0x642) [0x55d5d4ade2b2]
 7: (OSD::load_pgs()+0x75a) [0x55d5d4a3383a]
 8: (OSD::init()+0x2026) [0x55d5d4a3ec46]
 9: (main()+0x2d6b) [0x55d5d49b193b]
 10: (__libc_start_main()+0xf0) [0x7f49d02e5830]
 11: (_start()+0x29) [0x55d5d49f28c9]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
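
If it helps with interpreting the trace, I can try what the NOTE says,
something like this (the dbg package name is from memory and may be
wrong for 16.04):

    apt-get install ceph-osd-dbg             # matching debug symbols
    objdump -rdS /usr/bin/ceph-osd > /tmp/ceph-osd.asm
    # the frame addresses are ASLR-relocated, so the load base from
    # /proc/<pid>/maps has to be subtracted before resolving one with:
    addr2line -Cfe /usr/bin/ceph-osd <offset>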



On 25/10/17 00:42, Christian Wuerdig wrote:
From which version of ceph to which version did you upgrade? Can you
provide logs from the crashing OSDs? The degraded object percentage
being larger than 100% has been reported before
(https://www.spinics.net/lists/ceph-users/msg39519.html) and it looks
like it was fixed a week or so ago:
http://tracker.ceph.com/issues/21803

On Mon, Oct 23, 2017 at 5:10 AM, Gonzalo Aguilar Delgado
<gaguilar@xxxxxxxxxxxxxxxxxx> wrote:
Hello,

Since we upgraded the ceph cluster we have been facing a lot of problems,
most of them due to OSDs crashing. What can cause this?


This morning I woke up to this message:


root@red-compute:~# ceph -w
    cluster 9028f4da-0d77-462b-be9b-dbdf7fa57771
     health HEALTH_ERR
            1 pgs are stuck inactive for more than 300 seconds
            7 pgs inconsistent
            1 pgs stale
            1 pgs stuck stale
            recovery 20266198323167232/287940 objects degraded (7038340738753.641%)
            37154696925806626 scrub errors
            too many PGs per OSD (305 > max 300)
     monmap e12: 2 mons at {blue-compute=172.16.0.119:6789/0,red-compute=172.16.0.100:6789/0}
            election epoch 4986, quorum 0,1 red-compute,blue-compute
      fsmap e913: 1/1/1 up {0=blue-compute=up:active}
     osdmap e8096: 5 osds: 5 up, 5 in
            flags require_jewel_osds
      pgmap v68755349: 764 pgs, 6 pools, 558 GB data, 140 kobjects
            1119 GB used, 3060 GB / 4179 GB avail
            20266198323167232/287940 objects degraded (7038340738753.641%)
                 756 active+clean
                   7 active+clean+inconsistent
                   1 stale+active+clean
  client io 1630 B/s rd, 552 kB/s wr, 0 op/s rd, 64 op/s wr

2017-10-22 18:10:13.000812 mon.0 [INF] pgmap v68755348: 764 pgs: 7 active+clean+inconsistent, 756 active+clean, 1 stale+active+clean; 558 GB data, 1119 GB used, 3060 GB / 4179 GB avail; 1641 B/s rd, 229 kB/s wr, 39 op/s; 20266198323167232/287940 objects degraded (7038340738753.641%)
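
I guess the inconsistent PGs can be inspected with something like this
(a rough sketch; <pgid> stands for one of the 7 inconsistent PGs listed
by ceph health detail):

    ceph health detail | grep inconsistent       # list the inconsistent PGs
    rados list-inconsistent-obj <pgid> --format=json-pretty
    ceph pg repair <pgid>                        # only after reviewing the errors

but with the degraded and scrub-error counters looking this absurd I am
not sure a repair is safe.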


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



