Re: Ceph OSD crash starting up

David Turner <drakonstein@xxxxxxxxx> · Wed, 20 Sep 2017 17:23:34 +0000

My guess is that it's actually just your cluster finding the inconsistent PGs during its normal scrubbing schedule.  If a PG that was scrubbed and clean then becomes inconsistent, then yes I would look for a failing disk.  This could be fallout from the failing disk from before.  It could have been up just long enough before it crashed to cause problems.
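
One way to check whether the inconsistencies point at a single disk is to tally shard errors per OSD. The sketch below assumes the JSON shape printed by `rados list-inconsistent-obj <pgid> --format=json-pretty` (an `inconsistents` list whose entries carry a `shards` list with `osd` and `errors` fields). The function name and the sample report are made up for illustration, and field names may differ between releases.

```python
import json
from collections import Counter


def shard_errors_by_osd(report_json):
    """Count shards that report at least one error, grouped by OSD id."""
    report = json.loads(report_json)
    counts = Counter()
    for item in report.get("inconsistents", []):
        for shard in item.get("shards", []):
            if shard.get("errors"):
                counts[shard["osd"]] += 1
    return counts


# Hypothetical sample report, NOT taken from the cluster in this thread.
SAMPLE = json.dumps({
    "epoch": 1,
    "inconsistents": [
        {"object": {"name": "obj-a"},
         "shards": [{"osd": 0, "errors": []},
                    {"osd": 2, "errors": ["read_error"]}]},
        {"object": {"name": "obj-b"},
         "shards": [{"osd": 2, "errors": ["read_error"]},
                    {"osd": 4, "errors": []}]},
    ],
})

if __name__ == "__main__":
    for osd, n in shard_errors_by_osd(SAMPLE).most_common():
        print(f"osd.{osd}: {n} shard(s) with errors")
```

If most of the errors land on one OSD across several PGs, that OSD's disk is the first thing I would check (SMART data, kernel log).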

On Wed, Sep 20, 2017 at 1:12 PM Gonzalo Aguilar Delgado <gaguilar@xxxxxxxxxxxxxxxxxx> wrote:

    Hi David, 

    Thank you for your support. What could cause the number of
        active+clean+inconsistent PGs to keep growing? A bad disk?
    Best regards,

    On 19/09/17 17:50, David Turner wrote:

      Adding the old OSD back in with its data shouldn't
        help you at all.  Your cluster has finished backfilling and has
        the proper number of copies of all of its data.  The time you
        would want to add a removed OSD back to a cluster is when you
        have unfound objects.

        The scrub errors and inconsistent PGs are what you need to
          focus on and where your current problem is.  The message with
          too many PGs per OSD is just a warning and not causing any
          issues at this point as long as your OSD nodes aren't having
          any OOM messages.  Once you add in a 6th OSD, that will go
          away on its own.
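
As a rough check of the PGs-per-OSD warning, here is a small sketch of the arithmetic. It assumes the PGS column from the `ceph osd df` output quoted further down in this thread still applies, and that the figure in the warning is the total number of PG copies divided by the number of OSDs, rounded down (which matches the 305 shown). The names are illustrative only.

```python
# PGS column from the `ceph osd df` output quoted later in this thread
# (5 OSDs, taken on 14 Sep; assumed unchanged since then).
PGS_PER_OSD = {0: 369, 2: 395, 4: 214, 3: 340, 6: 210}
MAX_PGS_PER_OSD = 300  # threshold shown in the warning


def average_pgs(total_pg_copies, num_osds):
    """Average number of PG copies per OSD, rounded down."""
    return total_pg_copies // num_osds


total = sum(PGS_PER_OSD.values())

if __name__ == "__main__":
    for n in (5, 6):
        avg = average_pgs(total, n)
        state = "warning" if avg > MAX_PGS_PER_OSD else "ok"
        print(f"{n} OSDs: {avg} PGs per OSD ({state})")
```

With the same pools and replica counts, a 6th OSD spreads the same 1528 PG copies over more OSDs, which brings the average under the threshold.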

        There are several threads on the Mailing List that you
          should be able to find about recovering from these and the
          potential dangers of some of the commands.  Googling for
          `ceph-users scrub errors inconsistent pgs` is a good place to
          start.

        On Tue, Sep 19, 2017 at 11:28 AM Gonzalo Aguilar Delgado <gaguilar@xxxxxxxxxxxxxxxxxx> wrote:

            Hi David, 

            Yes, what I want is to add the OSD back with its data,
                but without any of the trouble that could come from
                the time it was out.

            Is it possible?
                I suppose some PGs have been updated since then. Will Ceph
                handle that gracefully?
            Ceph status is
                getting worse every day.

            ceph status

                cluster 9028f4da-0d77-462b-be9b-dbdf7fa57771
                 health HEALTH_ERR
                        6 pgs inconsistent
                        31 scrub errors
                        too many PGs per OSD (305 > max 300)
                 monmap e12: 2 mons at {blue-compute=172.16.0.119:6789/0,red-compute=172.16.0.100:6789/0}
                        election epoch 4328, quorum 0,1 red-compute,blue-compute
                  fsmap e881: 1/1/1 up {0=blue-compute=up:active}
                 osdmap e7120: 5 osds: 5 up, 5 in
                        flags require_jewel_osds
                  pgmap v66976120: 764 pgs, 6 pools, 555 GB data, 140 kobjects
                        1111 GB used, 3068 GB / 4179 GB avail
                             758 active+clean
                               6 active+clean+inconsistent
              client io 384 kB/s wr, 0 op/s rd, 83 op/s wr

            I want to add the old OSD back, let the copies rebalance
                across more hosts/OSDs, and then take it out again.

            Best regards,

            On 19/09/17 14:47, David Turner wrote:

              Are you asking to add the OSD back with its
                data, or to add it back in as a fresh OSD?  What is your
                `ceph status`?

                On Tue, Sep 19, 2017, 5:23 AM Gonzalo Aguilar Delgado <gaguilar@xxxxxxxxxxxxxxxxxx> wrote:

                    Hi David, 

                    Thank you for the great explanation of the weights. I
                        thought that Ceph adjusted them based on disk
                        size, but it seems it does not.

                    But that was not the problem. I think the OSD was
                        failing because of a software bug, since the disk
                        was not full by any means:

                    /dev/sdb1  976284608  172396756  803887852  18%  /var/lib/ceph/osd/ceph-1

                    Now the question is whether I can safely add this
                      OSD back. Is it possible?
                    Best regards,

                    On 14/09/17 23:29, David Turner wrote:

                      Your weights should more closely
                        represent the size of the OSDs.  OSD3 and OSD6
                        are weighted properly, but your other 3 OSDs
                        have the same weight even though OSD0 is twice
                        the size of OSD2 and OSD4.

                        Your OSD weights are what I thought you were
                          referring to when you said you set the crush
                          map to 1.  At some point it does look like you
                          set all of your OSD weights to 1, which would
                          apply to OSD1.  If the OSD was too small for
                          that much data, it would have filled up and be
                          too full to start.  Can you mount that disk
                          and see how much free space is on it?

                        Just so you understand what that weight is,
                          it is how much data the cluster is going to
                          put on it.  The default is for the weight to
                          be the size of the OSD in TiB (1024 based
                          instead of TB which is 1000).  If you set the
                          weight of a 1TB disk and a 4TB disk both to 1,
                          then the cluster will try and give them the
                          same amount of data.  If you set the 4TB disk
                          to a weight of 4, then the cluster will try to
                          give it 4x more data than the 1TB drive
                          (usually what you want).
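
To make the TiB vs. TB point concrete, here is a minimal sketch of the arithmetic. The function names are illustrative, and it assumes the SIZE column of `ceph osd df` is in GiB (1024-based), which is consistent with the weights shown for OSD3 and OSD6 in the output quoted further down.

```python
TIB = 2 ** 40  # bytes in one tebibyte
GIB = 2 ** 30  # bytes in one gibibyte


def crush_weight_from_bytes(size_bytes):
    """Default-style CRUSH weight: the device size expressed in TiB."""
    return size_bytes / TIB


def crush_weight_from_gib(size_gib):
    """Same calculation, starting from a size in GiB."""
    return size_gib * GIB / TIB


if __name__ == "__main__":
    # A nominal "1 TB" (10**12 bytes) disk is only about 0.909 TiB.
    print(round(crush_weight_from_bytes(10 ** 12), 5))
    # 1396G (OSD3) and 931G (OSD6) come out close to their listed weights.
    print(round(crush_weight_from_gib(1396), 5))
    print(round(crush_weight_from_gib(931), 5))
```

The small differences from the listed 1.36380 and 0.90919 come from `ceph osd df` rounding SIZE to whole gigabytes.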

                        In your case, your 926G OSD0 has a weight
                          of 1 and your 460G OSD2 has a weight of 1 so
                          the cluster thinks they should each receive
                          the same amount of data (which it did, they
                          each have ~275GB of data).  OSD3 has a weight
                          of 1.36380 (its size in TiB) and OSD6 has a
                          weight of 0.90919 and they have basically the
                          same %used space (17%) as opposed to the same
                          amount of data because the weight is based on
                          their size.
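
As a rough illustration of the effect, this sketch compares the share of data each OSD is expected to receive under your current weights with the share it would get under size-based weights. It assumes data is spread in proportion to weight, which CRUSH only approximates (placement is pseudo-random per PG), and it takes the sizes from your `ceph osd df` output. The names are illustrative.

```python
def expected_shares(weights):
    """Fraction of the data each OSD should hold if data follows weight."""
    total = sum(weights.values())
    return {osd: w / total for osd, w in weights.items()}


# Weights and sizes (GiB) as shown in the `ceph osd df` output in this thread.
current = {0: 1.0, 2: 1.0, 4: 1.0, 3: 1.36380, 6: 0.90919}
sizes_gib = {0: 926, 2: 460, 4: 465, 3: 1396, 6: 931}

# Size-based weights: the size of each OSD in TiB.
by_size = {osd: gib / 1024 for osd, gib in sizes_gib.items()}

if __name__ == "__main__":
    for label, weights in (("current", current), ("size-based", by_size)):
        shares = expected_shares(weights)
        print(label, {osd: f"{s:.1%}" for osd, s in shares.items()})
```

Under the current weights OSD0 and OSD2 get the same share even though OSD2 is half the size, so OSD2 ends up with roughly twice the %USE.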

                        As long as you had enough replicas of your
                          data in the cluster for it to recover from you
                          removing OSD1 such that your cluster is
                          health_ok without any missing objects, then
                          there is nothing that you need off of OSD1 and
                          ceph recovered from the lost disk
                          successfully.

                        On Thu, Sep 14, 2017 at 4:39 PM Gonzalo Aguilar Delgado <gaguilar@xxxxxxxxxxxxxxxxxx> wrote:

                            Hello,

                            I was on an old version of Ceph, and it
                                showed a warning saying:

                              crush map has straw_calc_version=0

                            I read that adjusting it only triggers a
                                rebalance, so the admin should choose
                                when to do it. So I went straight ahead
                                and ran:

                              ceph osd crush tunables optimal

                            It rebalanced as it said it would, but then I
                                started to see lots of PGs in a bad state. I
                                discovered that this was because of my OSD1.
                                I thought it was a disk failure, so I
                                added a new OSD6 and the system started to
                                rebalance. Even so, the OSD would not start.

                            I thought about wiping it all, but I preferred
                                to leave the disk as it was, with the journal
                                intact, in case I can recover and get
                                data from it. (See mail:
                                Scrub failing all the time, new
                                inconsistencies keep appearing). 

                            So here's the information. But it has
                                OSD1 replaced by OSD3, sorry. 

                            ID WEIGHT  REWEIGHT SIZE  USE   AVAIL %USE  VAR  PGS
                             0 1.00000  1.00000  926G  271G  654G 29.34 1.10 369
                             2 1.00000  1.00000  460G  284G  176G 61.67 2.32 395
                             4 1.00000  1.00000  465G  151G  313G 32.64 1.23 214
                             3 1.36380  1.00000 1396G  239G 1157G 17.13 0.64 340
                             6 0.90919  1.00000  931G  164G  766G 17.70 0.67 210
                                          TOTAL 4179G 1111G 3067G 26.60
                            MIN/MAX VAR: 0.64/2.32  STDDEV: 16.99

                            As I said, I still have OSD1 intact, so I can
                            do whatever you need except re-adding it to the
                            cluster, since I don't know what that would do.
                            It might cause havoc.

                            Best regards,

                            On 14/09/17 17:12, David Turner wrote:

                              What do you mean by "updated
                                  crush map to 1"?  Can you please
                                  provide a copy of your crush map and
                                  `ceph osd df`?

                                On Wed, Sep 13, 2017 at 6:39 AM Gonzalo Aguilar Delgado <gaguilar@xxxxxxxxxxxxxxxxxx> wrote:

                                    Hi, 

                                    I recently updated the
                                        crush map to 1 and let all the
                                        PGs relocate. At the
                                        end I found that one of the OSDs
                                        is not starting. 

                                    This is what it shows:

                                    2017-09-13 10:37:34.287248 7f49cbe12700 -1 *** Caught signal (Aborted) **
                                     in thread 7f49cbe12700 thread_name:filestore_sync

                                     ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
                                     1: (()+0x9616ee) [0xa93c6ef6ee]
                                     2: (()+0x11390) [0x7f49d9937390]
                                     3: (gsignal()+0x38) [0x7f49d78d3428]
                                     4: (abort()+0x16a) [0x7f49d78d502a]
                                     5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x26b) [0xa93c7ef43b]
                                     6: (FileStore::sync_entry()+0x2bbb) [0xa93c47fcbb]
                                     7: (FileStore::SyncThread::entry()+0xd) [0xa93c4adcdd]
                                     8: (()+0x76ba) [0x7f49d992d6ba]
                                     9: (clone()+0x6d) [0x7f49d79a53dd]
                                     NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

                                    --- begin dump of recent events ---
                                        -3> 2017-09-13 10:37:34.253808 7f49dac6e8c0  5 osd.1 pg_epoch: 6293 pg[10.8c( v 6220'575937 (4942'572901,6220'575937] local-les=6235 n=282 ec=419 les/c/f 6235/6235/0 6293/6293/6290) [1,2]/[2] r=-1 lpr=0 pi=6234-6292/24 crt=6220'575937 lcod 0'0 inactive NOTIFY NIBBLEWISE] exit Initial 0.029683 0 0.000000
                                        -2> 2017-09-13 10:37:34.253848 7f49dac6e8c0  5 osd.1 pg_epoch: 6293 pg[10.8c( v 6220'575937 (4942'572901,6220'575937] local-les=6235 n=282 ec=419 les/c/f 6235/6235/0 6293/6293/6290) [1,2]/[2] r=-1 lpr=0 pi=6234-6292/24 crt=6220'575937 lcod 0'0 inactive NOTIFY NIBBLEWISE] enter Reset
                                        -1> 2017-09-13 10:37:34.255018 7f49dac6e8c0  5 osd.1 pg_epoch: 6293 pg[10.90(unlocked)] enter Initial
                                         0> 2017-09-13 10:37:34.287248 7f49cbe12700 -1 *** Caught signal (Aborted) **
                                     in thread 7f49cbe12700 thread_name:filestore_sync

                                     ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
                                     1: (()+0x9616ee) [0xa93c6ef6ee]
                                     2: (()+0x11390) [0x7f49d9937390]
                                     3: (gsignal()+0x38) [0x7f49d78d3428]
                                     4: (abort()+0x16a) [0x7f49d78d502a]
                                     5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x26b) [0xa93c7ef43b]
                                     6: (FileStore::sync_entry()+0x2bbb) [0xa93c47fcbb]
                                     7: (FileStore::SyncThread::entry()+0xd) [0xa93c4adcdd]
                                     8: (()+0x76ba) [0x7f49d992d6ba]
                                     9: (clone()+0x6d) [0x7f49d79a53dd]
                                     NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

                                    --- logging levels ---
                                       0/ 5 none
                                       0/ 1 lockdep
                                       0/ 1 context
                                       1/ 1 crush
                                       1/ 5 mds
                                       1/ 5 mds_balancer
                                       1/ 5 mds_locker
                                       1/ 5 mds_log
                                       1/ 5 mds_log_expire
                                       1/ 5 mds_migrator
                                       0/ 1 buffer
                                       0/ 1 timer
                                       0/ 1 filer
                                       0/ 1 striper
                                       0/ 1 objecter
                                       0/ 5 rados
                                       0/ 5 rbd
                                       0/ 5 rbd_mirror
                                       0/ 5 rbd_replay
                                       0/ 5 journaler
                                       0/ 5 objectcacher
                                       0/ 5 client
                                       0/ 5 osd
                                       0/ 5 optracker
                                       0/ 5 objclass
                                       1/ 3 filestore
                                       1/ 3 journal
                                       0/ 5 ms
                                       1/ 5 mon
                                       0/10 monc
                                       1/ 5 paxos
                                       0/ 5 tp
                                       1/ 5 auth
                                       1/ 5 crypto
                                       1/ 1 finisher
                                       1/ 5 heartbeatmap
                                       1/ 5 perfcounter
                                       1/ 5 rgw
                                       1/10 civetweb
                                       1/ 5 javaclient
                                       1/ 5 asok
                                       1/ 1 throttle
                                       0/ 0 refs
                                       1/ 5 xio
                                       1/ 5 compressor
                                       1/ 5 newstore
                                       1/ 5 bluestore
                                       1/ 5 bluefs
                                       1/ 3 bdev
                                       1/ 5 kstore
                                       4/ 5 rocksdb
                                       4/ 5 leveldb
                                       1/ 5 kinetic
                                       1/ 5 fuse
                                      -2/-2 (syslog threshold)
                                      -1/-1 (stderr threshold)
                                      max_recent     10000
                                      max_new         1000
                                      log_file /var/log/ceph/ceph-osd.1.log
                                    --- end dump of recent events ---

                                    Is there any way to recover it or
                                      should I open a bug?

                                    Best regards

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com