Re: Undersized pgs problem

Dear Vasiliy,



On 11/27/2015 02:00 PM, Irek Fasikhov wrote:
Is your time synchronized?

Best regards, Irek Nurgayazovich Fasikhov
Mobile: +79229045757

2015-11-27 15:57 GMT+03:00 Vasiliy Angapov <angapov@xxxxxxxxx>:
> It seems that you played around with the crushmap and did something wrong.
> Compare the output of 'ceph osd tree' with the crushmap. Some 'osd' devices were renamed to 'device'; I think that is where your problem is.
Is this actually a mistake? What I did was remove a bunch of OSDs from
my cluster, which is why the numbering is sparse. But is it a problem to
have sparse OSD numbering?

I think this is normal and should not be a problem. I have seen this previously as well.
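
If you want to double-check what CRUSH actually sees, you can dump and
decompile the map yourself, for example (the file paths are just examples):

# fetch the current compiled crushmap and decompile it to plain text
ceph osd getcrushmap -o /tmp/crush.bin
crushtool -d /tmp/crush.bin -o /tmp/crush.txt
# removed OSD ids show up as 'deviceNN' placeholder entries in the devices section
grep '^device' /tmp/crush.txt

Those 'deviceNN' entries are just holes left behind by removed OSDs; they are
not referenced by any bucket, so they do not affect placement.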


> Hi.
> Vasiliy, yes, it is a problem with the crushmap. Look at the weights:
> -3 14.56000     host slpeah001
> -2 14.56000     host slpeah002
What exactly is wrong here?

I do not know exactly how the host weights factor into determining where to store the third copy of a PG. As you explained, you have enough space on all hosts, but perhaps, because the host weights do not add up evenly, CRUSH comes to the conclusion that it is not able to place the PGs. What you can try is to artificially raise the weights of these hosts and see if it starts mapping the third copies of the PGs onto the available host.
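
For example, something like this (untested, and the weight value here is just
an illustration, not a recommendation):

# raise the CRUSH weight of one of the OSDs on the small host and watch
# whether the undersized PGs start getting a third copy mapped
ceph osd crush reweight osd.0 5.46
ceph -s

The host bucket weight is the sum of its OSD weights, so reweighting the OSDs
raises the host as well; you can set it back afterwards with the same command.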

I had a similar problem in the past; it was solved by upgrading to the latest CRUSH tunables. But be aware that this can cause massive data movement.
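
On hammer that would be something along these lines (again, expect a lot of
data movement when the profile changes):

# show which tunables profile the cluster currently uses
ceph osd crush show-tunables
# switch to the optimal profile for the running release
ceph osd crush tunables optimal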



I also found out that my OSD logs are full of records like these:
2015-11-26 08:31:19.273268 7fe4f49b1700  0 cephx: verify_authorizer
could not get service secret for service osd secret_id=2924
2015-11-26 08:31:19.273276 7fe4f49b1700  0 --
192.168.254.18:6816/110740 >> 192.168.254.12:0/1011754 pipe(0x41fd1000
sd=79 :6816 s=0 pgs=0 cs=0 l=1 c=0x3ee1a520).accept: got bad
authorizer
2015-11-26 08:31:24.273207 7fe4f49b1700  0 auth: could not find secret_id=2924
2015-11-26 08:31:24.273225 7fe4f49b1700  0 cephx: verify_authorizer
could not get service secret for service osd secret_id=2924
2015-11-26 08:31:24.273231 7fe4f49b1700  0 --
192.168.254.18:6816/110740 >> 192.168.254.12:0/1011754 pipe(0x3f90b000
sd=79 :6816 s=0 pgs=0 cs=0 l=1 c=0x3ee1a3c0).accept: got bad
authorizer
2015-11-26 08:31:29.273199 7fe4f49b1700  0 auth: could not find secret_id=2924
2015-11-26 08:31:29.273215 7fe4f49b1700  0 cephx: verify_authorizer
could not get service secret for service osd secret_id=2924
2015-11-26 08:31:29.273222 7fe4f49b1700  0 --
192.168.254.18:6816/110740 >> 192.168.254.12:0/1011754 pipe(0x41fd1000
sd=79 :6816 s=0 pgs=0 cs=0 l=1 c=0x3ee1a260).accept: got bad
authorizer
2015-11-26 08:31:34.273469 7fe4f49b1700  0 auth: could not find secret_id=2924
2015-11-26 08:31:34.273482 7fe4f49b1700  0 cephx: verify_authorizer
could not get service secret for service osd secret_id=2924
2015-11-26 08:31:34.273486 7fe4f49b1700  0 --
192.168.254.18:6816/110740 >> 192.168.254.12:0/1011754 pipe(0x3f90b000
sd=79 :6816 s=0 pgs=0 cs=0 l=1 c=0x3ee1a100).accept: got bad
authorizer
2015-11-26 08:31:39.273310 7fe4f49b1700  0 auth: could not find secret_id=2924
2015-11-26 08:31:39.273331 7fe4f49b1700  0 cephx: verify_authorizer
could not get service secret for service osd secret_id=2924
2015-11-26 08:31:39.273342 7fe4f49b1700  0 --
192.168.254.18:6816/110740 >> 192.168.254.12:0/1011754 pipe(0x41fcc000
sd=98 :6816 s=0 pgs=0 cs=0 l=1 c=0x3ee19fa0).accept: got bad
authorizer
2015-11-26 08:31:44.273753 7fe4f49b1700  0 auth: could not find secret_id=2924
2015-11-26 08:31:44.273769 7fe4f49b1700  0 cephx: verify_authorizer
could not get service secret for service osd secret_id=2924
2015-11-26 08:31:44.273776 7fe4f49b1700  0 --
192.168.254.18:6816/110740 >> 192.168.254.12:0/1011754 pipe(0x41fcc000
sd=98 :6816 s=0 pgs=0 cs=0 l=1 c=0x3ee189a0).accept: got bad
authorizer
2015-11-26 08:31:49.273412 7fe4f49b1700  0 auth: could not find secret_id=2924
2015-11-26 08:31:49.273431 7fe4f49b1700  0 cephx: verify_authorizer
could not get service secret for service osd secret_id=2924
2015-11-26 08:31:49.273455 7fe4f49b1700  0 --
192.168.254.18:6816/110740 >> 192.168.254.12:0/1011754 pipe(0x41fd1000
sd=98 :6816 s=0 pgs=0 cs=0 l=1 c=0x3ee19080).accept: got bad
authorizer
2015-11-26 08:31:54.273293 7fe4f49b1700  0 auth: could not find secret_id=2924

What does it mean? Google says it might be a time sync issue, but my
clocks are perfectly synchronized...

Normally you get a warning in "ceph status" if time is out of sync. Nevertheless, you can try to restart the OSDs. I had timing issues in the past and found that it sometimes helps to restart the daemons *after* syncing the clocks; before restarting they did not seem to pick up the new time. But that was mostly the case with monitors, though.
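
Roughly like this, assuming ntpd and adjusting the OSD ids and init system to
your environment (the NTP server name is a placeholder):

# check whether Ceph itself reports clock skew
ceph health detail
# force a one-off sync against your NTP server
ntpdate -u ntp.example.com
# then restart the OSDs on that node, e.g. with systemd:
systemctl restart ceph-osd@36
# or with the older sysvinit scripts:
/etc/init.d/ceph restart osd.36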



Regards,


Mart




2015-11-26 21:05 GMT+08:00 Irek Fasikhov <malmyzh@xxxxxxxxx>:
> Hi.
> Vasiliy, yes, it is a problem with the crushmap. Look at the weights:
> " -3 14.56000     host slpeah001
>  -2 14.56000     host slpeah002
>  "
>
> Best regards, Irek Nurgayazovich Fasikhov
> Mobile: +79229045757
>
> 2015-11-26 13:16 GMT+03:00 CIT RT - Kamil Fidailevich Kuramshin
> <Kamil.Kuramshin@xxxxxxxx>:
>>
>> It seems that you played around with the crushmap and did something wrong.
>> Compare the output of 'ceph osd tree' with the crushmap. Some 'osd'
>> devices were renamed to 'device'; I think that is where your problem is.
>>
>> Sent from a mobile device.
>>
>>
>> -----Original Message-----
>> From: Vasiliy Angapov <angapov@xxxxxxxxx>
>> To: ceph-users <ceph-users@xxxxxxxxxxxxxx>
>> Sent: Thu, 26 Nov 2015 7:53
>> Subject: Undersized pgs problem
>>
>> Hi, colleagues!
>>
>> I have a small 4-node Ceph cluster (0.94.2); all pools have size 3,
>> min_size 1.
>> Last night one host failed and the cluster was unable to rebalance,
>> saying there are a lot of undersized PGs.
>>
>> root@slpeah002:[~]:# ceph -s
>>     cluster 78eef61a-3e9c-447c-a3ec-ce84c617d728
>>      health HEALTH_WARN
>>             1486 pgs degraded
>>             1486 pgs stuck degraded
>>             2257 pgs stuck unclean
>>             1486 pgs stuck undersized
>>             1486 pgs undersized
>>             recovery 80429/555185 objects degraded (14.487%)
>>             recovery 40079/555185 objects misplaced (7.219%)
>>             4/20 in osds are down
>>             1 mons down, quorum 1,2 slpeah002,slpeah007
>>      monmap e7: 3 mons at
>>
>> {slpeah001=192.168.254.11:6780/0,slpeah002=192.168.254.12:6780/0,slpeah007=172.31.252.46:6789/0}
>>             election epoch 710, quorum 1,2 slpeah002,slpeah007
>>      osdmap e14062: 20 osds: 16 up, 20 in; 771 remapped pgs
>>       pgmap v7021316: 4160 pgs, 5 pools, 1045 GB data, 180 kobjects
>>             3366 GB used, 93471 GB / 96838 GB avail
>>             80429/555185 objects degraded (14.487%)
>>             40079/555185 objects misplaced (7.219%)
>>                 1903 active+clean
>>                 1486 active+undersized+degraded
>>                  771 active+remapped
>>   client io 0 B/s rd, 246 kB/s wr, 67 op/s
>>
>>   root@slpeah002:[~]:# ceph osd tree
>> ID  WEIGHT   TYPE NAME          UP/DOWN REWEIGHT PRIMARY-AFFINITY
>>  -1 94.63998 root default
>>  -9 32.75999     host slpeah007
>>  72  5.45999         osd.72          up  1.00000          1.00000
>>  73  5.45999         osd.73          up  1.00000          1.00000
>>  74  5.45999         osd.74          up  1.00000          1.00000
>>  75  5.45999         osd.75          up  1.00000          1.00000
>>  76  5.45999         osd.76          up  1.00000          1.00000
>>  77  5.45999         osd.77          up  1.00000          1.00000
>> -10 32.75999     host slpeah008
>>  78  5.45999         osd.78          up  1.00000          1.00000
>>  79  5.45999         osd.79          up  1.00000          1.00000
>>  80  5.45999         osd.80          up  1.00000          1.00000
>>  81  5.45999         osd.81          up  1.00000          1.00000
>>  82  5.45999         osd.82          up  1.00000          1.00000
>>  83  5.45999         osd.83          up  1.00000          1.00000
>>  -3 14.56000     host slpeah001
>>   1  3.64000          osd.1         down  1.00000          1.00000
>>  33  3.64000         osd.33        down  1.00000          1.00000
>>  34  3.64000         osd.34        down  1.00000          1.00000
>>  35  3.64000         osd.35        down  1.00000          1.00000
>>  -2 14.56000     host slpeah002
>>   0  3.64000         osd.0           up  1.00000          1.00000
>>  36  3.64000         osd.36          up  1.00000          1.00000
>>  37  3.64000         osd.37          up  1.00000          1.00000
>>  38  3.64000         osd.38          up  1.00000          1.00000
>>
>> Crushmap:
>>
>>  # begin crush map
>> tunable choose_local_tries 0
>> tunable choose_local_fallback_tries 0
>> tunable choose_total_tries 50
>> tunable chooseleaf_descend_once 1
>> tunable chooseleaf_vary_r 1
>> tunable straw_calc_version 1
>> tunable allowed_bucket_algs 54
>>
>> # devices
>> device 0 osd.0
>> device 1 osd.1
>> device 2 device2
>> device 3 device3
>> device 4 device4
>> device 5 device5
>> device 6 device6
>> device 7 device7
>> device 8 device8
>> device 9 device9
>> device 10 device10
>> device 11 device11
>> device 12 device12
>> device 13 device13
>> device 14 device14
>> device 15 device15
>> device 16 device16
>> device 17 device17
>> device 18 device18
>> device 19 device19
>> device 20 device20
>> device 21 device21
>> device 22 device22
>> device 23 device23
>> device 24 device24
>> device 25 device25
>> device 26 device26
>> device 27 device27
>> device 28 device28
>> device 29 device29
>> device 30 device30
>> device 31 device31
>> device 32 device32
>> device 33 osd.33
>> device 34 osd.34
>> device 35 osd.35
>> device 36 osd.36
>> device 37 osd.37
>> device 38 osd.38
>> device 39 device39
>> device 40 device40
>> device 41 device41
>> device 42 device42
>> device 43 device43
>> device 44 device44
>> device 45 device45
>> device 46 device46
>> device 47 device47
>> device 48 device48
>> device 49 device49
>> device 50 device50
>> device 51 device51
>> device 52 device52
>> device 53 device53
>> device 54 device54
>> device 55 device55
>> device 56 device56
>> device 57 device57
>> device 58 device58
>> device 59 device59
>> device 60 device60
>> device 61 device61
>> device 62 device62
>> device 63 device63
>> device 64 device64
>> device 65 device65
>> device 66 device66
>> device 67 device67
>> device 68 device68
>> device 69 device69
>> device 70 device70
>> device 71 device71
>> device 72 osd.72
>> device 73 osd.73
>> device 74 osd.74
>> device 75 osd.75
>> device 76 osd.76
>> device 77 osd.77
>> device 78 osd.78
>> device 79 osd.79
>> device 80 osd.80
>> device 81 osd.81
>> device 82 osd.82
>> device 83 osd.83
>>
>> # types
>> type 0 osd
>> type 1 host
>> type 2 chassis
>> type 3 rack
>> type 4 row
>> type 5 pdu
>> type 6 pod
>> type 7 room
>> type 8 datacenter
>> type 9 region
>> type 10 root
>>
>> # buckets
>> host slpeah007 {
>>         id -9           # do not change unnecessarily
>>         # weight 32.760
>>         alg straw
>>         hash 0  # rjenkins1
>>         item osd.72 weight 5.460
>>         item osd.73 weight 5.460
>>         item osd.74 weight 5.460
>>         item osd.75 weight 5.460
>>         item osd.76 weight 5.460
>>         item osd.77 weight 5.460
>> }
>> host slpeah008 {
>>         id -10          # do not change unnecessarily
>>         # weight 32.760
>>         alg straw
>>         hash 0  # rjenkins1
>>         item osd.78 weight 5.460
>>         item osd.79 weight 5.460
>>         item osd.80 weight 5.460
>>         item osd.81 weight 5.460
>>         item osd.82 weight 5.460
>>         item osd.83 weight 5.460
>> }
>> host slpeah001 {
>>         id -3           # do not change unnecessarily
>>         # weight 14.560
>>         alg straw
>>         hash 0  # rjenkins1
>>         item osd.1 weight 3.640
>>         item osd.33 weight 3.640
>>         item osd.34 weight 3.640
>>         item osd.35 weight 3.640
>> }
>> host slpeah002 {
>>         id -2           # do not change unnecessarily
>>         # weight 14.560
>>         alg straw
>>         hash 0  # rjenkins1
>>         item osd.0 weight 3.640
>>         item osd.36 weight 3.640
>>         item osd.37 weight 3.640
>>         item osd.38 weight 3.640
>> }
>> root default {
>>         id -1           # do not change unnecessarily
>>         # weight 94.640
>>         alg straw
>>         hash 0  # rjenkins1
>>         item slpeah007 weight 32.760
>>         item slpeah008 weight 32.760
>>         item slpeah001 weight 14.560
>>         item slpeah002 weight 14.560
>> }
>>
>> # rules
>> rule default {
>>         ruleset 0
>>         type replicated
>>         min_size 1
>>         max_size 10
>>         step take default
>>         step chooseleaf firstn 0 type host
>>         step emit
>> }
>>
>> # end crush map
>>
>>
>>
>> This is odd because the pools have size 3 and I have 3 hosts alive, so
>> why is it saying that undersized PGs are present? It makes me feel like
>> CRUSH is not working properly.
>> There is not much data in the cluster currently, about 3 TB, and as you
>> can see from the osd tree, each host has at least 14 TB of disk space
>> on its OSDs.
>> So I'm a bit stuck now...
>> How can I find the source of the trouble?
>>
>> Thanks in advance!
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>



_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 
Mart van Santen
Greenhost
E: mart@xxxxxxxxxxxx
T: +31 20 4890444
W: https://greenhost.nl

A PGP signature can be attached to this e-mail,
you need PGP software to verify it. 
My public key is available in keyserver(s)
see: http://tinyurl.com/openpgp-manual

PGP Fingerprint: CA85 EB11 2B70 042D AF66  B29A 6437 01A1 10A3 D3A5


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
