Vasiliy,
I don't think that's the cause. Can you paste other tuning options from your ceph.conf?
Also, have you fixed the problems with cephx auth?
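
If it helps, the running values can be dumped straight from an OSD's admin socket, for example (a sketch; adjust the OSD id and the grep pattern to whatever you want to look at):

    # dump the configuration osd.0 is actually running with
    ceph daemon osd.0 config show | egrep 'mon_osd|heartbeat|cephx'
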
Bob
On Mon, Nov 30, 2015 at 12:56 AM, Vasiliy Angapov <angapov@xxxxxxxxx> wrote:
Btw, in my configuration "mon osd downout subtree limit" is set to "host".
Could that be influencing things?
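
It can also be double-checked at runtime like this (a sketch, assuming default admin socket paths and that slpeah002 runs one of the monitors):

    # check the effective down-out subtree limit on a running monitor
    ceph daemon mon.slpeah002 config show | grep mon_osd_down_out_subtree_limit
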
2015-11-29 14:38 GMT+08:00 Vasiliy Angapov <angapov@xxxxxxxxx>:
> Bob,
> Thanks for the explanation, sounds reasonable! But how could it happen that
> the host is down while its OSDs are still IN the cluster?
> I mean, the NOOUT flag is not set and my timeouts are all at their defaults...
>
> But if I remember correctly, the host was not completely down: it was
> pingable, but no other services were reachable, like SSH or anything else.
> Is it possible that the OSDs were still sending some information to the
> monitors, making them look like IN?
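>
> For reference, these are the defaults I believe apply here (a sketch; option
> names and values as I understand them for 0.94, please correct me if wrong):
>
>     # a down OSD should normally be marked out after 5 minutes
>     mon osd down out interval = 300
>     # no automatic out for whole subtrees at or above this bucket type
>     mon osd down out subtree limit = rack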
>
> 2015-11-29 2:10 GMT+08:00 Bob R <bobr@xxxxxxxxxxxxxx>:
>> Vasiliy,
>>
>> Your OSDs are marked as 'down' but 'in'.
>>
>> "Ceph OSDs have two known states that can be combined. Up and Down only
>> tells you whether the OSD is actively involved in the cluster. OSD states
>> also are expressed in terms of cluster replication: In and Out. Only when a
>> Ceph OSD is tagged as Out does the self-healing process occur"
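>>
>> If you do not want to wait for the monitors, you could also mark the failed
>> host's OSDs out by hand to kick off recovery, e.g. (a sketch, using the ids
>> from your 'ceph osd tree' output):
>>
>>     # mark the down OSDs on slpeah001 out so CRUSH re-replicates their PGs
>>     for id in 1 33 34 35; do ceph osd out $id; done
>>
>> and then watch 'ceph -s' until the undersized PGs recover.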
>>
>> Bob
>>
>> On Fri, Nov 27, 2015 at 6:15 AM, Mart van Santen <mart@xxxxxxxxxxxx> wrote:
>>>
>>>
>>> Dear Vasiliy,
>>>
>>>
>>>
>>> On 11/27/2015 02:00 PM, Irek Fasikhov wrote:
>>>
>>> Do you have time synchronization in place?
>>>
>>> Regards, Irek Fasikhov
>>> Mob.: +79229045757
>>>
>>> 2015-11-27 15:57 GMT+03:00 Vasiliy Angapov <angapov@xxxxxxxxx>:
>>>>
>>>> > It seems that you played around with the crushmap and did something
>>>> > wrong.
>>>> > Compare the output of 'ceph osd tree' with the crushmap. There are some 'osd'
>>>> > devices renamed to 'device'; I think that is where your problem is.
>>>> Is this actually a mistake? What I did was remove a bunch of OSDs from
>>>> my cluster, which is why the numbering is sparse. But is it a problem to
>>>> have sparse OSD numbering?
>>>
>>>
>>> I think this is normal and should not be a problem. I have had this
>>> previously as well.
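>>>
>>> For what it is worth, the 'deviceN' placeholders in your crushmap are just
>>> what is left behind when OSDs are removed. A removal along these lines (a
>>> sketch, N being the OSD id) leaves exactly that kind of gap and is harmless:
>>>
>>>     ceph osd out N               # stop placing data on it
>>>     ceph osd crush remove osd.N  # drop it from the crushmap
>>>     ceph auth del osd.N          # remove its key
>>>     ceph osd rm N                # remove the OSD entry itself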
>>>
>>>>
>>>> > Hi.
>>>> > Vasiliy, yes it is a problem with the crushmap. Look at the weight:
>>>> > -3 14.56000 host slpeah001
>>>> > -2 14.56000 host slpeah002
>>>> What exactly is wrong here?
>>>
>>>
>>> I do not know exactly how the host weights contribute to determining where
>>> to store the third copy of a PG. As you explained, you have enough space on
>>> all hosts, but perhaps, with the host weights this uneven, CRUSH comes to
>>> the conclusion that it is not able to place the PGs. What you can try is to
>>> artificially raise the weights of these hosts, to see if it starts mapping
>>> the third copies of the PGs onto the available host.
>>>
>>> I had a similar problem in the past, which was solved by upgrading to the
>>> latest CRUSH tunables. But be aware that this can cause massive data
>>> movement.
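>>>
>>> In commands that would look roughly like this (a sketch; the weight value is
>>> only an example, and reweighting an OSD also bumps its host bucket's weight,
>>> while 'tunables optimal' will reshuffle a lot of data):
>>>
>>>     # raise the crush weight of one of the small host's OSDs
>>>     ceph osd crush reweight osd.1 5.46
>>>     # or move the whole cluster to the newest tunables profile
>>>     ceph osd crush tunables optimal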
>>>
>>>
>>>>
>>>> I also found out that my OSD logs are full of records like these:
>>>> 2015-11-26 08:31:19.273268 7fe4f49b1700 0 cephx: verify_authorizer
>>>> could not get service secret for service osd secret_id=2924
>>>> 2015-11-26 08:31:19.273276 7fe4f49b1700 0 --
>>>> 192.168.254.18:6816/110740 >> 192.168.254.12:0/1011754 pipe(0x41fd1000
>>>> sd=79 :6816 s=0 pgs=0 cs=0 l=1 c=0x3ee1a520).accept: got bad
>>>> authorizer
>>>> 2015-11-26 08:31:24.273207 7fe4f49b1700 0 auth: could not find
>>>> secret_id=2924
>>>> 2015-11-26 08:31:24.273225 7fe4f49b1700 0 cephx: verify_authorizer
>>>> could not get service secret for service osd secret_id=2924
>>>> 2015-11-26 08:31:24.273231 7fe4f49b1700 0 --
>>>> 192.168.254.18:6816/110740 >> 192.168.254.12:0/1011754 pipe(0x3f90b000
>>>> sd=79 :6816 s=0 pgs=0 cs=0 l=1 c=0x3ee1a3c0).accept: got bad
>>>> authorizer
>>>> 2015-11-26 08:31:29.273199 7fe4f49b1700 0 auth: could not find
>>>> secret_id=2924
>>>> 2015-11-26 08:31:29.273215 7fe4f49b1700 0 cephx: verify_authorizer
>>>> could not get service secret for service osd secret_id=2924
>>>> 2015-11-26 08:31:29.273222 7fe4f49b1700 0 --
>>>> 192.168.254.18:6816/110740 >> 192.168.254.12:0/1011754 pipe(0x41fd1000
>>>> sd=79 :6816 s=0 pgs=0 cs=0 l=1 c=0x3ee1a260).accept: got bad
>>>> authorizer
>>>> 2015-11-26 08:31:34.273469 7fe4f49b1700 0 auth: could not find
>>>> secret_id=2924
>>>> 2015-11-26 08:31:34.273482 7fe4f49b1700 0 cephx: verify_authorizer
>>>> could not get service secret for service osd secret_id=2924
>>>> 2015-11-26 08:31:34.273486 7fe4f49b1700 0 --
>>>> 192.168.254.18:6816/110740 >> 192.168.254.12:0/1011754 pipe(0x3f90b000
>>>> sd=79 :6816 s=0 pgs=0 cs=0 l=1 c=0x3ee1a100).accept: got bad
>>>> authorizer
>>>> 2015-11-26 08:31:39.273310 7fe4f49b1700 0 auth: could not find
>>>> secret_id=2924
>>>> 2015-11-26 08:31:39.273331 7fe4f49b1700 0 cephx: verify_authorizer
>>>> could not get service secret for service osd secret_id=2924
>>>> 2015-11-26 08:31:39.273342 7fe4f49b1700 0 --
>>>> 192.168.254.18:6816/110740 >> 192.168.254.12:0/1011754 pipe(0x41fcc000
>>>> sd=98 :6816 s=0 pgs=0 cs=0 l=1 c=0x3ee19fa0).accept: got bad
>>>> authorizer
>>>> 2015-11-26 08:31:44.273753 7fe4f49b1700 0 auth: could not find
>>>> secret_id=2924
>>>> 2015-11-26 08:31:44.273769 7fe4f49b1700 0 cephx: verify_authorizer
>>>> could not get service secret for service osd secret_id=2924
>>>> 2015-11-26 08:31:44.273776 7fe4f49b1700 0 --
>>>> 192.168.254.18:6816/110740 >> 192.168.254.12:0/1011754 pipe(0x41fcc000
>>>> sd=98 :6816 s=0 pgs=0 cs=0 l=1 c=0x3ee189a0).accept: got bad
>>>> authorizer
>>>> 2015-11-26 08:31:49.273412 7fe4f49b1700 0 auth: could not find
>>>> secret_id=2924
>>>> 2015-11-26 08:31:49.273431 7fe4f49b1700 0 cephx: verify_authorizer
>>>> could not get service secret for service osd secret_id=2924
>>>> 2015-11-26 08:31:49.273455 7fe4f49b1700 0 --
>>>> 192.168.254.18:6816/110740 >> 192.168.254.12:0/1011754 pipe(0x41fd1000
>>>> sd=98 :6816 s=0 pgs=0 cs=0 l=1 c=0x3ee19080).accept: got bad
>>>> authorizer
>>>> 2015-11-26 08:31:54.273293 7fe4f49b1700 0 auth: could not find
>>>> secret_id=2924
>>>>
>>>> What does it mean? Google says it might be a time sync issue, but my
>>>> clocks are perfectly synchronized...
>>>
>>>
>>> Normally you get a warning in "ceph status" if the time is out of sync.
>>> Nevertheless, you can try to restart the OSDs. I had timing issues in the
>>> past and found that it sometimes helps to restart the daemons *after*
>>> syncing the clocks, before they accept the new time. But that was mostly
>>> the case with monitors.
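>>>
>>> Roughly (a sketch; the restart command depends on your init system, the
>>> sysvinit form below is what I would try on a 0.94 install):
>>>
>>>     # does the cluster itself complain about clock skew?
>>>     ceph status | grep -i skew
>>>     # compare NTP peers on every node
>>>     ntpq -p
>>>     # then restart one OSD at a time after the clocks agree
>>>     /etc/init.d/ceph restart osd.1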
>>>
>>>
>>>
>>> Regards,
>>>
>>>
>>> Mart
>>>
>>>
>>>
>>>
>>>>
>>>> 2015-11-26 21:05 GMT+08:00 Irek Fasikhov <malmyzh@xxxxxxxxx>:
>>>> > Hi.
>>>> > Vasiliy, yes it is a problem with the crushmap. Look at the weight:
>>>> > " -3 14.56000 host slpeah001
>>>> > -2 14.56000 host slpeah002
>>>> > "
>>>> >
>>>> > Regards, Irek Fasikhov
>>>> > Mob.: +79229045757
>>>> >
>>>> > 2015-11-26 13:16 GMT+03:00 Kamil Kuramshin (CIT RT)
>>>> > <Kamil.Kuramshin@xxxxxxxx>:
>>>> >>
>>>> >> It seems that you played around with the crushmap and did something
>>>> >> wrong.
>>>> >> Compare the output of 'ceph osd tree' with the crushmap. There are some 'osd'
>>>> >> devices renamed to 'device'; I think that is where your problem is.
>>>> >>
>>>> >> Sent from a mobile device.
>>>> >>
>>>> >>
>>>> >> -----Original Message-----
>>>> >> From: Vasiliy Angapov <angapov@xxxxxxxxx>
>>>> >> To: ceph-users <ceph-users@xxxxxxxxxxxxxx>
>>>> >> Sent: Thu, 26 Nov 2015 7:53
>>>> >> Subject: Undersized pgs problem
>>>> >>
>>>> >> Hi, colleagues!
>>>> >>
>>>> >> I have a small 4-node Ceph cluster (0.94.2); all pools have size 3,
>>>> >> min_size 1.
>>>> >> Last night one host failed, and the cluster was unable to rebalance,
>>>> >> complaining about a lot of undersized PGs.
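>>>> >>
>>>> >> For reference, the pool settings can be checked per pool like this
>>>> >> ('rbd' is just an example pool name):
>>>> >>
>>>> >>     ceph osd pool get rbd size
>>>> >>     ceph osd pool get rbd min_size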
>>>> >>
>>>> >> root@slpeah002:[~]:# ceph -s
>>>> >> cluster 78eef61a-3e9c-447c-a3ec-ce84c617d728
>>>> >> health HEALTH_WARN
>>>> >> 1486 pgs degraded
>>>> >> 1486 pgs stuck degraded
>>>> >> 2257 pgs stuck unclean
>>>> >> 1486 pgs stuck undersized
>>>> >> 1486 pgs undersized
>>>> >> recovery 80429/555185 objects degraded (14.487%)
>>>> >> recovery 40079/555185 objects misplaced (7.219%)
>>>> >> 4/20 in osds are down
>>>> >> 1 mons down, quorum 1,2 slpeah002,slpeah007
>>>> >> monmap e7: 3 mons at
>>>> >>
>>>> >>
>>>> >> {slpeah001=192.168.254.11:6780/0,slpeah002=192.168.254.12:6780/0,slpeah007=172.31.252.46:6789/0}
>>>> >> election epoch 710, quorum 1,2 slpeah002,slpeah007
>>>> >> osdmap e14062: 20 osds: 16 up, 20 in; 771 remapped pgs
>>>> >> pgmap v7021316: 4160 pgs, 5 pools, 1045 GB data, 180 kobjects
>>>> >> 3366 GB used, 93471 GB / 96838 GB avail
>>>> >> 80429/555185 objects degraded (14.487%)
>>>> >> 40079/555185 objects misplaced (7.219%)
>>>> >> 1903 active+clean
>>>> >> 1486 active+undersized+degraded
>>>> >> 771 active+remapped
>>>> >> client io 0 B/s rd, 246 kB/s wr, 67 op/s
>>>> >>
>>>> >> root@slpeah002:[~]:# ceph osd tree
>>>> >> ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
>>>> >> -1 94.63998 root default
>>>> >> -9 32.75999 host slpeah007
>>>> >> 72 5.45999 osd.72 up 1.00000 1.00000
>>>> >> 73 5.45999 osd.73 up 1.00000 1.00000
>>>> >> 74 5.45999 osd.74 up 1.00000 1.00000
>>>> >> 75 5.45999 osd.75 up 1.00000 1.00000
>>>> >> 76 5.45999 osd.76 up 1.00000 1.00000
>>>> >> 77 5.45999 osd.77 up 1.00000 1.00000
>>>> >> -10 32.75999 host slpeah008
>>>> >> 78 5.45999 osd.78 up 1.00000 1.00000
>>>> >> 79 5.45999 osd.79 up 1.00000 1.00000
>>>> >> 80 5.45999 osd.80 up 1.00000 1.00000
>>>> >> 81 5.45999 osd.81 up 1.00000 1.00000
>>>> >> 82 5.45999 osd.82 up 1.00000 1.00000
>>>> >> 83 5.45999 osd.83 up 1.00000 1.00000
>>>> >> -3 14.56000 host slpeah001
>>>> >> 1 3.64000 osd.1 down 1.00000 1.00000
>>>> >> 33 3.64000 osd.33 down 1.00000 1.00000
>>>> >> 34 3.64000 osd.34 down 1.00000 1.00000
>>>> >> 35 3.64000 osd.35 down 1.00000 1.00000
>>>> >> -2 14.56000 host slpeah002
>>>> >> 0 3.64000 osd.0 up 1.00000 1.00000
>>>> >> 36 3.64000 osd.36 up 1.00000 1.00000
>>>> >> 37 3.64000 osd.37 up 1.00000 1.00000
>>>> >> 38 3.64000 osd.38 up 1.00000 1.00000
>>>> >>
>>>> >> Crushmap:
>>>> >>
>>>> >> # begin crush map
>>>> >> tunable choose_local_tries 0
>>>> >> tunable choose_local_fallback_tries 0
>>>> >> tunable choose_total_tries 50
>>>> >> tunable chooseleaf_descend_once 1
>>>> >> tunable chooseleaf_vary_r 1
>>>> >> tunable straw_calc_version 1
>>>> >> tunable allowed_bucket_algs 54
>>>> >>
>>>> >> # devices
>>>> >> device 0 osd.0
>>>> >> device 1 osd.1
>>>> >> device 2 device2
>>>> >> device 3 device3
>>>> >> device 4 device4
>>>> >> device 5 device5
>>>> >> device 6 device6
>>>> >> device 7 device7
>>>> >> device 8 device8
>>>> >> device 9 device9
>>>> >> device 10 device10
>>>> >> device 11 device11
>>>> >> device 12 device12
>>>> >> device 13 device13
>>>> >> device 14 device14
>>>> >> device 15 device15
>>>> >> device 16 device16
>>>> >> device 17 device17
>>>> >> device 18 device18
>>>> >> device 19 device19
>>>> >> device 20 device20
>>>> >> device 21 device21
>>>> >> device 22 device22
>>>> >> device 23 device23
>>>> >> device 24 device24
>>>> >> device 25 device25
>>>> >> device 26 device26
>>>> >> device 27 device27
>>>> >> device 28 device28
>>>> >> device 29 device29
>>>> >> device 30 device30
>>>> >> device 31 device31
>>>> >> device 32 device32
>>>> >> device 33 osd.33
>>>> >> device 34 osd.34
>>>> >> device 35 osd.35
>>>> >> device 36 osd.36
>>>> >> device 37 osd.37
>>>> >> device 38 osd.38
>>>> >> device 39 device39
>>>> >> device 40 device40
>>>> >> device 41 device41
>>>> >> device 42 device42
>>>> >> device 43 device43
>>>> >> device 44 device44
>>>> >> device 45 device45
>>>> >> device 46 device46
>>>> >> device 47 device47
>>>> >> device 48 device48
>>>> >> device 49 device49
>>>> >> device 50 device50
>>>> >> device 51 device51
>>>> >> device 52 device52
>>>> >> device 53 device53
>>>> >> device 54 device54
>>>> >> device 55 device55
>>>> >> device 56 device56
>>>> >> device 57 device57
>>>> >> device 58 device58
>>>> >> device 59 device59
>>>> >> device 60 device60
>>>> >> device 61 device61
>>>> >> device 62 device62
>>>> >> device 63 device63
>>>> >> device 64 device64
>>>> >> device 65 device65
>>>> >> device 66 device66
>>>> >> device 67 device67
>>>> >> device 68 device68
>>>> >> device 69 device69
>>>> >> device 70 device70
>>>> >> device 71 device71
>>>> >> device 72 osd.72
>>>> >> device 73 osd.73
>>>> >> device 74 osd.74
>>>> >> device 75 osd.75
>>>> >> device 76 osd.76
>>>> >> device 77 osd.77
>>>> >> device 78 osd.78
>>>> >> device 79 osd.79
>>>> >> device 80 osd.80
>>>> >> device 81 osd.81
>>>> >> device 82 osd.82
>>>> >> device 83 osd.83
>>>> >>
>>>> >> # types
>>>> >> type 0 osd
>>>> >> type 1 host
>>>> >> type 2 chassis
>>>> >> type 3 rack
>>>> >> type 4 row
>>>> >> type 5 pdu
>>>> >> type 6 pod
>>>> >> type 7 room
>>>> >> type 8 datacenter
>>>> >> type 9 region
>>>> >> type 10 root
>>>> >>
>>>> >> # buckets
>>>> >> host slpeah007 {
>>>> >> id -9 # do not change unnecessarily
>>>> >> # weight 32.760
>>>> >> alg straw
>>>> >> hash 0 # rjenkins1
>>>> >> item osd.72 weight 5.460
>>>> >> item osd.73 weight 5.460
>>>> >> item osd.74 weight 5.460
>>>> >> item osd.75 weight 5.460
>>>> >> item osd.76 weight 5.460
>>>> >> item osd.77 weight 5.460
>>>> >> }
>>>> >> host slpeah008 {
>>>> >> id -10 # do not change unnecessarily
>>>> >> # weight 32.760
>>>> >> alg straw
>>>> >> hash 0 # rjenkins1
>>>> >> item osd.78 weight 5.460
>>>> >> item osd.79 weight 5.460
>>>> >> item osd.80 weight 5.460
>>>> >> item osd.81 weight 5.460
>>>> >> item osd.82 weight 5.460
>>>> >> item osd.83 weight 5.460
>>>> >> }
>>>> >> host slpeah001 {
>>>> >> id -3 # do not change unnecessarily
>>>> >> # weight 14.560
>>>> >> alg straw
>>>> >> hash 0 # rjenkins1
>>>> >> item osd.1 weight 3.640
>>>> >> item osd.33 weight 3.640
>>>> >> item osd.34 weight 3.640
>>>> >> item osd.35 weight 3.640
>>>> >> }
>>>> >> host slpeah002 {
>>>> >> id -2 # do not change unnecessarily
>>>> >> # weight 14.560
>>>> >> alg straw
>>>> >> hash 0 # rjenkins1
>>>> >> item osd.0 weight 3.640
>>>> >> item osd.36 weight 3.640
>>>> >> item osd.37 weight 3.640
>>>> >> item osd.38 weight 3.640
>>>> >> }
>>>> >> root default {
>>>> >> id -1 # do not change unnecessarily
>>>> >> # weight 94.640
>>>> >> alg straw
>>>> >> hash 0 # rjenkins1
>>>> >> item slpeah007 weight 32.760
>>>> >> item slpeah008 weight 32.760
>>>> >> item slpeah001 weight 14.560
>>>> >> item slpeah002 weight 14.560
>>>> >> }
>>>> >>
>>>> >> # rules
>>>> >> rule default {
>>>> >> ruleset 0
>>>> >> type replicated
>>>> >> min_size 1
>>>> >> max_size 10
>>>> >> step take default
>>>> >> step chooseleaf firstn 0 type host
>>>> >> step emit
>>>> >> }
>>>> >>
>>>> >> # end crush map
>>>> >>
>>>> >>
>>>> >>
>>>> >> This is odd, because the pools have size 3 and I have 3 hosts alive, so
>>>> >> why is it saying that undersized PGs are present? It makes me feel like
>>>> >> CRUSH is not working properly.
>>>> >> There is not much data in the cluster currently, about 3TB, and as you
>>>> >> can see from the osd tree each host has at least 14TB of disk space on
>>>> >> its OSDs.
>>>> >> So I'm a bit stuck now...
>>>> >> How can I find the source of the trouble?
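>>>> >>
>>>> >> In case it helps to narrow things down, the mapping of individual stuck
>>>> >> PGs can be inspected like this (x.y is just a placeholder PG id):
>>>> >>
>>>> >>     # which PGs are stuck, and on which OSDs do they currently sit?
>>>> >>     ceph health detail | grep undersized | head
>>>> >>     # for one of the reported PG ids, show its up/acting mapping
>>>> >>     ceph pg map x.y
>>>> >>     # and the full peering details
>>>> >>     ceph pg x.y query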
>>>> >>
>>>> >> Thanks in advance!
>>>> >
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Mart van Santen
>>> Greenhost
>>> E: mart@xxxxxxxxxxxx
>>> T: +31 20 4890444
>>> W: https://greenhost.nl
>>>
>>> A PGP signature can be attached to this e-mail,
>>> you need PGP software to verify it.
>>> My public key is available in keyserver(s)
>>> see: http://tinyurl.com/openpgp-manual
>>>
>>> PGP Fingerprint: CA85 EB11 2B70 042D AF66 B29A 6437 01A1 10A3 D3A5
>>>
>>>
>>>
>>
>>
>>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com