Vasiliy,
I don't think that's the cause. Can you paste other tuning options from your ceph.conf?
Also, have you fixed the problems with cephx auth?
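
If it helps, the running values can be dumped straight from an OSD's admin socket, for example (a sketch; adjust the OSD id and the grep pattern to whatever you want to look at):

    # dump the configuration osd.0 is actually running with
    ceph daemon osd.0 config show | egrep 'mon_osd|heartbeat|cephx'
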
Bob
On Mon, Nov 30, 2015 at 12:56 AM, Vasiliy Angapov <angapov@xxxxxxxxx> wrote:
Btw, in my configuration "mon osd downout subtree limit" is set to "host".
Could that be influencing things?
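
It can also be double-checked at runtime like this (a sketch, assuming default admin socket paths and that slpeah002 runs one of the monitors):

    # check the effective down-out subtree limit on a running monitor
    ceph daemon mon.slpeah002 config show | grep mon_osd_down_out_subtree_limit
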
2015-11-29 14:38 GMT+08:00 Vasiliy Angapov <angapov@xxxxxxxxx>:
> Bob,
> Thanks for the explanation, sounds reasonable! But how could it happen that
> the host is down while its OSDs are still IN the cluster?
> I mean, the NOOUT flag is not set and my timeouts are all at their defaults...
>
> But if I remember correctly, the host was not completely down: it was
> pingable, but no other services were reachable, like SSH or anything else.
> Is it possible that the OSDs were still sending some information to the
> monitors, making them look like IN?
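>
> For reference, these are the defaults I believe apply here (a sketch; option
> names and values as I understand them for 0.94, please correct me if wrong):
>
>     # a down OSD should normally be marked out after 5 minutes
>     mon osd down out interval = 300
>     # no automatic out for whole subtrees at or above this bucket type
>     mon osd down out subtree limit = rack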
>
> 2015-11-29 2:10 GMT+08:00 Bob R <bobr@xxxxxxxxxxxxxx>:
>> Vasiliy,
>>
>> Your OSDs are marked as 'down' but 'in'.
>>
>> "Ceph OSDs have two known states that can be combined. Up and Down only
>> tells you whether the OSD is actively involved in the cluster. OSD states
>> also are expressed in terms of cluster replication: In and Out. Only when a
>> Ceph OSD is tagged as Out does the self-healing process occur"
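>>
>> If you do not want to wait for the monitors, you could also mark the failed
>> host's OSDs out by hand to kick off recovery, e.g. (a sketch, using the ids
>> from your 'ceph osd tree' output):
>>
>>     # mark the down OSDs on slpeah001 out so CRUSH re-replicates their PGs
>>     for id in 1 33 34 35; do ceph osd out $id; done
>>
>> and then watch 'ceph -s' until the undersized PGs recover.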
>>
>> Bob
>>
>> On Fri, Nov 27, 2015 at 6:15 AM, Mart van Santen <mart@xxxxxxxxxxxx> wrote:
>>>
>>>
>>> Dear Vasiliy,
>>>
>>>
>>>
>>> On 11/27/2015 02:00 PM, Irek Fasikhov wrote:
>>>
>>> Do you have time synchronization in place?
>>>
>>> Regards, Irek Fasikhov
>>> Mob.: +79229045757
>>>
>>> 2015-11-27 15:57 GMT+03:00 Vasiliy Angapov <angapov@xxxxxxxxx>:
>>>>
>>>> > It seems that you played around with the crushmap and did something
>>>> > wrong.
>>>> > Compare the output of 'ceph osd tree' with the crushmap. There are some 'osd'
>>>> > devices renamed to 'device'; I think that is where your problem is.
>>>> Is this actually a mistake? What I did was remove a bunch of OSDs from
>>>> my cluster, which is why the numbering is sparse. But is it a problem to
>>>> have sparse OSD numbering?
>>>
>>>
>>> I think this is normal and should not be a problem. I have had this
>>> previously as well.
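>>>
>>> For what it is worth, the 'deviceN' placeholders in your crushmap are just
>>> what is left behind when OSDs are removed. A removal along these lines (a
>>> sketch, N being the OSD id) leaves exactly that kind of gap and is harmless:
>>>
>>>     ceph osd out N               # stop placing data on it
>>>     ceph osd crush remove osd.N  # drop it from the crushmap
>>>     ceph auth del osd.N          # remove its key
>>>     ceph osd rm N                # remove the OSD entry itself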
>>>
>>>>
>>>> > Hi.
>>>> > Vasiliy, yes it is a problem with the crushmap. Look at the weight:
>>>> > -3 14.56000 host slpeah001
>>>> > -2 14.56000 host slpeah002
>>>> What exactly is wrong here?
>>>
>>>
>>> I do not know exactly how the host weights contribute to determining where
>>> to store the third copy of a PG. As you explained, you have enough space on
>>> all hosts, but perhaps, with the host weights this uneven, CRUSH comes to
>>> the conclusion that it is not able to place the PGs. What you can try is to
>>> artificially raise the weights of these hosts, to see if it starts mapping
>>> the third copies of the PGs onto the available host.
>>>
>>> I had a similar problem in the past, which was solved by upgrading to the
>>> latest CRUSH tunables. But be aware that this can cause massive data
>>> movement.
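>>>
>>> In commands that would look roughly like this (a sketch; the weight value is
>>> only an example, and reweighting an OSD also bumps its host bucket's weight,
>>> while 'tunables optimal' will reshuffle a lot of data):
>>>
>>>     # raise the crush weight of one of the small host's OSDs
>>>     ceph osd crush reweight osd.1 5.46
>>>     # or move the whole cluster to the newest tunables profile
>>>     ceph osd crush tunables optimal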
>>>
>>>
>>>>
>>>> I also found out that my OSD logs are full of records like these:
>>>> 2015-11-26 08:31:19.273268 7fe4f49b1700 0 cephx: verify_authorizer
>>>> could not get service secret for service osd secret_id=2924
>>>> 2015-11-26 08:31:19.273276 7fe4f49b1700 0 --
>>>> 192.168.254.18:6816/110740 >> 192.168.254.12:0/1011754 pipe(0x41fd1000
>>>> sd=79 :6816 s=0 pgs=0 cs=0 l=1 c=0x3ee1a520).accept: got bad
>>>> authorizer
>>>> 2015-11-26 08:31:24.273207 7fe4f49b1700 0 auth: could not find
>>>> secret_id=2924
>>>> 2015-11-26 08:31:24.273225 7fe4f49b1700 0 cephx: verify_authorizer
>>>> could not get service secret for service osd secret_id=2924
>>>> 2015-11-26 08:31:24.273231 7fe4f49b1700 0 --
>>>> 192.168.254.18:6816/110740 >> 192.168.254.12:0/1011754 pipe(0x3f90b000
>>>> sd=79 :6816 s=0 pgs=0 cs=0 l=1 c=0x3ee1a3c0).accept: got bad
>>>> authorizer
>>>> 2015-11-26 08:31:29.273199 7fe4f49b1700 0 auth: could not find
>>>> secret_id=2924
>>>> 2015-11-26 08:31:29.273215 7fe4f49b1700 0 cephx: verify_authorizer
>>>> could not get service secret for service osd secret_id=2924
>>>> 2015-11-26 08:31:29.273222 7fe4f49b1700 0 --
>>>> 192.168.254.18:6816/110740 >> 192.168.254.12:0/1011754 pipe(0x41fd1000
>>>> sd=79 :6816 s=0 pgs=0 cs=0 l=1 c=0x3ee1a260).accept: got bad
>>>> authorizer
>>>> 2015-11-26 08:31:34.273469 7fe4f49b1700 0 auth: could not find
>>>> secret_id=2924
>>>> 2015-11-26 08:31:34.273482 7fe4f49b1700 0 cephx: verify_authorizer
>>>> could not get service secret for service osd secret_id=2924
>>>> 2015-11-26 08:31:34.273486 7fe4f49b1700 0 --
>>>> 192.168.254.18:6816/110740 >> 192.168.254.12:0/1011754 pipe(0x3f90b000
>>>> sd=79 :6816 s=0 pgs=0 cs=0 l=1 c=0x3ee1a100).accept: got bad
>>>> authorizer
>>>> 2015-11-26 08:31:39.273310 7fe4f49b1700 0 auth: could not find
>>>> secret_id=2924
>>>> 2015-11-26 08:31:39.273331 7fe4f49b1700 0 cephx: verify_authorizer
>>>> could not get service secret for service osd secret_id=2924
>>>> 2015-11-26 08:31:39.273342 7fe4f49b1700 0 --
>>>> 192.168.254.18:6816/110740 >> 192.168.254.12:0/1011754 pipe(0x41fcc000
>>>> sd=98 :6816 s=0 pgs=0 cs=0 l=1 c=0x3ee19fa0).accept: got bad
>>>> authorizer
>>>> 2015-11-26 08:31:44.273753 7fe4f49b1700 0 auth: could not find
>>>> secret_id=2924
>>>> 2015-11-26 08:31:44.273769 7fe4f49b1700 0 cephx: verify_authorizer
>>>> could not get service secret for service osd secret_id=2924
>>>> 2015-11-26 08:31:44.273776 7fe4f49b1700 0 --
>>>> 192.168.254.18:6816/110740 >> 192.168.254.12:0/1011754 pipe(0x41fcc000
>>>> sd=98 :6816 s=0 pgs=0 cs=0 l=1 c=0x3ee189a0).accept: got bad
>>>> authorizer
>>>> 2015-11-26 08:31:49.273412 7fe4f49b1700 0 auth: could not find
>>>> secret_id=2924
>>>> 2015-11-26 08:31:49.273431 7fe4f49b1700 0 cephx: verify_authorizer
>>>> could not get service secret for service osd secret_id=2924
>>>> 2015-11-26 08:31:49.273455 7fe4f49b1700 0 --
>>>> 192.168.254.18:6816/110740 >> 192.168.254.12:0/1011754 pipe(0x41fd1000
>>>> sd=98 :6816 s=0 pgs=0 cs=0 l=1 c=0x3ee19080).accept: got bad
>>>> authorizer
>>>> 2015-11-26 08:31:54.273293 7fe4f49b1700 0 auth: could not find
>>>> secret_id=2924
>>>>
>>>> What does it mean? Google says it might be a time sync issue, but my
>>>> clocks are perfectly synchronized...
>>>
>>>
>>> Normally you get a warning in "ceph status" if the time is out of sync.
>>> Nevertheless, you can try to restart the OSDs. I had timing issues in the
>>> past and found that it sometimes helps to restart the daemons *after*
>>> syncing the clocks, before they accept the new time. But that was mostly
>>> the case with monitors.
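>>>
>>> Roughly (a sketch; the restart command depends on your init system, the
>>> sysvinit form below is what I would try on a 0.94 install):
>>>
>>>     # does the cluster itself complain about clock skew?
>>>     ceph status | grep -i skew
>>>     # compare NTP peers on every node
>>>     ntpq -p
>>>     # then restart one OSD at a time after the clocks agree
>>>     /etc/init.d/ceph restart osd.1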
>>>
>>>
>>>
>>> Regards,
>>>
>>>
>>> Mart
>>>
>>>
>>>
>>>
>>>>
>>>> 2015-11-26 21:05 GMT+08:00 Irek Fasikhov <malmyzh@xxxxxxxxx>:
>>>> > Hi.
>>>> > Vasiliy, yes it is a problem with the crushmap. Look at the weight:
>>>> > " -3 14.56000 host slpeah001
>>>> > -2 14.56000 host slpeah002
>>>> > "
>>>> >
>>>> > Regards, Irek Fasikhov
>>>> > Mob.: +79229045757
>>>> >
>>>> > 2015-11-26 13:16 GMT+03:00 Kamil Kuramshin (CIT RT)
>>>> > <Kamil.Kuramshin@xxxxxxxx>:
>>>> >>
>>>> >> It seems that you played around with the crushmap and did something
>>>> >> wrong.
>>>> >> Compare the output of 'ceph osd tree' with the crushmap. There are some 'osd'
>>>> >> devices renamed to 'device'; I think that is where your problem is.
>>>> >>
>>>> >> Sent from a mobile device.
>>>> >>
>>>> >>
>>>> >> -----Original Message-----
>>>> >> From: Vasiliy Angapov <angapov@xxxxxxxxx>
>>>> >> To: ceph-users <ceph-users@xxxxxxxxxxxxxx>
>>>> >> Sent: Thu, 26 Nov 2015 7:53
>>>> >> Subject: Undersized pgs problem
>>>> >>
>>>> >> Hi, colleagues!
>>>> >>
>>>> >> I have a small 4-node Ceph cluster (0.94.2); all pools have size 3,
>>>> >> min_size 1.
>>>> >> Last night one host failed, and the cluster was unable to rebalance,
>>>> >> complaining about a lot of undersized PGs.
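>>>> >>
>>>> >> For reference, the pool settings can be checked per pool like this
>>>> >> ('rbd' is just an example pool name):
>>>> >>
>>>> >>     ceph osd pool get rbd size
>>>> >>     ceph osd pool get rbd min_size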
>>>> >>
>>>> >> root@slpeah002:[~]:# ceph -s
>>>> >> cluster 78eef61a-3e9c-447c-a3ec-ce84c617d728
>>>> >> health HEALTH_WARN
>>>> >> 1486 pgs degraded
>>>> >> 1486 pgs stuck degraded
>>>> >> 2257 pgs stuck unclean
>>>> >> 1486 pgs stuck undersized
>>>> >> 1486 pgs undersized
>>>> >> recovery 80429/555185 objects degraded (14.487%)
>>>> >> recovery 40079/555185 objects misplaced (7.219%)
>>>> >> 4/20 in osds are down
>>>> >> 1 mons down, quorum 1,2 slpeah002,slpeah007
>>>> >> monmap e7: 3 mons at
>>>> >>
>>>> >>
>>>> >> {slpeah001=192.168.254.11:6780/0,slpeah002=192.168.254.12:6780/0,slpeah007=172.31.252.46:6789/0}
>>>> >> election epoch 710, quorum 1,2 slpeah002,slpeah007
>>>> >> osdmap e14062: 20 osds: 16 up, 20 in; 771 remapped pgs
>>>> >> pgmap v7021316: 4160 pgs, 5 pools, 1045 GB data, 180 kobjects
>>>> >> 3366 GB used, 93471 GB / 96838 GB avail
>>>> >> 80429/555185 objects degraded (14.487%)
>>>> >> 40079/555185 objects misplaced (7.219%)
>>>> >> 1903 active+clean
>>>> >> 1486 active+undersized+degraded
>>>> >> 771 active+remapped
>>>> >> client io 0 B/s rd, 246 kB/s wr, 67 op/s
>>>> >>
>>>> >> root@slpeah002:[~]:# ceph osd tree
>>>> >> ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
>>>> >> -1 94.63998 root default
>>>> >> -9 32.75999 host slpeah007
>>>> >> 72 5.45999 osd.72 up 1.00000 1.00000
>>>> >> 73 5.45999 osd.73 up 1.00000 1.00000
>>>> >> 74 5.45999 osd.74 up 1.00000 1.00000
>>>> >> 75 5.45999 osd.75 up 1.00000 1.00000
>>>> >> 76 5.45999 osd.76 up 1.00000 1.00000
>>>> >> 77 5.45999 osd.77 up 1.00000 1.00000
>>>> >> -10 32.75999 host slpeah008
>>>> >> 78 5.45999 osd.78 up 1.00000 1.00000
>>>> >> 79 5.45999 osd.79 up 1.00000 1.00000
>>>> >> 80 5.45999 osd.80 up 1.00000 1.00000
>>>> >> 81 5.45999 osd.81 up 1.00000 1.00000
>>>> >> 82 5.45999 osd.82 up 1.00000 1.00000
>>>> >> 83 5.45999 osd.83 up 1.00000 1.00000
>>>> >> -3 14.56000 host slpeah001
>>>> >> 1 3.64000 osd.1 down 1.00000 1.00000
>>>> >> 33 3.64000 osd.33 down 1.00000 1.00000
>>>> >> 34 3.64000 osd.34 down 1.00000 1.00000
>>>> >> 35 3.64000 osd.35 down 1.00000 1.00000
>>>> >> -2 14.56000 host slpeah002
>>>> >> 0 3.64000 osd.0 up 1.00000 1.00000
>>>> >> 36 3.64000 osd.36 up 1.00000 1.00000
>>>> >> 37 3.64000 osd.37 up 1.00000 1.00000
>>>> >> 38 3.64000 osd.38 up 1.00000 1.00000
>>>> >>
>>>> >> Crushmap:
>>>> >>
>>>> >> # begin crush map
>>>> >> tunable choose_local_tries 0
>>>> >> tunable choose_local_fallback_tries 0
>>>> >> tunable choose_total_tries 50
>>>> >> tunable chooseleaf_descend_once 1
>>>> >> tunable chooseleaf_vary_r 1
>>>> >> tunable straw_calc_version 1
>>>> >> tunable allowed_bucket_algs 54
>>>> >>
>>>> >> # devices
>>>> >> device 0 osd.0
>>>> >> device 1 osd.1
>>>> >> device 2 device2
>>>> >> device 3 device3
>>>> >> device 4 device4
>>>> >> device 5 device5
>>>> >> device 6 device6
>>>> >> device 7 device7
>>>> >> device 8 device8
>>>> >> device 9 device9
>>>> >> device 10 device10
>>>> >> device 11 device11
>>>> >> device 12 device12
>>>> >> device 13 device13
>>>> >> device 14 device14
>>>> >> device 15 device15
>>>> >> device 16 device16
>>>> >> device 17 device17
>>>> >> device 18 device18
>>>> >> device 19 device19
>>>> >> device 20 device20
>>>> >> device 21 device21
>>>> >> device 22 device22
>>>> >> device 23 device23
>>>> >> device 24 device24
>>>> >> device 25 device25
>>>> >> device 26 device26
>>>> >> device 27 device27
>>>> >> device 28 device28
>>>> >> device 29 device29
>>>> >> device 30 device30
>>>> >> device 31 device31
>>>> >> device 32 device32
>>>> >> device 33 osd.33
>>>> >> device 34 osd.34
>>>> >> device 35 osd.35
>>>> >> device 36 osd.36
>>>> >> device 37 osd.37
>>>> >> device 38 osd.38
>>>> >> device 39 device39
>>>> >> device 40 device40
>>>> >> device 41 device41
>>>> >> device 42 device42
>>>> >> device 43 device43
>>>> >> device 44 device44
>>>> >> device 45 device45
>>>> >> device 46 device46
>>>> >> device 47 device47
>>>> >> device 48 device48
>>>> >> device 49 device49
>>>> >> device 50 device50
>>>> >> device 51 device51
>>>> >> device 52 device52
>>>> >> device 53 device53
>>>> >> device 54 device54
>>>> >> device 55 device55
>>>> >> device 56 device56
>>>> >> device 57 device57
>>>> >> device 58 device58
>>>> >> device 59 device59
>>>> >> device 60 device60
>>>> >> device 61 device61
>>>> >> device 62 device62
>>>> >> device 63 device63
>>>> >> device 64 device64
>>>> >> device 65 device65
>>>> >> device 66 device66
>>>> >> device 67 device67
>>>> >> device 68 device68
>>>> >> device 69 device69
>>>> >> device 70 device70
>>>> >> device 71 device71
>>>> >> device 72 osd.72
>>>> >> device 73 osd.73
>>>> >> device 74 osd.74
>>>> >> device 75 osd.75
>>>> >> device 76 osd.76
>>>> >> device 77 osd.77
>>>> >> device 78 osd.78
>>>> >> device 79 osd.79
>>>> >> device 80 osd.80
>>>> >> device 81 osd.81
>>>> >> device 82 osd.82
>>>> >> device 83 osd.83
>>>> >>
>>>> >> # types
>>>> >> type 0 osd
>>>> >> type 1 host
>>>> >> type 2 chassis
>>>> >> type 3 rack
>>>> >> type 4 row
>>>> >> type 5 pdu
>>>> >> type 6 pod
>>>> >> type 7 room
>>>> >> type 8 datacenter
>>>> >> type 9 region
>>>> >> type 10 root
>>>> >>
>>>> >> # buckets
>>>> >> host slpeah007 {
>>>> >> id -9 # do not change unnecessarily
>>>> >> # weight 32.760
>>>> >> alg straw
>>>> >> hash 0 # rjenkins1
>>>> >> item osd.72 weight 5.460
>>>> >> item osd.73 weight 5.460
>>>> >> item osd.74 weight 5.460
>>>> >> item osd.75 weight 5.460
>>>> >> item osd.76 weight 5.460
>>>> >> item osd.77 weight 5.460
>>>> >> }
>>>> >> host slpeah008 {
>>>> >> id -10 # do not change unnecessarily
>>>> >> # weight 32.760
>>>> >> alg straw
>>>> >> hash 0 # rjenkins1
>>>> >> item osd.78 weight 5.460
>>>> >> item osd.79 weight 5.460
>>>> >> item osd.80 weight 5.460
>>>> >> item osd.81 weight 5.460
>>>> >> item osd.82 weight 5.460
>>>> >> item osd.83 weight 5.460
>>>> >> }
>>>> >> host slpeah001 {
>>>> >> id -3 # do not change unnecessarily
>>>> >> # weight 14.560
>>>> >> alg straw
>>>> >> hash 0 # rjenkins1
>>>> >> item osd.1 weight 3.640
>>>> >> item osd.33 weight 3.640
>>>> >> item osd.34 weight 3.640
>>>> >> item osd.35 weight 3.640
>>>> >> }
>>>> >> host slpeah002 {
>>>> >> id -2 # do not change unnecessarily
>>>> >> # weight 14.560
>>>> >> alg straw
>>>> >> hash 0 # rjenkins1
>>>> >> item osd.0 weight 3.640
>>>> >> item osd.36 weight 3.640
>>>> >> item osd.37 weight 3.640
>>>> >> item osd.38 weight 3.640
>>>> >> }
>>>> >> root default {
>>>> >> id -1 # do not change unnecessarily
>>>> >> # weight 94.640
>>>> >> alg straw
>>>> >> hash 0 # rjenkins1
>>>> >> item slpeah007 weight 32.760
>>>> >> item slpeah008 weight 32.760
>>>> >> item slpeah001 weight 14.560
>>>> >> item slpeah002 weight 14.560
>>>> >> }
>>>> >>
>>>> >> # rules
>>>> >> rule default {
>>>> >> ruleset 0
>>>> >> type replicated
>>>> >> min_size 1
>>>> >> max_size 10
>>>> >> step take default
>>>> >> step chooseleaf firstn 0 type host
>>>> >> step emit
>>>> >> }
>>>> >>
>>>> >> # end crush map
>>>> >>
>>>> >>
>>>> >>
>>>> >> This is odd, because the pools have size 3 and I have 3 hosts alive, so
>>>> >> why is it saying that undersized PGs are present? It makes me feel like
>>>> >> CRUSH is not working properly.
>>>> >> There is not much data in the cluster currently, about 3TB, and as you
>>>> >> can see from the osd tree each host has at least 14TB of disk space on
>>>> >> its OSDs.
>>>> >> So I'm a bit stuck now...
>>>> >> How can I find the source of the trouble?
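>>>> >>
>>>> >> In case it helps to narrow things down, the mapping of individual stuck
>>>> >> PGs can be inspected like this (x.y is just a placeholder PG id):
>>>> >>
>>>> >>     # which PGs are stuck, and on which OSDs do they currently sit?
>>>> >>     ceph health detail | grep undersized | head
>>>> >>     # for one of the reported PG ids, show its up/acting mapping
>>>> >>     ceph pg map x.y
>>>> >>     # and the full peering details
>>>> >>     ceph pg x.y query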
>>>> >>
>>>> >> Thanks in advance!
>>>> >
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Mart van Santen
>>> Greenhost
>>> E: mart@xxxxxxxxxxxx
>>> T: +31 20 4890444
>>> W: https://greenhost.nl
>>>
>>> A PGP signature can be attached to this e-mail,
>>> you need PGP software to verify it.
>>> My public key is available in keyserver(s)
>>> see: http://tinyurl.com/openpgp-manual
>>>
>>> PGP Fingerprint: CA85 EB11 2B70 042D AF66 B29A 6437 01A1 10A3 D3A5
>>>
>>>
>>>
>>
>>
>>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com