Btw, in my configuration "mon osd downout subtree limit" is set to "host". Does it influence things?

2015-11-29 14:38 GMT+08:00 Vasiliy Angapov <angapov@xxxxxxxxx>:
> Bob,
> Thanks for the explanation, sounds reasonable! But how could it happen that the
> host is down while its OSDs are still IN the cluster?
> I mean, the NOOUT flag is not set and my timeouts are all at their defaults...
>
> But if I remember correctly the host was not completely down: it was
> pingable, but no other services were reachable, like SSH or anything else.
> Is it possible that the OSDs were still sending some information to the
> monitors, making them look IN?
>
> 2015-11-29 2:10 GMT+08:00 Bob R <bobr@xxxxxxxxxxxxxx>:
>> Vasiliy,
>>
>> Your OSDs are marked as 'down' but 'in'.
>>
>> "Ceph OSDs have two known states that can be combined. Up and Down only
>> tells you whether the OSD is actively involved in the cluster. OSD states
>> also are expressed in terms of cluster replication: In and Out. Only when a
>> Ceph OSD is tagged as Out does the self-healing process occur"
>>
>> Bob
>>
>> On Fri, Nov 27, 2015 at 6:15 AM, Mart van Santen <mart@xxxxxxxxxxxx> wrote:
>>>
>>> Dear Vasiliy,
>>>
>>> On 11/27/2015 02:00 PM, Irek Fasikhov wrote:
>>>
>>> Is your time synchronized?
>>>
>>> Best regards, Irek Fasikhov
>>> Mob.: +79229045757
>>>
>>> 2015-11-27 15:57 GMT+03:00 Vasiliy Angapov <angapov@xxxxxxxxx>:
>>>>
>>>> > It seems that you played around with the crushmap and did something
>>>> > wrong.
>>>> > Compare the look of 'ceph osd tree' and the crushmap. There are some 'osd'
>>>> > devices renamed to 'device'; I think that is where your problem is.
>>>> Is this actually a mistake? What I did is remove a bunch of OSDs from
>>>> my cluster, that's why the numbering is sparse. But is it an issue to
>>>> have sparse numbering of OSDs?
>>>
>>> I think this is normal and should be no problem. I had this previously as well.
>>>
>>>> > Hi.
>>>> > Vasiliy, yes, it is a problem with the crushmap. Look at the weights:
>>>> > -3 14.56000     host slpeah001
>>>> > -2 14.56000     host slpeah002
>>>> What exactly is wrong here?
>>>
>>> I do not know how the weights of the hosts contribute to determining where to
>>> store the third copy of a PG. As you explained, you have enough space on all
>>> hosts, but if the weights of the hosts do not add up, CRUSH may come to the
>>> conclusion that it is not able to place the PGs. What you can try is to
>>> artificially raise the weights of these hosts, to see if it starts mapping the
>>> third copies of the PGs onto the available hosts.
>>>
>>> I had a similar problem in the past; it was solved by upgrading to the latest
>>> crush tunables. But be aware that this can cause massive data movement.
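>>>
>>> (For reference, a rough sketch of both suggestions, not tested against your
>>> cluster; the file names and the "optimal" tunables profile are only examples:)
>>>
>>> ceph osd getcrushmap -o /tmp/crush.bin          # export the current map
>>> crushtool -d /tmp/crush.bin -o /tmp/crush.txt   # decompile to editable text
>>> # edit the host weights in /tmp/crush.txt, then recompile and inject it:
>>> crushtool -c /tmp/crush.txt -o /tmp/crush.new
>>> ceph osd setcrushmap -i /tmp/crush.new
>>> # or switch to newer tunables (expect data movement):
>>> ceph osd crush tunables optimal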
>>>>
>>>> I also found out that my OSD logs are full of such records:
>>>>
>>>> 2015-11-26 08:31:19.273268 7fe4f49b1700 0 cephx: verify_authorizer could not get service secret for service osd secret_id=2924
>>>> 2015-11-26 08:31:19.273276 7fe4f49b1700 0 -- 192.168.254.18:6816/110740 >> 192.168.254.12:0/1011754 pipe(0x41fd1000 sd=79 :6816 s=0 pgs=0 cs=0 l=1 c=0x3ee1a520).accept: got bad authorizer
>>>> 2015-11-26 08:31:24.273207 7fe4f49b1700 0 auth: could not find secret_id=2924
>>>> 2015-11-26 08:31:24.273225 7fe4f49b1700 0 cephx: verify_authorizer could not get service secret for service osd secret_id=2924
>>>> 2015-11-26 08:31:24.273231 7fe4f49b1700 0 -- 192.168.254.18:6816/110740 >> 192.168.254.12:0/1011754 pipe(0x3f90b000 sd=79 :6816 s=0 pgs=0 cs=0 l=1 c=0x3ee1a3c0).accept: got bad authorizer
>>>> 2015-11-26 08:31:29.273199 7fe4f49b1700 0 auth: could not find secret_id=2924
>>>> 2015-11-26 08:31:29.273215 7fe4f49b1700 0 cephx: verify_authorizer could not get service secret for service osd secret_id=2924
>>>> 2015-11-26 08:31:29.273222 7fe4f49b1700 0 -- 192.168.254.18:6816/110740 >> 192.168.254.12:0/1011754 pipe(0x41fd1000 sd=79 :6816 s=0 pgs=0 cs=0 l=1 c=0x3ee1a260).accept: got bad authorizer
>>>> 2015-11-26 08:31:34.273469 7fe4f49b1700 0 auth: could not find secret_id=2924
>>>> 2015-11-26 08:31:34.273482 7fe4f49b1700 0 cephx: verify_authorizer could not get service secret for service osd secret_id=2924
>>>> 2015-11-26 08:31:34.273486 7fe4f49b1700 0 -- 192.168.254.18:6816/110740 >> 192.168.254.12:0/1011754 pipe(0x3f90b000 sd=79 :6816 s=0 pgs=0 cs=0 l=1 c=0x3ee1a100).accept: got bad authorizer
>>>> 2015-11-26 08:31:39.273310 7fe4f49b1700 0 auth: could not find secret_id=2924
>>>> 2015-11-26 08:31:39.273331 7fe4f49b1700 0 cephx: verify_authorizer could not get service secret for service osd secret_id=2924
>>>> 2015-11-26 08:31:39.273342 7fe4f49b1700 0 -- 192.168.254.18:6816/110740 >> 192.168.254.12:0/1011754 pipe(0x41fcc000 sd=98 :6816 s=0 pgs=0 cs=0 l=1 c=0x3ee19fa0).accept: got bad authorizer
>>>> 2015-11-26 08:31:44.273753 7fe4f49b1700 0 auth: could not find secret_id=2924
>>>> 2015-11-26 08:31:44.273769 7fe4f49b1700 0 cephx: verify_authorizer could not get service secret for service osd secret_id=2924
>>>> 2015-11-26 08:31:44.273776 7fe4f49b1700 0 -- 192.168.254.18:6816/110740 >> 192.168.254.12:0/1011754 pipe(0x41fcc000 sd=98 :6816 s=0 pgs=0 cs=0 l=1 c=0x3ee189a0).accept: got bad authorizer
>>>> 2015-11-26 08:31:49.273412 7fe4f49b1700 0 auth: could not find secret_id=2924
>>>> 2015-11-26 08:31:49.273431 7fe4f49b1700 0 cephx: verify_authorizer could not get service secret for service osd secret_id=2924
>>>> 2015-11-26 08:31:49.273455 7fe4f49b1700 0 -- 192.168.254.18:6816/110740 >> 192.168.254.12:0/1011754 pipe(0x41fd1000 sd=98 :6816 s=0 pgs=0 cs=0 l=1 c=0x3ee19080).accept: got bad authorizer
>>>> 2015-11-26 08:31:54.273293 7fe4f49b1700 0 auth: could not find secret_id=2924
>>>>
>>>> What does it mean? Google says it might be a time sync issue, but my
>>>> clocks are perfectly synchronized...
>>>
>>> Normally you get a warning in "ceph status" if the time is out of sync.
>>> Nevertheless, you can try to restart the OSDs. I had timing issues in the
>>> past and discovered that it sometimes helps to restart the daemons *after*
>>> syncing the time, since otherwise they did not accept the new time. But this
>>> was mostly the case with monitors, though.
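>>>
>>> (For example; the exact service names depend on your distribution and init
>>> system, and osd.36 below is only an illustration:)
>>>
>>> ntpq -p                              # verify every node really has sane NTP peers
>>> ceph health detail | grep -i clock   # monitors report clock skew here
>>> service ceph restart osd.36          # sysvinit; with systemd: systemctl restart ceph-osd@36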
>>>
>>> Regards,
>>>
>>> Mart
>>>
>>>>
>>>> 2015-11-26 21:05 GMT+08:00 Irek Fasikhov <malmyzh@xxxxxxxxx>:
>>>> > Hi.
>>>> > Vasiliy, yes, it is a problem with the crushmap. Look at the weights:
>>>> > " -3 14.56000     host slpeah001
>>>> >   -2 14.56000     host slpeah002 "
>>>> >
>>>> > Best regards, Irek Fasikhov
>>>> > Mob.: +79229045757
>>>> >
>>>> > 2015-11-26 13:16 GMT+03:00 ЦИТ РТ-Курамшин Камиль Фидаилевич
>>>> > <Kamil.Kuramshin@xxxxxxxx>:
>>>> >>
>>>> >> It seems that you played around with the crushmap and did something
>>>> >> wrong.
>>>> >> Compare the look of 'ceph osd tree' and the crushmap. There are some 'osd'
>>>> >> devices renamed to 'device'; I think that is where your problem is.
>>>> >>
>>>> >> Sent from a mobile device.
>>>> >>
>>>> >> -----Original Message-----
>>>> >> From: Vasiliy Angapov <angapov@xxxxxxxxx>
>>>> >> To: ceph-users <ceph-users@xxxxxxxxxxxxxx>
>>>> >> Sent: Thu, 26 Nov 2015 7:53
>>>> >> Subject: Undersized pgs problem
>>>> >>
>>>> >> Hi, colleagues!
>>>> >>
>>>> >> I have a small 4-node Ceph cluster (0.94.2); all pools have size 3,
>>>> >> min_size 1.
>>>> >> Tonight one host failed and the cluster was unable to rebalance, saying
>>>> >> there are a lot of undersized PGs.
>>>> >>
>>>> >> root@slpeah002:[~]:# ceph -s
>>>> >>     cluster 78eef61a-3e9c-447c-a3ec-ce84c617d728
>>>> >>      health HEALTH_WARN
>>>> >>             1486 pgs degraded
>>>> >>             1486 pgs stuck degraded
>>>> >>             2257 pgs stuck unclean
>>>> >>             1486 pgs stuck undersized
>>>> >>             1486 pgs undersized
>>>> >>             recovery 80429/555185 objects degraded (14.487%)
>>>> >>             recovery 40079/555185 objects misplaced (7.219%)
>>>> >>             4/20 in osds are down
>>>> >>             1 mons down, quorum 1,2 slpeah002,slpeah007
>>>> >>      monmap e7: 3 mons at {slpeah001=192.168.254.11:6780/0,slpeah002=192.168.254.12:6780/0,slpeah007=172.31.252.46:6789/0}
>>>> >>             election epoch 710, quorum 1,2 slpeah002,slpeah007
>>>> >>      osdmap e14062: 20 osds: 16 up, 20 in; 771 remapped pgs
>>>> >>       pgmap v7021316: 4160 pgs, 5 pools, 1045 GB data, 180 kobjects
>>>> >>             3366 GB used, 93471 GB / 96838 GB avail
>>>> >>             80429/555185 objects degraded (14.487%)
>>>> >>             40079/555185 objects misplaced (7.219%)
>>>> >>                 1903 active+clean
>>>> >>                 1486 active+undersized+degraded
>>>> >>                  771 active+remapped
>>>> >>   client io 0 B/s rd, 246 kB/s wr, 67 op/s
>>>> >>
>>>> >> root@slpeah002:[~]:# ceph osd tree
>>>> >> ID  WEIGHT   TYPE NAME           UP/DOWN REWEIGHT PRIMARY-AFFINITY
>>>> >>  -1 94.63998 root default
>>>> >>  -9 32.75999     host slpeah007
>>>> >>  72  5.45999         osd.72           up  1.00000          1.00000
>>>> >>  73  5.45999         osd.73           up  1.00000          1.00000
>>>> >>  74  5.45999         osd.74           up  1.00000          1.00000
>>>> >>  75  5.45999         osd.75           up  1.00000          1.00000
>>>> >>  76  5.45999         osd.76           up  1.00000          1.00000
>>>> >>  77  5.45999         osd.77           up  1.00000          1.00000
>>>> >> -10 32.75999     host slpeah008
>>>> >>  78  5.45999         osd.78           up  1.00000          1.00000
>>>> >>  79  5.45999         osd.79           up  1.00000          1.00000
>>>> >>  80  5.45999         osd.80           up  1.00000          1.00000
>>>> >>  81  5.45999         osd.81           up  1.00000          1.00000
>>>> >>  82  5.45999         osd.82           up  1.00000          1.00000
>>>> >>  83  5.45999         osd.83           up  1.00000          1.00000
>>>> >>  -3 14.56000     host slpeah001
>>>> >>   1  3.64000         osd.1          down  1.00000          1.00000
>>>> >>  33  3.64000         osd.33         down  1.00000          1.00000
>>>> >>  34  3.64000         osd.34         down  1.00000          1.00000
>>>> >>  35  3.64000         osd.35         down  1.00000          1.00000
>>>> >>  -2 14.56000     host slpeah002
>>>> >>   0  3.64000         osd.0            up  1.00000          1.00000
>>>> >>  36  3.64000         osd.36           up  1.00000          1.00000
>>>> >>  37  3.64000         osd.37           up  1.00000          1.00000
>>>> >>  38  3.64000         osd.38           up  1.00000          1.00000
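>>>> >>
>>>> >> (For completeness, the stuck PGs and the OSDs CRUSH picked for them can be
>>>> >> listed as follows; the PG id passed to "ceph pg map" is just a placeholder
>>>> >> to be taken from the previous output:)
>>>> >>
>>>> >> ceph health detail | grep undersized | head
>>>> >> ceph pg dump_stuck unclean | head
>>>> >> ceph pg map <pgid>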
>>>> >>
>>>> >> Crushmap:
>>>> >>
>>>> >> # begin crush map
>>>> >> tunable choose_local_tries 0
>>>> >> tunable choose_local_fallback_tries 0
>>>> >> tunable choose_total_tries 50
>>>> >> tunable chooseleaf_descend_once 1
>>>> >> tunable chooseleaf_vary_r 1
>>>> >> tunable straw_calc_version 1
>>>> >> tunable allowed_bucket_algs 54
>>>> >>
>>>> >> # devices
>>>> >> device 0 osd.0
>>>> >> device 1 osd.1
>>>> >> device 2 device2
>>>> >> device 3 device3
>>>> >> device 4 device4
>>>> >> device 5 device5
>>>> >> device 6 device6
>>>> >> device 7 device7
>>>> >> device 8 device8
>>>> >> device 9 device9
>>>> >> device 10 device10
>>>> >> device 11 device11
>>>> >> device 12 device12
>>>> >> device 13 device13
>>>> >> device 14 device14
>>>> >> device 15 device15
>>>> >> device 16 device16
>>>> >> device 17 device17
>>>> >> device 18 device18
>>>> >> device 19 device19
>>>> >> device 20 device20
>>>> >> device 21 device21
>>>> >> device 22 device22
>>>> >> device 23 device23
>>>> >> device 24 device24
>>>> >> device 25 device25
>>>> >> device 26 device26
>>>> >> device 27 device27
>>>> >> device 28 device28
>>>> >> device 29 device29
>>>> >> device 30 device30
>>>> >> device 31 device31
>>>> >> device 32 device32
>>>> >> device 33 osd.33
>>>> >> device 34 osd.34
>>>> >> device 35 osd.35
>>>> >> device 36 osd.36
>>>> >> device 37 osd.37
>>>> >> device 38 osd.38
>>>> >> device 39 device39
>>>> >> device 40 device40
>>>> >> device 41 device41
>>>> >> device 42 device42
>>>> >> device 43 device43
>>>> >> device 44 device44
>>>> >> device 45 device45
>>>> >> device 46 device46
>>>> >> device 47 device47
>>>> >> device 48 device48
>>>> >> device 49 device49
>>>> >> device 50 device50
>>>> >> device 51 device51
>>>> >> device 52 device52
>>>> >> device 53 device53
>>>> >> device 54 device54
>>>> >> device 55 device55
>>>> >> device 56 device56
>>>> >> device 57 device57
>>>> >> device 58 device58
>>>> >> device 59 device59
>>>> >> device 60 device60
>>>> >> device 61 device61
>>>> >> device 62 device62
>>>> >> device 63 device63
>>>> >> device 64 device64
>>>> >> device 65 device65
>>>> >> device 66 device66
>>>> >> device 67 device67
>>>> >> device 68 device68
>>>> >> device 69 device69
>>>> >> device 70 device70
>>>> >> device 71 device71
>>>> >> device 72 osd.72
>>>> >> device 73 osd.73
>>>> >> device 74 osd.74
>>>> >> device 75 osd.75
>>>> >> device 76 osd.76
>>>> >> device 77 osd.77
>>>> >> device 78 osd.78
>>>> >> device 79 osd.79
>>>> >> device 80 osd.80
>>>> >> device 81 osd.81
>>>> >> device 82 osd.82
>>>> >> device 83 osd.83
>>>> >>
>>>> >> # types
>>>> >> type 0 osd
>>>> >> type 1 host
>>>> >> type 2 chassis
>>>> >> type 3 rack
>>>> >> type 4 row
>>>> >> type 5 pdu
>>>> >> type 6 pod
>>>> >> type 7 room
>>>> >> type 8 datacenter
>>>> >> type 9 region
>>>> >> type 10 root
>>>> >>
>>>> >> # buckets
>>>> >> host slpeah007 {
>>>> >>     id -9       # do not change unnecessarily
>>>> >>     # weight 32.760
>>>> >>     alg straw
>>>> >>     hash 0      # rjenkins1
>>>> >>     item osd.72 weight 5.460
>>>> >>     item osd.73 weight 5.460
>>>> >>     item osd.74 weight 5.460
>>>> >>     item osd.75 weight 5.460
>>>> >>     item osd.76 weight 5.460
>>>> >>     item osd.77 weight 5.460
>>>> >> }
>>>> >> host slpeah008 {
>>>> >>     id -10      # do not change unnecessarily
>>>> >>     # weight 32.760
>>>> >>     alg straw
>>>> >>     hash 0      # rjenkins1
>>>> >>     item osd.78 weight 5.460
>>>> >>     item osd.79 weight 5.460
>>>> >>     item osd.80 weight 5.460
>>>> >>     item osd.81 weight 5.460
>>>> >>     item osd.82 weight 5.460
>>>> >>     item osd.83 weight 5.460
>>>> >> }
>>>> >> host slpeah001 {
>>>> >>     id -3       # do not change unnecessarily
>>>> >>     # weight 14.560
>>>> >>     alg straw
>>>> >>     hash 0      # rjenkins1
>>>> >>     item osd.1 weight 3.640
>>>> >>     item osd.33 weight 3.640
>>>> >>     item osd.34 weight 3.640
>>>> >>     item osd.35 weight 3.640
>>>> >> }
>>>> >> host slpeah002 {
>>>> >>     id -2       # do not change unnecessarily
>>>> >>     # weight 14.560
>>>> >>     alg straw
>>>> >>     hash 0      # rjenkins1
>>>> >>     item osd.0 weight 3.640
>>>> >>     item osd.36 weight 3.640
>>>> >>     item osd.37 weight 3.640
>>>> >>     item osd.38 weight 3.640
>>>> >> }
>>>> >> root default {
>>>> >>     id -1       # do not change unnecessarily
>>>> >>     # weight 94.640
>>>> >>     alg straw
>>>> >>     hash 0      # rjenkins1
>>>> >>     item slpeah007 weight 32.760
>>>> >>     item slpeah008 weight 32.760
>>>> >>     item slpeah001 weight 14.560
>>>> >>     item slpeah002 weight 14.560
>>>> >> }
>>>> >>
>>>> >> # rules
>>>> >> rule default {
>>>> >>     ruleset 0
>>>> >>     type replicated
>>>> >>     min_size 1
>>>> >>     max_size 10
>>>> >>     step take default
>>>> >>     step chooseleaf firstn 0 type host
>>>> >>     step emit
>>>> >> }
>>>> >>
>>>> >> # end crush map
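>>>> >>
>>>> >> (For reference, whether a map can actually place three replicas across
>>>> >> distinct hosts can be checked offline with crushtool; the file names below
>>>> >> are only examples and the rule/replica counts match the pools above:)
>>>> >>
>>>> >> ceph osd getcrushmap -o /tmp/cm.bin
>>>> >> crushtool -i /tmp/cm.bin --test --rule 0 --num-rep 3 --show-bad-mappings
>>>> >> # every line printed by --show-bad-mappings is an input that could not be
>>>> >> # mapped to 3 distinct hosts with the current map and tunables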
>>>> >>
>>>> >> This is odd, because the pools have size 3 and I have 3 hosts alive, so why
>>>> >> is it saying that undersized PGs are present? It makes me feel like CRUSH
>>>> >> is not working properly.
>>>> >> There is not much data in the cluster currently, about 3 TB, and as you can
>>>> >> see from the osd tree, each host has a minimum of 14 TB of disk space on
>>>> >> its OSDs.
>>>> >> So I'm a bit stuck now...
>>>> >> How can I find the source of the trouble?
>>>> >>
>>>> >> Thanks in advance!
>>>
>>> --
>>> Mart van Santen
>>> Greenhost
>>> E: mart@xxxxxxxxxxxx
>>> T: +31 20 4890444
>>> W: https://greenhost.nl
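
P.S. The value actually in effect can be confirmed on one of the monitor hosts
via the admin socket; the monitor name below is just an example taken from the
quorum list above:

ceph daemon mon.slpeah002 config get mon_osd_downout_subtree_limit

As far as I understand the config reference, this option is the smallest CRUSH
unit type that Ceph will *not* automatically mark out, so with it set to "host"
the OSDs of an entire failed host would be expected to stay "in" until someone
marks them out by hand (e.g. "ceph osd out <id>").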