Re: Unexpected recovering after nautilus 14.2.7 -> 14.2.8

OK, this seems to make sense.

At the moment the cluster is still busy handling misplaced objects, but
when it's done I will set autoscale to "warn", set the no* flags, and
then try to upgrade the next monitor and see if it goes more smoothly.
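Roughly what I intend to run before the next monitor upgrade (untested,
so please correct me if any of this is off):

   ceph osd pool set pxa-ec pg_autoscale_mode warn
   ceph osd pool set pxb-ec pg_autoscale_mode warn
   ceph osd set noout
   ceph osd set nobackfill
   ceph osd set norecover

then upgrade the packages and reboot the monitor node, and afterwards:

   ceph osd unset norecover
   ceph osd unset nobackfill
   ceph osd unset noout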

Thank you very much for your help. I learned a lot following your proposals.

Rainer

On 05.03.20 at 11:45, Dan van der Ster wrote:
> Ahh, that's it! You have `autoscale_mode on` for the pool, and 14.2.8
> includes a fix to the calculation of how many PGs are needed in an
> erasure-coded pool:
> 
> https://github.com/ceph/ceph/commit/0253205ef36acc6759a3a9687c5eb1b27aa901bf
> 
> So at the moment your PGs are merging.
> 
> If you want to stop that change, set autoscale_mode to off or warn for
> the relevant pools, then set pg_num back to the current value (1024).
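> For example (a rough sketch only -- pool names taken from your osdmap
> diff; repeat for pxb-ec as needed):
>
>    ceph osd pool set pxa-ec pg_autoscale_mode warn
>    ceph osd pool set pxa-ec pg_num 1024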
> 
> -- Dan
> 
> On Thu, Mar 5, 2020 at 11:19 AM Rainer Krienke <krienke@xxxxxxxxxxxxxx> wrote:
>>
>> The difference was not a big one and consists of a change in pgp_num
>> for the pool named pxa-ec from 1024 to 999. All OSDs were up in the
>> last map (31856):
>>
>> # diff 31853.txt 31856.txt
>> 1c1
>> < epoch 31853
>> ---
>>> epoch 31856
>> 4c4
>> < modified 2020-03-04 14:41:52.079327
>> ---
>>> modified 2020-03-05 07:24:39.938326
>> 24c24
>> < pool 36 'pxa-ec' erasure size 6 min_size 5 crush_rule 7 object_hash
>> rjenkins pg_num 1024 pgp_num 1024 autoscale_mode on last_change 31852
>> lfor 0/21889/21905 flags hashpspool,ec_overwrites,selfmanaged_snaps
>> stripe_width 16384 target_size_ratio 0.15 application rbd
>> ---
>>> pool 36 'pxa-ec' erasure size 6 min_size 5 crush_rule 7 object_hash
>> rjenkins pg_num 1024 pgp_num 999 pg_num_target 256 pgp_num_target 256
>> autoscale_mode on last_change 31856 lfor 0/21889/21905 flags
>> hashpspool,ec_overwrites,selfmanaged_snaps stripe_width 16384
>> target_size_ratio 0.15 application rbd
>> 28c28
>> < pool 39 'pxb-ec' erasure size 6 min_size 5 crush_rule 3 object_hash
>> rjenkins pg_num 1024 pgp_num 1024 autoscale_mode on last_change 31659
>> lfor 0/28686/28688 flags hashpspool,ec_overwrites,selfmanaged_snaps
>> stripe_width 16384 target_size_ratio 0.15 application rbd
>> ---
>>> pool 39 'pxb-ec' erasure size 6 min_size 5 crush_rule 3 object_hash
>> rjenkins pg_num 1024 pgp_num 1024 pg_num_target 256 pgp_num_target 256
>> autoscale_mode on last_change 31856 lfor 0/28686/28688 flags
>> hashpspool,ec_overwrites,selfmanaged_snaps stripe_width 16384
>> target_size_ratio 0.15 application rbd
>> 181d180
>> < blacklist 141.26.152.64:0/3433151139 expires 2020-03-04 15:16:25.964333
>>
>> Rainer
>>
>> On 05.03.20 at 10:19, Dan van der Ster wrote:
>>> Hi,
>>>
>>> There was already movement at 07:24:41.598004, before you rebooted the node.
>>> That tells me that a ceph-mon process restarted and either trimmed
>>> some upmaps or did something similar.
>>>
>>> You can do this to see exactly what changed:
>>>
>>> # ceph osd getmap -o 31853 31853   # this is a guess -- pick an osdmap
>>> epoch that was just before you upgraded.
>>> # ceph osd getmap -o 31856 31856
>>> # diff <(osdmaptool --print 31853) <(osdmaptool --print 31856)
>>>
>>> -- dan
>>>
>>>
>>>
>>> On Thu, Mar 5, 2020 at 10:05 AM Rainer Krienke <krienke@xxxxxxxxxxxxxx> wrote:
>>>>
>>>> Hello,
>>>>
>>>> Before I ran the update to 14.2.8 I checked that the state was healthy,
>>>> with all OSDs up and in. I still have the command history visible in my
>>>> KDE terminal buffer, and there I see that after the update but before
>>>> the reboot I ran "ceph -s": there were 144 OSDs up and in, and the
>>>> state was HEALTH_OK.
>>>>
>>>> Could it be of interest that the rebooted node was a monitor node?
>>>> And shouldn't mon_osd_down_out_interval, at least in theory, have
>>>> prevented what happened to my cluster?
>>>>
>>>> Thanks
>>>> Rainer
>>>>
>>>> On 05.03.20 at 09:49, Dan van der Ster wrote:
>>>>> Did you have `144 total, 144 up, 144 in` also before the upgrade?
>>>>> If an OSD was out and then came back in when you upgraded/restarted,
>>>>> that would trigger data movement.
>>>>> (I usually set noin before an upgrade).
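>>>>> A minimal sketch of that, for reference:
>>>>>
>>>>>    ceph osd set noin
>>>>>    ... upgrade / restart ...
>>>>>    ceph osd unset noin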
>>>>>
>>>>> -- dan
>>>>>
>>>>> On Thu, Mar 5, 2020 at 9:46 AM Rainer Krienke <krienke@xxxxxxxxxxxxxx> wrote:
>>>>>>
>>>>>> I found some information in ceph.log that might help to find out what
>>>>>> happened. node2 was the one I rebooted:
>>>>>>
>>>>>> 2020-03-05 07:24:29.844953 osd.45 (osd.45) 483 : cluster [DBG] 36.323
>>>>>> scrub starts
>>>>>> 2020-03-05 07:24:33.552221 osd.45 (osd.45) 484 : cluster [DBG] 36.323
>>>>>> scrub ok
>>>>>> 2020-03-05 07:24:38.948404 mon.node2 (mon.0) 692706 : cluster [DBG]
>>>>>> osdmap e31855: 144 total, 144 up, 144 in
>>>>>> 2020-03-05 07:24:39.969404 mon.node2 (mon.0) 692713 : cluster [DBG]
>>>>>> osdmap e31856: 144 total, 144 up, 144 in
>>>>>> 2020-03-05 07:24:39.979238 mon.node2 (mon.0) 692714 : cluster [WRN]
>>>>>> Health check failed: 1 pools have many more objects per pg than average
>>>>>> (MANY_OBJECTS_PER_PG)
>>>>>> 2020-03-05 07:24:40.533392 mon.node2 (mon.0) 692717 : cluster [DBG]
>>>>>> osdmap e31857: 144 total, 144 up, 144 in
>>>>>> 2020-03-05 07:24:41.550395 mon.node2 (mon.0) 692728 : cluster [DBG]
>>>>>> osdmap e31858: 144 total, 144 up, 144 in
>>>>>> 2020-03-05 07:24:41.598004 osd.127 (osd.127) 691 : cluster [DBG]
>>>>>> 36.3eds0 starting backfill to osd.18(4) from (0'0,0'0] MAX to 31854'297918
>>>>>> 2020-03-05 07:24:41.619293 osd.127 (osd.127) 692 : cluster [DBG]
>>>>>> 36.3eds0 starting backfill to osd.49(5) from (0'0,0'0] MAX to 31854'297918
>>>>>> 2020-03-05 07:24:41.631869 osd.127 (osd.127) 693 : cluster [DBG]
>>>>>> 36.3eds0 starting backfill to osd.65(2) from (0'0,0'0] MAX to 31854'297918
>>>>>> 2020-03-05 07:24:41.644089 osd.127 (osd.127) 694 : cluster [DBG]
>>>>>> 36.3eds0 starting backfill to osd.97(3) from (0'0,0'0] MAX to 31854'297918
>>>>>> 2020-03-05 07:24:41.656223 osd.127 (osd.127) 695 : cluster [DBG]
>>>>>> 36.3eds0 starting backfill to osd.122(0) from (0'0,0'0] MAX to 31854'297918
>>>>>> 2020-03-05 07:24:41.669265 osd.127 (osd.127) 696 : cluster [DBG]
>>>>>> 36.3eds0 starting backfill to osd.134(1) from (0'0,0'0] MAX to 31854'297918
>>>>>> 2020-03-05 07:24:41.582485 osd.69 (osd.69) 549 : cluster [DBG] 36.3fes0
>>>>>> starting backfill to osd.13(1) from (0'0,0'0] MAX to 31854'280018
>>>>>> 2020-03-05 07:24:41.590541 osd.5 (osd.5) 349 : cluster [DBG] 36.3f2s0
>>>>>> starting backfill to osd.10(0) from (0'0,0'0] MAX to 31854'331157
>>>>>> 2020-03-05 07:24:41.596496 osd.69 (osd.69) 550 : cluster [DBG] 36.3fes0
>>>>>> starting backfill to osd.25(5) from (0'0,0'0] MAX to 31854'280018
>>>>>> 2020-03-05 07:24:41.601781 osd.86 (osd.86) 457 : cluster [DBG] 36.3ees0
>>>>>> starting backfill to osd.10(4) from (0'0,0'0] MAX to 31854'511090
>>>>>> 2020-03-05 07:24:41.603864 osd.69 (osd.69) 551 : cluster [DBG] 36.3fes0
>>>>>> starting backfill to osd.58(2) from (0'0,0'0] MAX to 31854'280018
>>>>>> 2020-03-05 07:24:41.610409 osd.69 (osd.69) 552 : cluster [DBG] 36.3fes0
>>>>>> starting backfill to osd.78(3) from (0'0,0'0] MAX to 31854'280018
>>>>>> 2020-03-05 07:24:41.614494 osd.5 (osd.5) 350 : cluster [DBG] 36.3f2s0
>>>>>> starting backfill to osd.41(1) from (0'0,0'0] MAX to 31854'331157
>>>>>> 2020-03-05 07:24:41.617208 osd.69 (osd.69) 553 : cluster [DBG] 36.3fes0
>>>>>> starting backfill to osd.99(0) from (0'0,0'0] MAX to 31854'280018
>>>>>> 2020-03-05 07:24:41.622645 osd.86 (osd.86) 458 : cluster [DBG] 36.3ees0
>>>>>> starting backfill to osd.48(5) from (0'0,0'0] MAX to 31854'511090
>>>>>> 2020-03-05 07:24:41.624049 osd.69 (osd.69) 554 : cluster [DBG] 36.3fes0
>>>>>> starting backfill to osd.121(4) from (0'0,0'0] MAX to 31854'280018
>>>>>> 2020-03-05 07:24:41.625556 osd.5 (osd.5) 351 : cluster [DBG] 36.3f2s0
>>>>>> starting backfill to osd.61(3) from (0'0,0'0] MAX to 31854'331157
>>>>>> 2020-03-05 07:24:41.631348 osd.86 (osd.86) 459 : cluster [DBG] 36.3ees0
>>>>>> starting backfill to osd.78(3) from (0'0,0'0] MAX to 31854'511090
>>>>>> 2020-03-05 07:24:41.634572 osd.5 (osd.5) 352 : cluster [DBG] 36.3f2s0
>>>>>> starting backfill to osd.71(4) from (0'0,0'0] MAX to 31854'331157
>>>>>> 2020-03-05 07:24:41.641651 osd.86 (osd.86) 460 : cluster [DBG] 36.3ees0
>>>>>> starting backfill to osd.90(0) from (0'0,0'0] MAX to 31854'511090
>>>>>> 2020-03-05 07:24:41.644983 osd.5 (osd.5) 353 : cluster [DBG] 36.3f2s0
>>>>>> starting backfill to osd.122(5) from (0'0,0'0] MAX to 31854'331157
>>>>>> 2020-03-05 07:24:41.649661 osd.86 (osd.86) 461 : cluster [DBG] 36.3ees0
>>>>>> starting backfill to osd.118(2) from (0'0,0'0] MAX to 31854'511090
>>>>>> 2020-03-05 07:24:41.652407 osd.5 (osd.5) 354 : cluster [DBG] 36.3f2s0
>>>>>> starting backfill to osd.131(2) from (0'0,0'0] MAX to 31854'331157
>>>>>> 2020-03-05 07:24:41.659823 osd.86 (osd.86) 462 : cluster [DBG] 36.3ees0
>>>>>> starting backfill to osd.139(1) from (0'0,0'0] MAX to 31854'511090
>>>>>> 2020-03-05 07:24:42.055680 mon.node2 (mon.0) 692729 : cluster [INF]
>>>>>> osd.23 marked itself down
>>>>>> 2020-03-05 07:24:42.055765 mon.node2 (mon.0) 692730 : cluster [INF]
>>>>>> osd.18 marked itself down
>>>>>> 2020-03-05 07:24:42.055919 mon.node2 (mon.0) 692731 : cluster [INF]
>>>>>> osd.21 marked itself down
>>>>>> 2020-03-05 07:24:42.056002 mon.node2 (mon.0) 692732 : cluster [INF]
>>>>>> osd.24 marked itself down
>>>>>> 2020-03-05 07:24:42.056250 mon.node2 (mon.0) 692733 : cluster [INF]
>>>>>> osd.17 marked itself down
>>>>>> 2020-03-05 07:24:42.058049 mon.node2 (mon.0) 692734 : cluster [INF]
>>>>>> osd.16 marked itself down
>>>>>> 2020-03-05 07:24:42.064002 mon.node2 (mon.0) 692735 : cluster [INF]
>>>>>> osd.31 marked itself down
>>>>>> 2020-03-05 07:24:42.069635 mon.node2 (mon.0) 692736 : cluster [INF]
>>>>>> osd.26 marked itself down
>>>>>> 2020-03-05 07:24:42.075325 mon.node2 (mon.0) 692737 : cluster [INF]
>>>>>> osd.29 marked itself down
>>>>>> 2020-03-05 07:24:42.080842 mon.node2 (mon.0) 692738 : cluster [INF]
>>>>>> osd.19 marked itself down
>>>>>> 2020-03-05 07:24:42.086368 mon.node2 (mon.0) 692739 : cluster [INF]
>>>>>> osd.22 marked itself down
>>>>>> 2020-03-05 07:24:42.091810 mon.node2 (mon.0) 692740 : cluster [INF]
>>>>>> osd.28 marked itself down
>>>>>> 2020-03-05 07:24:42.125240 mon.node2 (mon.0) 692741 : cluster [INF]
>>>>>> osd.30 marked itself down
>>>>>> 2020-03-05 07:24:42.125318 mon.node2 (mon.0) 692742 : cluster [INF]
>>>>>> osd.27 marked itself down
>>>>>> 2020-03-05 07:24:42.177279 mon.node2 (mon.0) 692743 : cluster [INF]
>>>>>> osd.20 marked itself down
>>>>>> 2020-03-05 07:24:42.189747 mon.node2 (mon.0) 692744 : cluster [INF]
>>>>>> osd.25 marked itself down
>>>>>> 2020-03-05 07:24:42.567690 mon.node2 (mon.0) 692745 : cluster [WRN]
>>>>>> Health check failed: 16 osds down (OSD_DOWN)
>>>>>> 2020-03-05 07:24:42.567743 mon.node2 (mon.0) 692746 : cluster [WRN]
>>>>>> Health check failed: 1 host (16 osds) down (OSD_HOST_DOWN)
>>>>>> 2020-03-05 07:24:42.673270 mon.node2 (mon.0) 692747 : cluster [DBG]
>>>>>> osdmap e31859: 144 total, 128 up, 144 in
>>>>>> 2020-03-05 07:24:41.577509 osd.122 (osd.122) 633 : cluster [DBG]
>>>>>> 36.3f0s0 starting backfill to osd.15(0) from (0'0,0'0] MAX to 31854'314030
>>>>>> 2020-03-05 07:24:41.588537 osd.94 (osd.94) 501 : cluster [DBG] 36.3eas0
>>>>>> starting backfill to osd.0(0) from (0'0,0'0] MAX to 31854'307657
>>>>>> 2020-03-05 07:24:41.593430 osd.114 (osd.114) 633 : cluster [DBG]
>>>>>> 36.3f3s0 starting backfill to osd.4(3) from (0'0,0'0] MAX to 31854'313629
>>>>>> 2020-03-05 07:24:41.593977 osd.122 (osd.122) 634 : cluster [DBG]
>>>>>> 36.3f0s0 starting backfill to osd.25(3) from (0'0,0'0] MAX to 31854'314030
>>>>>> 2020-03-05 07:24:41.595369 osd.126 (osd.126) 559 : cluster [DBG]
>>>>>> 36.3e7s0 starting backfill to osd.7(3) from (0'0,0'0] MAX to 31854'275181
>>>>>> 2020-03-05 07:24:41.598564 osd.85 (osd.85) 473 : cluster [DBG] 36.3f5s0
>>>>>> starting backfill to osd.3(5) from (0'0,0'0] MAX to 31854'313436
>>>>>> ....
>>>>>>
>>>>>> On 05.03.20 at 08:58, Rainer Krienke wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> At the moment my Ceph cluster is still working, but in a degraded
>>>>>>> state, after I upgraded one (of 9) hosts from 14.2.7 to 14.2.8 and
>>>>>>> rebooted this host (node2, one of 3 monitors) after the upgrade.
>>>>>>>
>>>>>>> Usually before rebooting I set
>>>>>>>
>>>>>>>    ceph osd set noout
>>>>>>>    ceph osd set nobackfill
>>>>>>>    ceph osd set norecover
>>>>>>>
>>>>>>> before rebooting, but I forgot this time. After realizing my error
>>>>>>> I thought: OK, I forgot to set the flags, but I had configured
>>>>>>> mon_osd_down_out_interval to 900 sec:
>>>>>>>
>>>>>>> # ceph config get mon.mon_osd_down_out_interval
>>>>>>> WHO    MASK LEVEL    OPTION                    VALUE RO
>>>>>>> mon         advanced mon_osd_down_out_interval 900
>>>>>>>
>>>>>>> The reboot took 5 min, so I expected nothing to happen. But it did,
>>>>>>> and now I do not understand why. Are there more timeout values I
>>>>>>> could/should set to avoid this happening again, in case I ever again
>>>>>>> forget to set the noout, nobackfill, norecover flags prior to a reboot?
>>>>>>
>>>>>> --
>>>>>> Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
>>>>>> 56070 Koblenz, Tel: +49261287 1312 Fax +49261287 100 1312
>>>>>> Web: http://userpages.uni-koblenz.de/~krienke
>>>>>> PGP: http://userpages.uni-koblenz.de/~krienke/mypgp.html
>>>>>> _______________________________________________
>>>>>> ceph-users mailing list -- ceph-users@xxxxxxx
>>>>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>>>
>>>>
>>>> --
>>>> Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
>>>> 56070 Koblenz, Tel: +49261287 1312 Fax +49261287 100 1312
>>>> Web: http://userpages.uni-koblenz.de/~krienke
>>>> PGP: http://userpages.uni-koblenz.de/~krienke/mypgp.html
>>
>>
>> --
>> Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
>> 56070 Koblenz, Tel: +49261287 1312 Fax +49261287 100 1312
>> Web: http://userpages.uni-koblenz.de/~krienke
>> PGP: http://userpages.uni-koblenz.de/~krienke/mypgp.html


-- 
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
56070 Koblenz, Tel: +49261287 1312 Fax +49261287 100 1312
Web: http://userpages.uni-koblenz.de/~krienke
PGP: http://userpages.uni-koblenz.de/~krienke/mypgp.html
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


