Re: Unexpected recovering after nautilus 14.2.7 -> 14.2.8

Hi,

The ceph.log from when you upgraded should give some clues.
Are you using upmap balancing? Maybe this is just further
refinement of the balancing.
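
For reference, a quick way to check both points (just a sketch, not the
only way; the log path assumes the default /var/log/ceph/ceph.log on a
mon host, and the timestamp in the last grep is a placeholder for your
reboot window):

    ceph balancer status                     # balancer mode (upmap / crush-compat) and whether it is active
    ceph osd dump | grep -c pg_upmap_items   # non-zero means upmap exceptions are in place
    grep '2020-03-05 08:' /var/log/ceph/ceph.log   # cluster log entries around the reboot

If the balancer runs in upmap mode and pg_upmap_items entries exist, the
remapped/misplaced PGs are most likely just the balancer moving data
again after the osdmap changed during the reboot.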

-- dan

On Thu, Mar 5, 2020 at 8:58 AM Rainer Krienke <krienke@xxxxxxxxxxxxxx> wrote:
>
> Hello,
>
> at the moment my ceph cluster is still working, but in a degraded state,
> after I upgraded one of 9 hosts from 14.2.7 to 14.2.8 and rebooted this
> host (node2, one of 3 monitors) after the upgrade.
>
> Usually I set
>
>    ceph osd set noout
>    ceph osd set nobackfill
>    ceph osd set norecover
>
> before rebooting, but this time I forgot. After realizing my error I
> thought: OK, I forgot to set the flags, but I have configured
> mon_osd_down_out_interval to 900 seconds:
>
> # ceph config get mon.mon_osd_down_out_interval
> WHO    MASK LEVEL    OPTION                    VALUE RO
> mon         advanced mon_osd_down_out_interval 900
>
> The reboot took about 5 minutes, so I expected nothing to happen. But
> something did, and now I do not understand why, or whether there are
> more timeout values I could or should set to avoid this happening again
> if I ever forget to set the noout, nobackfill and norecover flags prior
> to a reboot.
>
>
> Thanks if anyone can explain what might have happened.
> Rainer
>
>
>
> The current ceph state is:
> # ceph -s
>   cluster:
>     id:     xyz
>     health: HEALTH_WARN
>             Degraded data redundancy: 191629/76527549 objects degraded
> (0.250%), 18 pgs degraded, 18 pgs undersized
>
>   services:
>     mon: 3 daemons, quorum node2,node5,node8 (age 51m)
>     mgr: node5(active, since 53m), standbys: node8, node-admin, node2
>     mds: mycephfs:1 {0=node3=up:active} 2 up:standby
>     osd: 144 osds: 144 up (since 51m), 144 in (since 3M); 48 remapped pgs
>
>   data:
>     pools:   13 pools, 3460 pgs
>     objects: 12.76M objects, 48 TiB
>     usage:   95 TiB used, 429 TiB / 524 TiB avail
>     pgs:     191629/76527549 objects degraded (0.250%)
>              3098164/76527549 objects misplaced (4.048%)
>              3412 active+clean
>              30   active+remapped+backfill_wait
>              13   active+undersized+degraded+remapped+backfill_wait
>              5    active+undersized+degraded+remapped+backfilling
>
>   io:
>     client:   33 MiB/s rd, 7.2 MiB/s wr, 91 op/s rd, 186 op/s wr
>     recovery: 83 MiB/s, 20 objects/s
> --
> Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
> 56070 Koblenz, Tel: +49261287 1312 Fax +49261287 100 1312
> Web: http://userpages.uni-koblenz.de/~krienke
> PGP: http://userpages.uni-koblenz.de/~krienke/mypgp.html
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


