Unexpected recovery after Nautilus 14.2.7 -> 14.2.8

Hello,

at the moment my Ceph cluster is still working, but in a degraded
state, after I upgraded one of nine hosts from 14.2.7 to 14.2.8 and
rebooted this host (node2, one of three monitors) after the upgrade.

Usually I set

   ceph osd set noout
   ceph osd set nobackfill
   ceph osd set norecover

before rebooting, but I forgot this time. After realizing my mistake I
thought: OK, I forgot to set the flags, but I had configured
mon_osd_down_out_interval to 900 seconds:

# ceph config get mon.mon_osd_down_out_interval
WHO    MASK LEVEL    OPTION                    VALUE RO
mon         advanced mon_osd_down_out_interval 900
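
For completeness, this is roughly the routine I normally follow. It is
only a sketch; the exact reboot command of course depends on the host:

   # prevent the cluster from reacting while the host is down
   ceph osd set noout
   ceph osd set nobackfill
   ceph osd set norecover

   # upgrade the packages and reboot the host
   reboot

   # once the host and its OSDs are back up, remove the flags again
   ceph osd unset noout
   ceph osd unset nobackfill
   ceph osd unset norecover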

The reboot took about 5 minutes, so I expected nothing to happen. But
it did, and now I do not understand why. Are there more timeout values
I could or should set to avoid this happening again, in case I ever
forget to set the noout, nobackfill and norecover flags before a
reboot?
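
For anyone who wants to check this on their own cluster, the following
should show the currently set flags and the configured interval (plain
Nautilus commands, as far as I know):

   # show which cluster-wide flags (noout, norecover, ...) are set
   ceph osd dump | grep flags

   # read the down-out interval from the mon configuration
   ceph config get mon mon_osd_down_out_interval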


Thanks in advance to anyone who can explain what might have happened.
Rainer



The current Ceph state is:
# ceph -s
  cluster:
    id:     xyz
    health: HEALTH_WARN
            Degraded data redundancy: 191629/76527549 objects degraded (0.250%), 18 pgs degraded, 18 pgs undersized

  services:
    mon: 3 daemons, quorum node2,node5,node8 (age 51m)
    mgr: node5(active, since 53m), standbys: node8, node-admin, node2
    mds: mycephfs:1 {0=node3=up:active} 2 up:standby
    osd: 144 osds: 144 up (since 51m), 144 in (since 3M); 48 remapped pgs

  data:
    pools:   13 pools, 3460 pgs
    objects: 12.76M objects, 48 TiB
    usage:   95 TiB used, 429 TiB / 524 TiB avail
    pgs:     191629/76527549 objects degraded (0.250%)
             3098164/76527549 objects misplaced (4.048%)
             3412 active+clean
             30   active+remapped+backfill_wait
             13   active+undersized+degraded+remapped+backfill_wait
             5    active+undersized+degraded+remapped+backfilling

  io:
    client:   33 MiB/s rd, 7.2 MiB/s wr, 91 op/s rd, 186 op/s wr
    recovery: 83 MiB/s, 20 objects/s
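
In the meantime I am keeping an eye on the recovery with the usual
commands, for example:

   # follow cluster events and recovery progress live
   ceph -w

   # list the degraded/undersized PGs in detail
   ceph health detail
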
-- 
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
56070 Koblenz, Tel: +49261287 1312 Fax +49261287 100 1312
Web: http://userpages.uni-koblenz.de/~krienke
PGP: http://userpages.uni-koblenz.de/~krienke/mypgp.html