Speculation: might the devicehealth pool be involved? It typically seems to have just 1 PG. (A rough way to check is sketched below, after the quoted thread.)

> On Feb 9, 2022, at 1:41 PM, Zach Heise (SSCC) <heise@xxxxxxxxxxxx> wrote:
>
> Good afternoon, thank you for your reply. Yes, I know you are right; eventually we'll switch to an odd number of mons rather than an even number. We're still in 'testing' mode right now, and only my coworkers and I are using the cluster.
>
> Of the 7 pools, all but 2 are replica x3. The last two are EC 2+2.
>
> Zach Heise
>
>
> On 2022-02-09 3:38 PM, sascha.arthur@xxxxxxxxx wrote:
>> Hello,
>>
>> Are all your pools running replica > 1?
>> Also, having 4 monitors is pretty bad for split-brain situations.
>>
>> Zach Heise (SSCC) <heise@xxxxxxxxxxxx> wrote on Wed, Feb 9, 2022, 22:02:
>>
>>     Hello,
>>
>>     ceph health detail says my 5-node cluster is healthy, yet when I ran
>>     ceph orch upgrade start --ceph-version 16.2.7, everything seemed to go
>>     fine until we got to the OSD section. Now, for the past hour, every 15
>>     seconds a new log entry of 'Upgrade: unsafe to stop osd(s) at this time
>>     (1 PGs are or would become offline)' appears in the logs.
>>
>>     ceph pg dump_stuck (unclean, degraded, etc.) shows "ok" for everything
>>     too. Yet somehow 1 PG is (apparently) holding up all the OSD upgrades
>>     and not letting the process finish. Should I stop the upgrade and try
>>     it again? (I haven't done that before, so I was nervous to try it.)
>>     Any other ideas?
>>
>>       cluster:
>>         id:     9aa000e8-b999-11eb-82f2-ecf4bbcc0ac0
>>         health: HEALTH_OK
>>
>>       services:
>>         mon: 4 daemons, quorum ceph05,ceph04,ceph01,ceph03 (age 92m)
>>         mgr: ceph03.futetp(active, since 97m), standbys: ceph01.fblojp
>>         mds: 1/1 daemons up, 1 hot standby
>>         osd: 33 osds: 33 up (since 2h), 33 in (since 4h); 9 remapped pgs
>>
>>       data:
>>         volumes: 1/1 healthy
>>         pools:   7 pools, 193 pgs
>>         objects: 3.72k objects, 14 GiB
>>         usage:   43 GiB used, 64 TiB / 64 TiB avail
>>         pgs:     231/11170 objects misplaced (2.068%)
>>                  185 active+clean
>>                  8   active+clean+remapped
>>
>>       io:
>>         client: 1.2 KiB/s rd, 2 op/s rd, 0 op/s wr
>>
>>       progress:
>>         Upgrade to 16.2.7 (5m)
>>           [=====.......................] (remaining: 24m)
>>
>> -- Zach
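If the devicehealth pool really is the single-PG pool the orchestrator refuses to take offline, it should be visible in the pool listing. A rough sketch of how one might check, assuming the pool still carries its default name device_health_metrics and substituting one of the OSDs the upgrade is waiting on for osd.0:

    # list every pool with its replica size, min_size and pg_num;
    # the device health pool normally shows up with pg_num 1
    ceph osd pool ls detail

    # show the PG(s) belonging to that pool and which OSDs hold them
    ceph pg ls-by-pool device_health_metrics

    # ask Ceph directly whether stopping a given OSD would take any PGs
    # offline -- effectively the same check the upgrade is reporting on
    ceph osd ok-to-stop osd.0

If ok-to-stop points at the device health pool's PG, checking that pool's size and min_size (and letting the remapped PGs settle) before resuming the upgrade would be the next step.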