Re: stretched cluster new pool and second pool with nvme

Hi Stefan ... I did the next step and need your help.

My idea was to stretch the cluster without stretch mode, so we decided to reserve a size of 4 on each side.

The setup is the same as in stretch mode, including the CRUSH rule, monitor locations, election_strategy and tie-breaker. Only "ceph mon enable_stretch_mode e stretch_rule datacenter" was not run.
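
For reference, the setup roughly follows the stretch mode documentation. A minimal sketch (datacenter names, rule id and pool name below are assumptions, not the exact values from our cluster):

######################################
# CRUSH rule as in the stretch mode docs: 2 copies per datacenter
rule stretch_rule {
    id 1
    type replicated
    step take default
    step choose firstn 0 type datacenter
    step chooseleaf firstn 2 type host
    step emit
}

# monitor election strategy, locations and tie-breaker (DC names assumed)
ceph mon set election_strategy connectivity
ceph mon set_location pve-test01-01 datacenter=dc1
ceph mon set_location pve-test02-01 datacenter=dc2
ceph mon set_location tie-breaker datacenter=dc3
# ... and so on for the remaining mons

# pool with 4 replicas on the stretch rule (pool name assumed)
ceph osd pool set testpool crush_rule stretch_rule
ceph osd pool set testpool size 4
ceph osd pool set testpool min_size 2
######################################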

Now in my test I caused a split brain and expected that, on the remaining side, the cluster would rebuild the 4 replicas.
But that did not happen.
Actually, the cluster is doing the same thing as with stretch mode enabled: it stays writeable with 2 replicas.
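
For reference, the relevant pool and rule settings can be dumped like this (a sketch; pool and rule names assumed):

######################################
ceph osd pool get testpool size
ceph osd pool get testpool min_size
ceph osd pool get testpool crush_rule
ceph osd crush rule dump stretch_rule
ceph pg dump pgs_brief | head
######################################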

Can you explain to me why? I'm going around in circles.

This is the status during the split brain:

######################################
pve-test02-01:~# ceph -s
  cluster:
    id:     376fcdef-bba0-4e58-b63e-c9754dc948fa
    health: HEALTH_WARN
            6/13 mons down, quorum pve-test01-01,pve-test01-03,pve-test01-05,pve-test02-01,pve-test02-03,pve-test02-05,tie-breaker
            1 datacenter (8 osds) down
            8 osds down
            6 hosts (8 osds) down
            Degraded data redundancy: 2116/4232 objects degraded (50.000%), 95 pgs degraded, 113 pgs undersized

  services:
    mon: 13 daemons, quorum pve-test01-01,pve-test01-03,pve-test01-05,pve-test02-01,pve-test02-03,pve-test02-05,tie-breaker (age 54m), out of quorum: pve-test01-02, pve-test01-04, pve-test01-06, pve-test02-02, pve-test02-04, pve-test02-06
    mgr: pve-test02-05(active, since 53m), standbys: pve-test01-05, pve-test01-01, pve-test01-03, pve-test02-01, pve-test02-03
    mds: 1/1 daemons up, 1 standby
    osd: 16 osds: 8 up (since 54m), 16 in (since 77m)

  data:
    volumes: 1/1 healthy
    pools:   5 pools, 113 pgs
    objects: 1.06k objects, 3.9 GiB
    usage:   9.7 GiB used, 580 GiB / 590 GiB avail
    pgs:     2116/4232 objects degraded (50.000%)
             95 active+undersized+degraded
             18 active+undersized

  io:
    client:   17 KiB/s wr, 0 op/s rd, 10 op/s wr
######################################

Thanks a lot,
ronny

On 2024-04-30 11:42, Stefan Kooman wrote:
On 30-04-2024 11:22, ronny.lippold wrote:
Hi Stefan ... you are the hero of the month ;)

:p.


I don't know why I did not find your bug report.

I have the exact same problem and resolved the HEALTH warning only with "ceph osd force_healthy_stretch_mode --yes-i-really-mean-it".
I will comment on the report soon.

Actually, we are thinking about a 4/2 size without stretch mode enabled.

What was your solution?

This specific setup (on which I did the testing) is going to be full flash (SSD). So the HDDs are going to be phased out, and only the default non-device-class crush rule will be used. While that will work for this (small) cluster, it is not a solution. This issue should be fixed, as I figure there are quite a few clusters that want to use device classes and stretch mode at the same time.

Gr. Stefan


