hi stefan ... i did the next step and need your help.
my idea was to stretch the cluster without stretch mode, so we decided
to reserve enough capacity for all 4 replicas on each side.
the setup is the same as for stretch mode: the same crush rule, monitor
locations, election_strategy and tie-breaker.
only "ceph mon enable_stretch_mode e stretch_rule datacenter" was never
run.
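for reference, the relevant part of the setup looks roughly like this
(a sketch; the pool name "rbd", the datacenter names dc1/dc2/dc3 and
showing only one mon per site are just examples here):

ceph osd pool set rbd size 4
ceph osd pool set rbd min_size 2

ceph mon set election_strategy connectivity

ceph mon set_location pve-test01-01 datacenter=dc1
ceph mon set_location pve-test02-01 datacenter=dc2
ceph mon set_location tie-breaker datacenter=dc3

the crush rule is like the example from the ceph stretch mode docs,
which picks 2 datacenters and 2 hosts in each:

rule stretch_rule {
    id 1
    type replicated
    step take default
    step choose firstn 0 type datacenter
    step chooseleaf firstn 2 type host
    step emit
}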
now in my test, i created a split brain and expected that, on the
remaining side, the cluster would rebuild the 4 replicas.
but that did not happen.
actually, the cluster is behaving exactly as if stretch mode were
enabled: writeable with 2 replicas.
can you explain to me why? i'm going in circles.
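(for anyone who wants to verify such a setup, the pool size and the
applied rule can be checked roughly like this; "rbd" is again just an
example pool name:

ceph osd pool get rbd size
ceph osd pool get rbd min_size
ceph osd pool get rbd crush_rule
ceph osd crush rule dump stretch_rule
)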
this is the status during the split brain:
######################################
pve-test02-01:~# ceph -s
  cluster:
    id:     376fcdef-bba0-4e58-b63e-c9754dc948fa
    health: HEALTH_WARN
            6/13 mons down, quorum pve-test01-01,pve-test01-03,pve-test01-05,pve-test02-01,pve-test02-03,pve-test02-05,tie-breaker
            1 datacenter (8 osds) down
            8 osds down
            6 hosts (8 osds) down
            Degraded data redundancy: 2116/4232 objects degraded (50.000%), 95 pgs degraded, 113 pgs undersized

  services:
    mon: 13 daemons, quorum pve-test01-01,pve-test01-03,pve-test01-05,pve-test02-01,pve-test02-03,pve-test02-05,tie-breaker (age 54m), out of quorum: pve-test01-02, pve-test01-04, pve-test01-06, pve-test02-02, pve-test02-04, pve-test02-06
    mgr: pve-test02-05(active, since 53m), standbys: pve-test01-05, pve-test01-01, pve-test01-03, pve-test02-01, pve-test02-03
    mds: 1/1 daemons up, 1 standby
    osd: 16 osds: 8 up (since 54m), 16 in (since 77m)

  data:
    volumes: 1/1 healthy
    pools:   5 pools, 113 pgs
    objects: 1.06k objects, 3.9 GiB
    usage:   9.7 GiB used, 580 GiB / 590 GiB avail
    pgs:     2116/4232 objects degraded (50.000%)
             95 active+undersized+degraded
             18 active+undersized

  io:
    client: 17 KiB/s wr, 0 op/s rd, 10 op/s wr
######################################
thanks a lot,
ronny
On 2024-04-30 11:42, Stefan Kooman wrote:
On 30-04-2024 11:22, ronny.lippold wrote:
hi stefan ... you are the hero of the month ;)
:p.
i don't know why i did not find your bug report.
i have the exact same problem and could only resolve the HEALTH state
with "ceph osd force_healthy_stretch_mode --yes-i-really-mean-it".
i will comment on the report soon.
actually, we are thinking about a 4/2 size without enabling stretch mode.
what was your solution?
This specific setup (on which I did the testing) is going to be full
flash (SSD), so the HDDs are going to be phased out, and only the
default non-device-class crush rule will be used. While that will work
for this (small) cluster, it is not a solution. This issue should be
fixed, as I figure there are quite a few clusters that want to use
device classes and stretch mode at the same time.
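(For reference, a device-class specific rule would be created along
these lines; the rule name is just an example:

ceph osd crush rule create-replicated replicated_ssd default host ssd
)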
Gr. Stefan