Re: stretched cluster new pool and second pool with nvme

Hi Stefan ... I did the next step and need your help.

My idea was to stretch the cluster without stretch mode, so we decided to reserve a size of 4 on each side.

The setup is the same as in stretch mode, including the CRUSH rule, monitor locations, election_strategy and tie-breaker. Only "ceph mon enable_stretch_mode e stretch_rule datacenter" was not run.
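
For reference, the setup roughly follows the stretch mode documentation. A minimal sketch (datacenter names, rule id and pool name below are assumptions, not the exact values from our cluster):

######################################
# CRUSH rule as in the stretch mode docs: 2 copies per datacenter
rule stretch_rule {
    id 1
    type replicated
    step take default
    step choose firstn 0 type datacenter
    step chooseleaf firstn 2 type host
    step emit
}

# monitor election strategy, locations and tie-breaker (DC names assumed)
ceph mon set election_strategy connectivity
ceph mon set_location pve-test01-01 datacenter=dc1
ceph mon set_location pve-test02-01 datacenter=dc2
ceph mon set_location tie-breaker datacenter=dc3
# ... and so on for the remaining mons

# pool with 4 replicas on the stretch rule (pool name assumed)
ceph osd pool set testpool crush_rule stretch_rule
ceph osd pool set testpool size 4
ceph osd pool set testpool min_size 2
######################################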

Now in my test I caused a split brain and expected that, on the remaining side, the cluster would rebuild the 4 replicas.
But that did not happen.
Actually, the cluster is doing the same thing as with stretch mode enabled: it stays writeable with 2 replicas.
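
For reference, the relevant pool and rule settings can be dumped like this (a sketch; pool and rule names assumed):

######################################
ceph osd pool get testpool size
ceph osd pool get testpool min_size
ceph osd pool get testpool crush_rule
ceph osd crush rule dump stretch_rule
ceph pg dump pgs_brief | head
######################################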

Can you explain to me why? I'm going around in circles.

This is the status during the split brain:

######################################
pve-test02-01:~# ceph -s
  cluster:
    id:     376fcdef-bba0-4e58-b63e-c9754dc948fa
    health: HEALTH_WARN
            6/13 mons down, quorum pve-test01-01,pve-test01-03,pve-test01-05,pve-test02-01,pve-test02-03,pve-test02-05,tie-breaker
            1 datacenter (8 osds) down
            8 osds down
            6 hosts (8 osds) down
            Degraded data redundancy: 2116/4232 objects degraded (50.000%), 95 pgs degraded, 113 pgs undersized

  services:
    mon: 13 daemons, quorum pve-test01-01,pve-test01-03,pve-test01-05,pve-test02-01,pve-test02-03,pve-test02-05,tie-breaker (age 54m), out of quorum: pve-test01-02, pve-test01-04, pve-test01-06, pve-test02-02, pve-test02-04, pve-test02-06
    mgr: pve-test02-05(active, since 53m), standbys: pve-test01-05, pve-test01-01, pve-test01-03, pve-test02-01, pve-test02-03
    mds: 1/1 daemons up, 1 standby
    osd: 16 osds: 8 up (since 54m), 16 in (since 77m)

  data:
    volumes: 1/1 healthy
    pools:   5 pools, 113 pgs
    objects: 1.06k objects, 3.9 GiB
    usage:   9.7 GiB used, 580 GiB / 590 GiB avail
    pgs:     2116/4232 objects degraded (50.000%)
             95 active+undersized+degraded
             18 active+undersized

  io:
    client:   17 KiB/s wr, 0 op/s rd, 10 op/s wr
######################################

Thanks a lot,
ronny

On 2024-04-30 11:42, Stefan Kooman wrote:
On 30-04-2024 11:22, ronny.lippold wrote:
Hi Stefan ... you are the hero of the month ;)

:p.


I don't know why I did not find your bug report.

I have the exact same problem and resolved the HEALTH warning only with "ceph osd force_healthy_stretch_mode --yes-i-really-mean-it".
I will comment on the report soon.

Actually, we are thinking about a 4/2 size without stretch mode enabled.

What was your solution?

This specific setup (on which I did the testing) is going to be full flash (SSD). So the HDDs are going to be phased out, and only the default non-device-class crush rule will be used. While that will work for this (small) cluster, it is not a solution. This issue should be fixed, as I figure there are quite a few clusters that want to use device classes and stretch mode at the same time.

Gr. Stefan


