Re: How to check available storage with EC and different sized OSD's ?


 



With a 3-OSD pool, data cannot be redistributed when an OSD fails. With k=2, m=1, the minimum number of OSDs needed just to place the data is 3. If you need the ability to redistribute data on failure, you'd need a 4th OSD. Your k+m value can't be larger than your number of failure domains, and if it's exactly equal to the number of failure domains, you'll never redistribute data on failure.
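
As a rough illustration of that rule of thumb, here is a minimal Python sketch. The function name and the status strings are made up for this example and are not part of any Ceph tooling; it only assumes that each PG needs one distinct failure domain per chunk.

```python
def ec_placement_status(k: int, m: int, failure_domains: int) -> str:
    """Classify an EC pool by its number of failure domains relative to k+m."""
    chunks = k + m  # each PG needs one distinct failure domain per chunk
    if failure_domains < chunks:
        return "unplaceable"   # not enough domains to place every chunk
    if failure_domains == chunks:
        return "no-recovery"   # placeable, but a lost chunk has nowhere to be rebuilt
    return "can-recover"       # at least one spare domain to rebuild onto


print(ec_placement_status(2, 1, 3))  # no-recovery
print(ec_placement_status(2, 1, 4))  # can-recover
```

With k=2, m=1 and failure domain osd, 3 OSDs land in the "no-recovery" case and a 4th OSD moves the pool to "can-recover".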
________________________________
From: Paweł Kowalski <pk@xxxxxxxxxxxx>
Sent: 09 November 2022 15:14
To: ceph-users@xxxxxxx <ceph-users@xxxxxxx>
Subject:  Re: How to check available storage with EC and different sized OSD's ?


If I start to use all the space the pool reports as available (4.5T) and
the first OSD (2.7T) fails, I'm sure I'll end up losing data, since it's
not possible to fit 4.5T on the 2 remaining drives, which have a total
raw capacity of 3.6T.

I'm wondering why Ceph isn't complaining now. I thought it would place
data across the disks in such a way that losing any one OSD would keep
the data safe, at least read-only (by leaving the excess 0.9T on the
first drive unused).
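
A back-of-the-envelope sketch of the arithmetic behind this concern, in Python. It assumes the two smaller OSDs are 1.8T each (only their 3.6T total is stated above), that every PG places exactly one equally sized chunk on each of the three OSDs, and it ignores full/nearfull ratios and metadata overhead. The function name is hypothetical.

```python
def ec_fill_limit(osd_sizes_tb, k, m):
    """Return (usable, raw_used, stranded) in TB for an EC pool whose
    k+m chunks map one-to-one onto the given OSDs."""
    if len(osd_sizes_tb) != k + m:
        raise ValueError("this sketch only covers the case #OSDs == k+m")
    smallest = min(osd_sizes_tb)
    # Every PG puts one equally sized chunk on every OSD, so all OSDs
    # fill at the same rate and the smallest one is full first.
    raw_used = smallest * (k + m)
    usable = smallest * k                    # only k of the k+m chunks are data
    stranded = sum(osd_sizes_tb) - raw_used  # capacity that cannot be filled
    return usable, raw_used, stranded


# 2.7T is from the message above; 1.8T + 1.8T is an ASSUMED split of the 3.6T.
usable, raw_used, stranded = ec_fill_limit([2.7, 1.8, 1.8], k=2, m=1)
print(f"usable={usable:.1f}T raw_used={raw_used:.1f}T stranded={stranded:.1f}T")
# usable=3.6T raw_used=5.4T stranded=0.9T
```

Under these assumptions the smallest OSDs would be full at about 3.6T of user data, and the 0.9T left over on the large drive is the same 0.9T mentioned above. In this layout each surviving OSD already holds its own chunk of every object, so as far as this model goes the figure to watch is per-OSD fill rather than whether 4.5T fits into 3.6T.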


Oh, and here's my rule and profile. I sent them in a PM by mistake:


rule ceph3_ec_low_k2_m1-data {
     id 2
     type erasure
     min_size 3
     max_size 3
     step set_chooseleaf_tries 5
     step set_choose_tries 100
     step take default class low_hdd
     step choose indep 0 type osd
     step emit
}

crush-device-class=low_hdd
crush-failure-domain=osd
crush-root=default
jerasure-per-chunk-alignment=false
k=2
m=1
plugin=jerasure
technique=reed_sol_van
w=8
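
For reference, a small Python sketch that reads a profile in the key=value form shown above and derives the numbers that matter for capacity planning. The helper name `parse_profile` is made up for this example; the profile text is copied from above (trimmed to a few keys).

```python
PROFILE = """\
crush-device-class=low_hdd
crush-failure-domain=osd
k=2
m=1
plugin=jerasure
"""


def parse_profile(text):
    """Turn key=value lines into a dict, skipping lines without '='."""
    return dict(line.split("=", 1) for line in text.splitlines() if "=" in line)


profile = parse_profile(PROFILE)
k, m = int(profile["k"]), int(profile["m"])

chunks = k + m            # failure domains needed per PG
overhead = chunks / k     # raw bytes written per byte of user data
print(f"chunks={chunks} overhead={overhead} tolerated_losses={m}")
# chunks=3 overhead=1.5 tolerated_losses=1
```

So this profile needs 3 OSDs of class low_hdd per PG, writes 1.5 bytes of raw data per byte stored, and can reconstruct data after losing one chunk.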


Paweł


On 8.11.2022 at 15:47, Danny Webb wrote:
> With an m value of 1, if you lost a single OSD/failure domain you'd end up with a read-only PG or cluster. Usually you need at least k+1 to survive the failure of a failure domain, depending on your min_size setting. The other thing you need to take into consideration is that the m value has to cover both failure-domain *and* OSD failures in an unlucky scenario (e.g., a PG that happened to be on a downed host and on a failed OSD elsewhere in the cluster). For a 3-OSD configuration the minimum fault-tolerant setup would be k=1, m=2, and you are then effectively doing replica 3 anyway. At least this is my understanding of it. Hope that helps.
> ________________________________
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


Danny Webb
Principal OpenStack Engineer
The Hut Group<http://www.thehutgroup.com/>




