Dear Frank, "For production systems I would recommend to use EC profiles with at least m=3" -> can i set min_size with min_size=4 for ec4+2 it's ok for productions? My data is video from the camera system, it's hot data, write and delete in some day, 10-15 day ex... Read and write availability is more important than data loss Thanks Frank Vào Th 2, 15 thg 1, 2024 vào lúc 16:46 Frank Schilder <frans@xxxxxx> đã viết: > I would like to add here a detail that is often overlooked: > maintainability under degraded conditions. > > For production systems I would recommend to use EC profiles with at least > m=3. The reason being that if you have a longer problem with a node that is > down and m=2 it is not possible to do any maintenance on the system without > loosing write access. Don't trust what users claim they are willing to > tolerate - at least get it in writing. Once a problem occurs they will be > at your door step no matter what they said before. > > Similarly, when doing a longer maintenance task and m=2, any disk fail > during maintenance will imply loosing write access. > > Having m=3 or larger allows for 2 (or larger) numbers of hosts/OSDs being > unavailable simultaneously while service is fully operational. That can be > a life saver in many situations. > > An additional reason for larger m is systematic failures of drives if your > vendor doesn't mix drives from different batches and factories. If a batch > has a systematic production error, failures are no longer statistically > independent. In such a situation, if one drives fails the likelihood that > more drives fail at the same time is very high. Having a larger number of > parity shards increases the chances of recovering from such events. > > For similar reasons I would recommend to deploy 5 MONs instead of 3. My > life got so much better after having the extra redundancy. > > As some background, in our situation we experience(d) somewhat heavy > maintenance operations including modifying/updating ceph nodes (hardware, > not software), exchanging Racks, switches, cooling and power etc. This > required longer downtime and/or moving of servers and moving the ceph > hardware was the easiest compared with other systems due to the extra > redundancy bits in it. We had no service outages during such operations. > > Best regards, > ================= > Frank Schilder > AIT Risø Campus > Bygning 109, rum S14 > > ________________________________________ > From: Anthony D'Atri <anthony.datri@xxxxxxxxx> > Sent: Saturday, January 13, 2024 5:36 PM > To: Phong Tran Thanh > Cc: ceph-users@xxxxxxx > Subject: Re: Recomand number of k and m erasure code > > There are nuances, but in general the higher the sum of m+k, the lower the > performance, because *every* operation has to hit that many drives, which > is especially impactful with HDDs. So there’s a tradeoff between storage > efficiency and performance. And as you’ve seen, larger parity groups > especially mean slower recovery/backfill. > > There’s also a modest benefit to choosing values of m and k that have > small prime factors, but I wouldn’t worry too much about that. > > > You can find EC efficiency tables on the net: > > > > https://docs.netapp.com/us-en/storagegrid-116/ilm/what-erasure-coding-schemes-are.html > > > I should really add a table to the docs, making a note to do that. 
> There's a nice calculator at the OSNEXUS site:
>
> https://www.osnexus.com/ceph-designer
>
> The overhead factor is (k+m) / k
>
> So for a 4,2 profile, that's 6 / 4 == 1.5
>
> For 6,2, 8 / 6 = 1.33
>
> For 10,2, 12 / 10 = 1.2
>
> and so forth. As k increases, the incremental efficiency gain sees diminishing returns, but performance continues to decrease.
>
> Think of m as the number of shards you can lose without losing data, and m-1 as the number you can lose / have down and still have data *available*.
>
> I also suggest that the number of failure domains (in your case this means OSD nodes) be *at least* k+m+1, so in your case you want k+m to be at most 9.
>
> With RBD and many CephFS implementations, we mostly have relatively large RADOS objects that are striped over many OSDs.
>
> When using RGW especially, one should pay attention to the average and median S3 object size. There's an analysis of the potential for space amplification in the docs, so I won't repeat it here in detail. This sheet
> https://docs.google.com/spreadsheets/d/1rpGfScgG-GLoIGMJWDixEkqs-On9w8nAUToPQjN8bDI/edit#gid=358760253
> demonstrates it visually.
>
> Basically, for an RGW bucket pool (or for a CephFS data pool storing unusually small objects), if many of your S3 objects are only a few KB in size, you waste a significant fraction of the underlying storage. This is exacerbated by EC, and the larger the sum of k+m, the more waste.
>
> When people ask me about replication vs. EC and which EC profile to use, the first question I ask is what they're storing. When EC isn't a non-starter, I tend to recommend 4,2 as a profile until / unless someone has specific needs and can understand the tradeoffs. This lets you store roughly 2x the data of 3x replication while not going overboard on the performance hit.
>
> If you care about your data, do not set m=1.
>
> If you need to survive the loss of many drives, say if your cluster spans multiple buildings or sites, choose a larger value of m. There are people running profiles like 4,6 because they have unusual and specific needs.
>
> > On Jan 13, 2024, at 10:32 AM, Phong Tran Thanh <tranphong079@xxxxxxxxx> wrote:
> >
> > Hi ceph users!
> >
> > I need to determine which erasure code values (k and m) to choose for a Ceph cluster with 10 nodes.
> >
> > I am using the Reef release with RBD. Furthermore, when using a larger k, for example EC 6+2 vs. EC 4+2, which gives better performance, and what are the criteria for choosing the appropriate erasure code? Please help me.
> >
> > Email: tranphong079@xxxxxxxxx
> > Skype: tranphong079

--
Best regards,

----------------------------------------------------------------------------
*Tran Thanh Phong*

Email: tranphong079@xxxxxxxxx
Skype: tranphong079
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
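To close the loop on the min_size question at the top of the thread, here is a second minimal Python sketch, again an illustration added here rather than anything from the thread, assuming the usual Ceph EC pool semantics: data remains reconstructible while at least k shards survive, and a placement group keeps serving I/O while at least min_size shards are up, with min_size defaulting to k+1 for EC pools.

# Sketch only: how many failure domains can be down for an EC k+m pool
# before I/O stops, and before data can no longer be reconstructed.
def ec_tolerance(k, m, min_size=None):
    if min_size is None:
        min_size = k + 1                              # Ceph's default for EC pools
    return {
        "profile": f"{k}+{m}, min_size={min_size}",
        "down_with_io_still_up": (k + m) - min_size,  # writes keep flowing
        "down_without_data_loss": m,                  # data still recoverable
    }

for args in [(4, 2), (4, 2, 4), (4, 3)]:
    print(ec_tolerance(*args))

Under these assumptions, 4+2 with the default min_size=5 stays writable through only one failure, which is Frank's objection. Lowering min_size to 4 keeps I/O going through two failures, but degraded writes then land on exactly k shards with no parity left, so any further failure in that window means inactive PGs or lost data. A profile with m=3 and min_size left at k+1 gives the same two-failure write availability without that exposure, which is why Frank recommends at least m=3 even when availability matters more than the data.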