On 15/08/2014 14:36, Erik Logtenberg wrote:
>>>>> Now, there are certain combinations of K and M that appear to have
>>>>> more or less the same result. Do any of these combinations have
>>>>> pros and cons that I should consider, and/or are there best
>>>>> practices for choosing the right K/M parameters?
>>>
>>> Loic might have a better answer, but I think that the more segments
>>> (K) you have, the heavier recovery gets. You have to contact more
>>> OSDs to reconstruct the whole object, so that involves more disks
>>> doing seeks.
>>>
>>> I heard somebody from Fujitsu say that he thought 8/3 was best for
>>> most situations. That wasn't with Ceph though, but with a different
>>> system which implemented erasure coding.
>>
>> Performance is definitely lower with more segments in Ceph. I kind of
>> gravitate toward 4/2 or 6/2, though that's just my own preference.
>
> This is indeed the kind of pros and cons I was thinking about.
> Performance-wise, I would expect differences, but I can think of both
> positive and negative effects of bigger values of K.
>
> For instance, yes, recovery takes more OSDs with bigger values of K,
> but it seems to me that there are also fewer or smaller items to
> recover. Also, read performance generally appears to benefit from
> having a bigger cluster (more parallelism), so I can imagine that
> bigger values of K also provide an increase in read performance.
>
> Mark says more segments hurt performance though; are you referring
> just to rebuild performance, or also to basic operational performance
> (read/write)?
>
>>>>> For instance, if I choose K = 3 and M = 2, then PGs in this pool
>>>>> will use 5 OSDs and sustain the loss of 2 OSDs. There is 40%
>>>>> overhead in this configuration.
>>>>>
>>>>> Now, if I were to choose K = 6 and M = 4, I would end up with PGs
>>>>> that use 10 OSDs and sustain the loss of 4 OSDs, which is
>>>>> statistically not so much different from the first configuration.
>>>>> Also, there is the same 40% overhead.
>>>>
>>>> Although I don't have numbers in mind, I think the odds of losing
>>>> two OSDs simultaneously are a lot smaller than the odds of losing
>>>> four OSDs simultaneously. Or am I misunderstanding you when you
>>>> write "statistically not so much different from the first
>>>> configuration"?
>>>
>>> Losing two smaller than losing four? Is that correct, or did you
>>> mean it the other way around?
>>>
>>> I'd say that losing four OSDs simultaneously is less likely to
>>> happen than losing two simultaneously.
>>
>> This is true, though the more disks you spread your objects across,
>> the higher the likelihood that any given object will be affected by a
>> lost OSD. The extreme case being that every object is spread across
>> every OSD, and losing any given OSD affects all objects. I suppose
>> the severity depends on the size of your erasure coding parameters
>> relative to the total number of OSDs. I think this is perhaps what
>> Erik was getting at.
>
> I haven't done the actual calculations, but given some % chance of
> disk failure, I would assume that losing x out of y disks has roughly
> the same chance as losing 2*x out of 2*y disks over the same period.
>
> That's also why you generally want to limit RAID5 arrays to maybe 6
> disks or so and move to RAID6 for bigger arrays. For arrays bigger
> than 20 disks you would usually split those into separate arrays, just
> to keep the (parity disks / total disks) fraction high enough.
>
> With regard to data safety I would guess that 3+2 and 6+4 are roughly
> equal, although the behaviour of 6+4 is probably easier to predict,
> because bigger numbers make your calculations less dependent on
> individual deviations in reliability.
>
> Do you guys feel this argument is valid?

Here is how I reason about it, roughly: if the probability of losing a
disk (i.e. before the failure can be recovered) is 0.1, then the
probability of losing two disks simultaneously is 0.1 * 0.1 = 0.01,
three disks becomes 0.1 * 0.1 * 0.1 = 0.001, and four disks becomes
0.0001.

Accurately calculating the reliability of the system as a whole is a
lot more complex (see
https://wiki.ceph.com/Development/Add_erasure_coding_to_the_durability_model/
for more information).
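In the meantime, to put the back-of-the-envelope version in code, here
is a rough sketch (plain Python, illustrative numbers only; it assumes
chunk failures are independent within a single recovery window, which
a real durability model would not):

    from math import factorial

    def comb(n, r):
        # Number of ways to choose r failed chunks out of n.
        return factorial(n) // (factorial(r) * factorial(n - r))

    def p_object_loss(k, m, p):
        # An object survives as long as at most m of its k+m chunks are
        # lost, so data loss means m+1 or more simultaneous failures.
        n = k + m
        return sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
                   for i in range(m + 1, n + 1))

    for k, m in [(3, 2), (6, 4)]:
        print("%d+%d: p(loss) = %.3g" % (k, m, p_object_loss(k, m, 0.001)))

Under these over-simplified assumptions, 6+4 comes out a few orders of
magnitude safer than 3+2 at the same 40% overhead, simply because five
simultaneous failures are much rarer than three; the price is that
every read and recovery touches more OSDs, which matches the
performance observations earlier in this thread.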
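And as a footnote to the original question about combinations that look
the same: the raw-space overhead is always m / (k + m), and a full read
or recovery has to collect k chunks, so the combinations mentioned in
this thread compare like this (same sketch style as above):

    # Overhead and fan-out for the K/M combinations mentioned in this
    # thread: overhead is the fraction of raw space spent on coding
    # chunks, and k is the number of OSDs a read or recovery contacts.
    for k, m in [(3, 2), (4, 2), (6, 2), (8, 3), (6, 4)]:
        print("k=%d m=%d: overhead %.0f%%, touches %d OSDs"
              % (k, m, 100.0 * m / (k + m), k))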
Cheers

> Erik.

-- 
Loïc Dachary, Artisan Logiciel Libre