Oh dear. Every occurrence of stripe_* is wrong :) It should be stripe_count (option --stripe-count in rbd create) everywhere in my text.

What choices are legal depends on the restrictions on stripe_count*stripe_unit (=stripe_size=stripe_width?) imposed by ceph. I believe all of this ends up being powers of 2.

Yes, the 6+2 is a bit surprising. I have no explanation for the observation. It just seems a good argument for "do not trust what you believe, gather facts". And to try things that seem non-obvious - just to be sure.

Best regards,

=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of Lars Marowsky-Bree <lmb@xxxxxxxx>
Sent: 11 July 2019 12:17:37
To: ceph-users@xxxxxxxxxxxxxx
Subject: Re: What's the best practice for Erasure Coding

On 2019-07-11T09:46:47, Frank Schilder <frans@xxxxxx> wrote:

> Striping with stripe units other than 1 is something I also tested. I found that with EC pools non-trivial striping should be avoided. Firstly, EC is already a striped format and, secondly, striping on top of that with stripe_unit>1 will make every write an ec_overwrite, because now shards are rarely if ever written as a whole.

That's why I said that rbd's stripe_unit should match the EC pool's stripe_width, or be a 2^n multiple of it. (Not sure what stripe_count should be set to, probably also a small power of two.)

> The native striping in EC pools comes from k, data is striped over k disks. The higher k the more throughput at the expense of cpu and network.

Increasing k also increases stripe_width though; this leads to more IO suffering from the ec_overwrite penalty.

> In my long list, this should actually be point
>
> 6) Use stripe_unit=1 (default).

You mean stripe-count?

> To get back to your question, this is another argument for k=power-of-two. Object sizes in ceph are always powers of 2 and stripe sizes contain k as a factor. Hence, any prime factor other than 2 in k will imply a mismatch. How badly a mismatch affects performance should be tested.

Yes, of course. Depending on the IO pattern, this means more IO will be misaligned or have non-stripe_width portions. (Most IO patterns, if they strive for alignment, aim for a power of two alignment, obviously.)

> Results with non-trivial striping (stripe_size>1) were so poor, I did not even include them in my report.

stripe_size?

> We use the 8+2 pool for ceph fs, where throughput is important. The 6+2 pool is used for VMs (RBD images), where IOP/s are more important. It also offers a higher redundancy level. It's an acceptable compromise for us.

Especially with RBDs, I'm surprised that k=6 works well for you. Block device IO is most commonly aligned on power-of-two boundaries.

Regards,
    Lars

--
SUSE Linux GmbH, GF: Felix Imendörffer, Mary Higgins, Sri Rasiah, HRB 21284 (AG Nürnberg)
"Architects should open possibilities and not determine everything." (Ueli Zbinden)

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
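The alignment argument in this thread (object sizes are powers of 2, while stripe sizes contain k as a factor) can be checked with a few lines of arithmetic. The following is only a rough sketch and does not measure performance. It assumes a 4 MiB object size and a 4 KiB per-shard chunk, so that stripe_width = k * 4 KiB; both values are assumptions for illustration and should be replaced with whatever the pool and image actually use. The function name ec_alignment is made up for this example.

```python
def ec_alignment(k, object_size=4 * 1024 * 1024, chunk_size=4096):
    """Return (stripe_width, full_stripes, remainder) for one object.

    Assumptions (not taken from the thread): object_size is 4 MiB and
    each of the k data shards receives chunk_size bytes per stripe,
    so stripe_width = k * chunk_size.
    """
    stripe_width = k * chunk_size
    # remainder != 0 means the object does not end on a stripe boundary
    full_stripes, remainder = divmod(object_size, stripe_width)
    return stripe_width, full_stripes, remainder


if __name__ == "__main__":
    for k in (4, 6, 8):
        width, stripes, rest = ec_alignment(k)
        status = "aligned" if rest == 0 else "misaligned"
        print(f"k={k}: stripe_width={width} B, full stripes={stripes}, "
              f"leftover={rest} B ({status})")
```

Under these assumptions, k=4 and k=8 divide the object evenly, while k=6 leaves a partial stripe at the end of every object. Whether that mismatch actually hurts is exactly the point that, as said above, needs to be tested rather than assumed.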