On 2019-07-11T09:46:47, Frank Schilder <frans@xxxxxx> wrote:

> Striping with stripe units other than 1 is something I also tested. I
> found that with EC pools non-trivial striping should be avoided. Firstly,
> EC is already a striped format and, secondly, striping on top of that
> with stripe_unit>1 will make every write an ec_overwrite, because shards
> are then rarely, if ever, written as a whole. That's why I said that
> rbd's stripe_unit should match the EC pool's stripe_width, or be a 2^n
> multiple of it. (Not sure what stripe_count should be set to, probably
> also a small power of two.)
> The native striping in EC pools comes from k: data is striped over k
> disks. The higher k, the more throughput, at the expense of CPU and
> network. Increasing k also increases stripe_width, though; this leads to
> more IO suffering from the ec_overwrite penalty.
> In my long list, this should actually be point
>
> 6) Use stripe_unit=1 (default).

You mean stripe-count?

> To get back to your question, this is another argument for
> k=power-of-two. Object sizes in ceph are always powers of 2 and stripe
> sizes contain k as a factor. Hence, any prime factor other than 2 in k
> will imply a mismatch. How badly a mismatch affects performance should
> be tested.

Yes, of course. Depending on the IO pattern, this means more IO will be
misaligned or have non-stripe_width portions. (Most IO patterns, if they
strive for alignment at all, aim for a power-of-two alignment, obviously.)
A short arithmetic sketch appended below illustrates this.

> Results with non-trivial striping (stripe_size>1) were so poor, I did
> not even include them in my report.

stripe_size?

> We use the 8+2 pool for ceph fs, where throughput is important. The 6+2
> pool is used for VMs (RBD images), where IOP/s are more important. It
> also offers a higher redundancy level. It's an acceptable compromise
> for us.

Especially with RBDs, I'm surprised that k=6 works well for you. Block
device IO is most commonly aligned on power-of-two boundaries.

Regards,
Lars

--
SUSE Linux GmbH, GF: Felix Imendörffer, Mary Higgins, Sri Rasiah,
HRB 21284 (AG Nürnberg)
"Architects should open possibilities and not determine everything."
(Ueli Zbinden)

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
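Appended for illustration: a minimal Python sketch of the alignment arithmetic discussed above. The 4 KiB per-shard chunk size and the 4 MiB object size are assumptions chosen only to make the numbers concrete; they are not taken from Frank's test setup.

```python
# Sketch of why k=power-of-two avoids ec_overwrites for power-of-two IO.
# Assumptions (not from the thread): 4 KiB EC chunk size, 4 MiB RBD objects.

def stripe_width(k: int, chunk_size: int = 4096) -> int:
    """Bytes covered by one full EC stripe: k data shards of chunk_size each."""
    return k * chunk_size

def is_full_stripe_io(offset: int, length: int, k: int,
                      chunk_size: int = 4096) -> bool:
    """True if the IO touches only whole, aligned stripes.
    Anything else forces the read-modify-write (ec_overwrite) path."""
    sw = stripe_width(k, chunk_size)
    return offset % sw == 0 and length % sw == 0

obj = 4 * 1024 * 1024  # 4 MiB object, a power of two

# k=8: stripe_width = 32 KiB, which divides 4 MiB -> whole stripes only.
print(is_full_stripe_io(0, obj, k=8))   # True

# k=6: stripe_width = 24 KiB, which does not divide 4 MiB -> the object's
# tail is a partial stripe, i.e. an ec_overwrite on every such write.
print(is_full_stripe_io(0, obj, k=6))   # False
```

With k a power of two, any power-of-two object or request size is a whole number of stripes; with k=6 the 2^n sizes never line up, which is the mismatch discussed above.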