On Sun, 28 Apr 2019 at 21:45, Igor Podlesny <ceph-user@xxxxxxxx> wrote:
> On Sun, 28 Apr 2019 at 16:14, Paul Emmerich <paul.emmerich@xxxxxxxx> wrote:
>> Use k+m for PG calculation, that value also shows up as "erasure size"
>> in ceph osd pool ls detail
> So does that mean that for PG calculation these two pools are equivalent:
> 1) EC(4, 2)
> 2) replicated, size 6
> ?

Correct.

> Sounds weird, to be honest. Replicated with size 6 means each logical
> piece of data is stored 6 times; what needed a single PG now requires 6.
> And with EC(4, 2) there's still only a 1.5x overhead in terms of raw
> occupied space -- how come the PG calculation then needs to adjust by
> 6 instead of 1.5?
A single logical data unit (an object in Ceph terms) is allocated to a
single PG. For a replicated pool of size n this PG is simply stored on
n OSDs. For an EC(k+m) pool the PG is likewise stored on k+m OSDs, the
difference being that each of those OSDs holds a different chunk of the
data rather than a full copy.
http://docs.ceph.com/docs/master/architecture/#erasure-coding provides
a good overview of how this is actually achieved.
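
To put numbers on it, here is a rough Python sketch of the usual
rule-of-thumb PG calculation (the cluster size and target PGs per OSD
below are made-up example values, not anything from this thread):
roughly OSDs * target PGs per OSD divided by the pool's "size", rounded
up to a power of two -- where "size" is the replica count for a
replicated pool or k+m for an EC pool. That is why EC(4, 2) and
replicated size 6 land on the same PG count even though their raw-space
overhead differs.

import math

def suggested_pg_count(num_osds, pool_size, target_pgs_per_osd=100):
    # pool_size = replica count for replicated pools, k + m for EC pools.
    # Rule of thumb: (OSDs * target PGs per OSD) / pool_size, rounded up
    # to the next power of two.
    raw = num_osds * target_pgs_per_osd / pool_size
    return 2 ** math.ceil(math.log2(raw))

# Hypothetical 60-OSD cluster:
print(suggested_pg_count(60, pool_size=6))      # replicated, size 6 -> 1024
print(suggested_pg_count(60, pool_size=4 + 2))  # EC(4, 2)           -> 1024
print(suggested_pg_count(60, pool_size=3))      # replicated, size 3 -> 2048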
> Also, why does the Ceph documentation say "It is equivalent to a
> replicated pool of size __two__" when describing the EC(2, 1) example?
This relates to fault tolerance. A replicated pool of size 2 can lose
one OSD without data loss, and so can an EC(2+1) pool.
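
To make that concrete (my own illustration, not something from the
docs): fault tolerance is about how many OSDs a PG can lose, while
raw-space overhead is what you pay for it. A small sketch with a
hypothetical helper:

def pool_profile(kind, size=None, k=None, m=None):
    # Replicated size n: PG on n OSDs, survives n - 1 losses, n x raw space.
    # EC(k, m):          PG on k+m OSDs, survives m losses, (k+m)/k x raw space.
    if kind == "replicated":
        return {"osds_per_pg": size, "max_osd_losses": size - 1,
                "raw_overhead": float(size)}
    return {"osds_per_pg": k + m, "max_osd_losses": m,
            "raw_overhead": (k + m) / k}

print(pool_profile("replicated", size=2))  # lose 1 OSD,  2.0x raw space
print(pool_profile("ec", k=2, m=1))        # lose 1 OSD,  1.5x raw space
print(pool_profile("replicated", size=6))  # lose 5 OSDs, 6.0x raw space
print(pool_profile("ec", k=4, m=2))        # lose 2 OSDs, 1.5x raw space

So EC(2, 1) matches replicated size 2 only in how many OSDs it can
lose, not in how the data is laid out or how much raw space it uses.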
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com