On Wed, May 9, 2018 at 4:37 PM, Maciej Puzio <mkp37215@xxxxxxxxx> wrote:
> My setup consists of two pools on 5 OSDs, and is intended for cephfs:
> 1. erasure-coded data pool: k=3, m=2, size=5, min_size=3 (originally
> 4), number of PGs=128
> 2. replicated metadata pool: size=3, min_size=2, number of PGs=100
>
> When all OSDs were online, all PGs from both pools had status
> active+clean. After killing two of the five OSDs (and changing min_size
> to 3), all metadata pool PGs remained active+clean, and of the 128 data
> pool PGs, 3 remained active+clean, 11 became active+clean+remapped, and
> the rest became active+undersized, active+undersized+remapped,
> active+undersized+degraded or active+undersized+degraded+remapped,
> seemingly at random.
>
> After some time one of the remaining three OSD nodes lost network
> connectivity (due to a ceph-unrelated bug in virtio_net; this toy setup
> sure is becoming a bug motherlode!). The node was rebooted, the ceph
> cluster became accessible again (with 3 out of 5 OSDs online, as
> before), and the three active+clean data pool PGs became
> active+clean+remapped, while the rest of the PGs seem to have kept
> their previous status.

That collection of states makes sense, then, given that you have a
replicated pool as well as an EC one. The states represent different
conditions of each PG; see
http://docs.ceph.com/docs/jewel/rados/operations/pg-states/

And they're not random: which PG lands in which set of states is
determined by how the CRUSH placement and the failures interact, and
CRUSH is a pseudo-random algorithm, so... ;)
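For anyone wanting to reproduce a setup like the one Maciej describes,
a rough sketch (the profile and pool names here are made up, and the
failure-domain key is ruleset-failure-domain on jewel vs.
crush-failure-domain on luminous):

    # EC profile with k=3, m=2, so size = k+m = 5
    ceph osd erasure-code-profile set ec32 k=3 m=2
    # data pool: 128 PGs
    ceph osd pool create cephfs_data 128 128 erasure ec32
    # lower min_size from the EC default of k+1=4, so PGs can stay
    # active with only 3 of 5 OSDs up
    ceph osd pool set cephfs_data min_size 3
    # replicated metadata pool: 100 PGs, size/min_size via pool set
    ceph osd pool create cephfs_metadata 100 100 replicated
    ceph osd pool set cephfs_metadata size 3
    ceph osd pool set cephfs_metadata min_size 2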
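And to get a quick tally of which PGs are in which states after the
failures, something like this works (the pgs_brief column layout below
is what luminous prints, with STATE in the second column; it may differ
on other releases):

    # count PGs per state combination, e.g. active+clean,
    # active+undersized+degraded+remapped, ...
    ceph pg dump pgs_brief | awk 'NR>1 {print $2}' | sort | uniq -c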