On Wed, May 9, 2018 at 4:37 PM, Maciej Puzio <mkp37215@xxxxxxxxx> wrote:
> My setup consists of two pools on 5 OSDs, and is intended for cephfs:
> 1. erasure-coded data pool: k=3, m=2, size=5, min_size=3 (originally
> 4), number of PGs=128
> 2. replicated metadata pool: size=3, min_size=2, number of PGs=100
>
> When all OSDs were online, all PGs from both pools had status
> active+clean. After killing two of the five OSDs (and changing min_size
> to 3), all metadata pool PGs remained active+clean, and of the 128 data
> pool PGs, 3 remained active+clean, 11 became active+clean+remapped, and
> the rest became active+undersized, active+undersized+remapped,
> active+undersized+degraded or active+undersized+degraded+remapped,
> seemingly at random.
>
> After some time one of the remaining three OSD nodes lost network
> connectivity (due to a ceph-unrelated bug in virtio_net; this toy setup
> sure is becoming a bug motherlode!). The node was rebooted, the ceph
> cluster became accessible again (with 3 out of 5 OSDs online, as
> before), and the three active+clean data pool PGs became
> active+clean+remapped, while the rest of the PGs seem to have kept
> their previous status.

That collection of states makes sense, then, given that you have a
replicated pool as well as an EC one. The states represent different
conditions of each PG; see
http://docs.ceph.com/docs/jewel/rados/operations/pg-states/

And they're not random: which PG lands in which set of states is
determined by how the CRUSH placement and the failures interact, and
CRUSH is a pseudo-random algorithm, so... ;)
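For anyone wanting to reproduce a setup like the one Maciej describes,
a rough sketch (the profile and pool names here are made up, and the
failure-domain key is ruleset-failure-domain on jewel vs.
crush-failure-domain on luminous):

    # EC profile with k=3, m=2, so size = k+m = 5
    ceph osd erasure-code-profile set ec32 k=3 m=2
    # data pool: 128 PGs
    ceph osd pool create cephfs_data 128 128 erasure ec32
    # lower min_size from the EC default of k+1=4, so PGs can stay
    # active with only 3 of 5 OSDs up
    ceph osd pool set cephfs_data min_size 3
    # replicated metadata pool: 100 PGs, size/min_size via pool set
    ceph osd pool create cephfs_metadata 100 100 replicated
    ceph osd pool set cephfs_metadata size 3
    ceph osd pool set cephfs_metadata min_size 2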
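And to get a quick tally of which PGs are in which states after the
failures, something like this works (the pgs_brief column layout below
is what luminous prints, with STATE in the second column; it may differ
on other releases):

    # count PGs per state combination, e.g. active+clean,
    # active+undersized+degraded+remapped, ...
    ceph pg dump pgs_brief | awk 'NR>1 {print $2}' | sort | uniq -c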