I still don't understand why I get any clean PGs in the erasure-coded
pool, when with two OSDs down there is no more redundancy, and
therefore all PGs should be undersized (or so I think).

I repeated the experiment by bringing the two offline OSDs back online
and then killing them again, and got results similar to the previous
test. But this time I observed the process more closely.

Example showing state changes for one of the PGs [OSD assignment in
brackets]:

When all OSDs were online:                    [3,2,4,0,1], state: active+clean
Initially after OSDs 3 and 4 were killed:     [x,2,x,0,1], state: active+undersized+degraded
After some time (OSDs 3 and 4 still offline): [0,2,0,0,1], state: active+clean

('x' means that some large number was listed; I assume this meant the
original OSD was unavailable)

Another PG:

When all OSDs were online:                    [0,3,2,1,4], state: active+clean
Initially after OSDs 3 and 4 were killed:     [0,x,2,1,x], state: active+undersized+degraded
After some time (OSDs 3 and 4 still offline): [0,1,2,1,1], state: active+clean+remapped

Note: this PG became remapped; the previous one did not.

Does this mean that these PGs now have 5 chunks, of which 3 are stored
on one OSD? Perhaps I am missing something, but could this arrangement
really be redundant? And how can a non-redundant state be considered
clean? By the way, I am using crush-failure-domain=host, and I have
one OSD per host.

On the good side, I have no complaints about how the replicated
metadata pool operates. Unfortunately, I will not be able to replicate
data in my future production cluster.

One more thing: I figured out that "degraded" means "undersized and
contains data".

Thanks
Maciej Puzio

On Wed, May 9, 2018 at 7:07 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> On Wed, May 9, 2018 at 4:37 PM, Maciej Puzio <mkp37215@xxxxxxxxx> wrote:
>> My setup consists of two pools on 5 OSDs, and is intended for cephfs:
>> 1. erasure-coded data pool: k=3, m=2, size=5, min_size=3 (originally
>> 4), number of PGs=128
>> 2. replicated metadata pool: size=3, min_size=2, number of PGs=100
>>
>> When all OSDs were online, all PGs from both pools had the status
>> active+clean. After killing two of the five OSDs (and changing
>> min_size to 3), all metadata pool PGs remained active+clean, and of
>> the 128 data pool PGs, 3 remained active+clean, 11 became
>> active+clean+remapped, and the rest became active+undersized,
>> active+undersized+remapped, active+undersized+degraded or
>> active+undersized+degraded+remapped, seemingly at random.
>>
>> After some time, one of the remaining three OSD nodes lost network
>> connectivity (due to a ceph-unrelated bug in virtio_net; this toy
>> setup sure is becoming a bug motherlode!). The node was rebooted,
>> the ceph cluster became accessible again (with 3 out of 5 OSDs
>> online, as before), and the three active+clean data pool PGs became
>> active+clean+remapped, while the rest of the PGs seem to have kept
>> their previous status.
>
> That collection makes sense if you have a replicated pool as well as
> an EC one, then. They represent different states for the PG; see
> http://docs.ceph.com/docs/jewel/rados/operations/pg-states/
> They're not random: which PG lands in which set of states is
> determined by how CRUSH placement and the failures interact, and
> CRUSH is a pseudo-random algorithm, so... ;)
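
P.S. For anyone who wants to watch the same behavior on their own
cluster, here is a rough sketch (not a polished or tested tool) that
tallies per-pool PG states and prints each PG's up and acting sets;
the bracketed OSD lists above appear to be one of those two sets. The
pool name is only a placeholder, and the JSON layout of
"ceph pg ls-by-pool" differs somewhat between Ceph releases, so adjust
as needed:

    #!/usr/bin/env python
    # Rough sketch: count PG states for one pool and show each PG's
    # up/acting sets, using only the standard ceph CLI JSON output.
    # The pool name is a placeholder; JSON layout may vary by release.
    import json
    import subprocess
    from collections import Counter

    POOL = "cephfs_data"   # placeholder pool name

    raw = subprocess.check_output(
        ["ceph", "pg", "ls-by-pool", POOL, "--format=json"])
    data = json.loads(raw)
    # Some releases return a bare list of PG stats, others wrap it
    # in a dict under "pg_stats".
    pgs = data["pg_stats"] if isinstance(data, dict) else data

    counts = Counter(pg["state"] for pg in pgs)
    for state, n in counts.most_common():
        print("%6d  %s" % (n, state))

    for pg in pgs:
        print(pg["pgid"], "up:", pg["up"], "acting:", pg["acting"])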