My setup consists of two pools on 5 OSDs, and is intended for CephFS:

1. Erasure-coded data pool: k=3, m=2, size=5, min_size=3 (originally 4), number of PGs=128
2. Replicated metadata pool: size=3, min_size=2, number of PGs=100

(A sketch of the commands for creating a setup like this, and for tallying the PG states described below, appears at the end of this message.)

When all OSDs were online, all PGs from both pools had status active+clean. After killing two of the five OSDs (and changing min_size to 3), all metadata pool PGs remained active+clean, while of the 128 data pool PGs, 3 remained active+clean, 11 became active+clean+remapped, and the rest became active+undersized, active+undersized+remapped, active+undersized+degraded or active+undersized+degraded+remapped, seemingly at random.

After some time, one of the remaining three OSD nodes lost network connectivity (due to a Ceph-unrelated bug in virtio_net; this toy setup sure is becoming a bug motherlode!). The node was rebooted and the Ceph cluster became accessible again (with 3 out of 5 OSDs online, as before). The three active+clean data pool PGs then became active+clean+remapped, while the rest of the PGs seem to have kept their previous status.

Thanks
Maciej Puzio

On Wed, May 9, 2018 at 4:49 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> active+clean does not make a lot of sense if every PG really was 3+2. But
> perhaps you had a 3x replicated pool or something hanging out as well from
> your deployment tool?
>
> The active+clean+remapped means that a PG was somehow lucky enough to have
> an existing "stray" copy on one of the OSDs that it has decided to use to
> bring it back up to the right number of copies, even though they certainly
> won't match the proper failure domains.
>
> The min_size in relation to the k+m values won't have any direct impact
> here, although they might indirectly affect it by changing how quickly
> stray PGs get deleted.
> -Greg
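
For reference, a minimal sketch of the kind of commands that would produce a setup like the one described above, using the standard ceph CLI. The profile, pool, and filesystem names (ec32, cephfs_data, cephfs_metadata, myfs) are placeholders rather than the names actually used, and a host failure domain is assumed (one OSD per node, as in the description):

  # Erasure-code profile with k=3 data chunks and m=2 coding chunks;
  # failure domain set to host, assuming one OSD per node.
  $ ceph osd erasure-code-profile set ec32 k=3 m=2 crush-failure-domain=host

  # Erasure-coded data pool with 128 PGs; overwrites must be enabled to
  # use an EC pool with CephFS (BlueStore only).
  $ ceph osd pool create cephfs_data 128 128 erasure ec32
  $ ceph osd pool set cephfs_data allow_ec_overwrites true
  $ ceph osd pool set cephfs_data min_size 3   # lowered from 4, the pool's original min_size

  # Replicated metadata pool with 100 PGs, size=3, min_size=2.
  $ ceph osd pool create cephfs_metadata 100 100 replicated
  $ ceph osd pool set cephfs_metadata size 3
  $ ceph osd pool set cephfs_metadata min_size 2

  # Create the filesystem; some releases require --force here because an
  # erasure-coded default data pool is discouraged.
  $ ceph fs new myfs cephfs_metadata cephfs_data --force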
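And a quick way to tally PG states like the ones quoted above from ceph pg dump output (the awk patterns, and the assumption that the data pool has pool id 1, are illustrative; ceph pg ls-by-pool cephfs_data gives the same information per PG):

  # Count PGs by state across the whole cluster; PG ids look like "<pool-id>.<hex>",
  # so the filter skips the header line.
  $ ceph pg dump pgs_brief 2>/dev/null | awk '$1 ~ /^[0-9]+\./ {print $2}' | sort | uniq -c

  # The same, restricted to a single pool (here assuming the data pool is pool 1):
  $ ceph pg dump pgs_brief 2>/dev/null | awk '$1 ~ /^1\./ {print $2}' | sort | uniq -c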