On Dec 29, 2014, Christian Eichelmann <christian.eichelmann@xxxxxxxx> wrote:

> After we got everything up and running again, we still have 3 PGs in the
> state incomplete. I was checking one of them directly on the systems
> (replication factor is 3).

I have run into this myself at least twice before. I had not lost or replaced the OSDs altogether, though; I had just rolled too many of them back to earlier snapshots, which required them to be backfilled to catch up.

It looks like an OSD won't get out of the incomplete state, even to backfill others, if doing so would keep the PG's active size under the min size for the pool.

In my case, I brought the current-ish snapshot of the OSD back up to enable backfilling of enough replicas, so that I could then roll the remaining OSDs back again and have them backfilled too. However, I suspect that temporarily setting min_size to a lower number could be enough for the PGs to recover. If "ceph osd pool set <pool> min_size 1" doesn't get the PGs going, I suppose restarting at least one of the OSDs involved in the recovery, so that the PG undergoes peering again, would get you going.

Once backfilling completes for all formerly-incomplete PGs, or maybe even as soon as backfilling begins, bring the pool's min_size back up to (presumably) 2. You don't want to be running too long with a too-low min size :-)

I hope this helps,

Happy GNU Year,

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/      FSF Latin America board member
Free Software Evangelist|Red Hat Brasil   GNU Toolchain Engineer

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
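[Editor's note: as a sketch of the recovery sequence Alexandre describes, the commands might look like the following. The pool name "rbd" and OSD id "12" are placeholders, and the exact OSD-restart syntax depends on your distribution and init system; substitute your own values.]

```shell
# Temporarily lower min_size so the incomplete PGs can peer and backfill.
# "rbd" is a placeholder pool name.
ceph osd pool set rbd min_size 1

# If the PGs still don't start recovering, restart one of the OSDs
# involved so the PG re-peers. "osd.12" is a placeholder; on systemd
# hosts the equivalent might be: systemctl restart ceph-osd@12
service ceph restart osd.12

# Watch cluster events to confirm backfilling is making progress.
ceph -w

# Once backfilling is underway (or complete), restore the safer value.
ceph osd pool set rbd min_size 2
```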