Hi everyone,

Recently we moved a number of our servers from one rack to another. Late in the move, some requests were blocked because one PG was in the "peered" state. This surprised us, but after discussing it with Wido we understand why it happened.

However, it raised another point. We believed we were following the instructions in the upgrade documentation, and we've done our upgrades this way in the past without hitting this "peered" state.

The documentation says:

"Ensure each upgraded Ceph OSD Daemon has rejoined the cluster"

We read this as meaning you can restart all the OSDs in the whole cluster, one by one, without waiting for recovery to complete. It seems it should instead say something like:

"Ensure each upgraded Ceph OSD Daemon has rejoined the cluster" and "ensure recovery has completed before moving on to the next {failure domain}"

where the failure domain is host, rack, etc., depending on what is in your CRUSH map.

Thoughts? Should the documentation be clearer on this point, to help people like me avoid this mistake?

Rich

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com