Upgrade Documentation: Wait for recovery


Hi Everyone,
Recently we moved a number of our servers from one rack to another. In
the late stages of the move we hit a point where some requests were
blocked because one PG was in the "peered" state.

This was unexpected to us, but after discussing it with Wido we
understand why it happened. However, it has raised another point: we
believed we were following the instructions in the upgrade
documentation. We've done our upgrades this way in the past without
hitting this "peered" state. The documentation says this:
"Ensure each upgraded Ceph OSD Daemon has rejoined the cluster"

We read this as meaning that you can go through and restart all the
OSDs in the whole cluster one by one without waiting for recovery to
happen. Whereas it seems it should say something more like:
"Ensure each upgraded Ceph OSD Daemon has rejoined the cluster" and
"ensure recovery has completed before moving on to the next {failure
domain}", where the failure domain is host, rack, etc., depending on
what is in your CRUSH map.
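
To make the suggested wording concrete, here is a rough sketch of the
kind of check I mean: poll the cluster status and only move on to the
next failure domain once every PG is active+clean. This is not taken
from the documentation. It assumes the JSON from
`ceph status --format json` has a "pgmap" section with "num_pgs" and a
"pgs_by_state" list of {"state_name", "count"} entries; the function
names are made up for illustration, and scrubbing PGs are treated as
clean, which is my own choice.

```python
import json
import subprocess
import time

# State flags that are harmless on top of active+clean (assumption: scrubbing
# should not hold up a rolling restart).
HARMLESS_FLAGS = {"active", "clean", "scrubbing", "deep"}


def all_pgs_clean(status: dict) -> bool:
    """Return True when every PG in the status dict counts as active+clean."""
    pgmap = status.get("pgmap", {})
    total = pgmap.get("num_pgs", 0)
    clean = 0
    for entry in pgmap.get("pgs_by_state", []):
        flags = set(entry.get("state_name", "").split("+"))
        # Must be active and clean, with nothing else but scrub flags.
        if {"active", "clean"} <= flags <= HARMLESS_FLAGS:
            clean += entry.get("count", 0)
    return total > 0 and clean == total


def wait_for_recovery(poll_seconds: float = 10.0, fetch_status=None) -> None:
    """Block until all PGs are clean; fetch_status is injectable for testing."""
    if fetch_status is None:
        def fetch_status():
            # Assumes the ceph CLI is on PATH with a working keyring.
            out = subprocess.check_output(["ceph", "status", "--format", "json"])
            return json.loads(out)
    while not all_pgs_clean(fetch_status()):
        time.sleep(poll_seconds)
```

The idea would be to restart the OSDs in one host or rack, call
wait_for_recovery(), and only then start on the next one.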

Thoughts? Should the documentation be clearer on this, to help people
such as myself avoid making this mistake?

Rich
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


