That worked and recovered startup of all four OSDs on the second node. Out of an abundance of caution I only disabled one of the volumes with systemctl disable before running `ceph-volume lvm activate --all`, but that cleaned up all of them, so there was nothing left to do.

https://bugzilla.redhat.com/show_bug.cgi?id=1567346#c21 helped resolve the final issue getting to HEALTH_OK. After rebuilding the mon/mgr node I had not properly cleared/restored the firewall. It's odd that `ceph osd tree` was reporting two of the OSDs as up and in while the mon/mgr/mds ports were all inaccessible. I don't believe there were any failed creation attempts.

Cardinal process rule with filesystems: always maintain a known-good state that you can roll back to. If an error comes up that can't be fully explained, roll back and start over. Sometimes a command gets missed by even the best of fingers and fully caffeinated minds... :)

I do see that I didn't run a `ceph osd purge` on the empty/downed OSDs that had been gracefully marked `out`; that explains the tree still showing the even-numbered OSDs on the rebuilt node. After purging the references to the empty OSDs and re-adding the volumes, I am back to full health with all devices and OSDs up/in. Rough sketches of the commands are below in case they help anyone else who lands on this thread.

THANK YOU!!! :D
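The re-activation step was just ceph-volume's bulk activate. A minimal sketch, assuming the usual ceph-volume@lvm-<id>-<fsid> systemd unit naming (the id and fsid below are placeholders, not my actual values):

    # disable the one stale unit I was worried about (id/fsid are placeholders)
    systemctl disable ceph-volume@lvm-0-<osd-fsid>.service

    # let ceph-volume rediscover the LVM volumes and re-enable/start every OSD on the host
    ceph-volume lvm activate --all

    # verify the OSDs rejoined
    ceph osd tree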
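On the firewall side, the fix on the rebuilt mon/mgr node amounted to re-opening the standard Ceph ports. Assuming firewalld (and that your build ships the ceph service definitions), something along these lines should do it; adjust if you manage iptables directly or run non-default ports:

    # mon ports (6789, plus 3300 for msgr2)
    firewall-cmd --permanent --add-service=ceph-mon

    # osd/mgr/mds port range (6800-7300)
    firewall-cmd --permanent --add-service=ceph

    firewall-cmd --reload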
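And the purge/re-add step, with placeholder OSD ids and device names (the exact re-add depends on whether you are recreating OSDs from scratch or just re-activating intact volumes):

    # drop the stale, empty OSD entries from the CRUSH/osd maps
    ceph osd purge 2 --yes-i-really-mean-it
    ceph osd purge 4 --yes-i-really-mean-it

    # re-add the volumes on the rebuilt node, e.g. by creating fresh OSDs on the devices
    ceph-volume lvm create --data /dev/sdX

    # watch the cluster return to HEALTH_OK once backfill finishes
    ceph -s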