For reference, I'm currently running 26 nodes (338 OSDs) and will be at
35 nodes (455 OSDs) in the near future.

Node/OSD provisioning and replacements:

Mostly I'm using ceph-deploy, at least to do node/OSD adds and
replacements. Right now the process is:

1. Use FAI (http://fai-project.org) to set up software RAID1/LVM for the
OS disks and do a minimal installation, including the salt-minion.

2. Accept the new minion on the salt-master node and deploy the
configuration: LDAP auth, nrpe, the diamond collector, udev
configuration, my custom python disk add script, and everything on the
Ceph preflight page
(http://ceph.com/docs/firefly/start/quick-start-preflight/).

3. Insert the journals into the case. Udev triggers my python code,
which partitions the SSDs and fires a Prowl alert
(http://www.prowlapp.com/) to my phone when it's finished (rough sketch
at the end of this message).

4. Insert the OSDs into the case. Same thing: udev triggers the python
code, which selects the next available partition on the journals, so
OSDs go on journal1partA, journal2partA, journal3partA,
journal1partB, ... for the three journals in each node. The code then
fires a salt event at the master node with the OSD dev path, the
journal /dev/disk/by-id/ path, and the node hostname (sketch at the end
of this message). The salt reactor on the master node takes this event
and runs a script on the admin node which passes those parameters to
ceph-deploy, which does the OSD deployment. A Prowl alert is sent on
success or failure, with details.

Similarly, when an OSD fails, I remove it and insert the new OSD; the
same process as above occurs. Logical removal I do manually, since I'm
not at a scale where it's common yet. Eventually, I imagine I'll write
code to trigger OSD removal on certain events using the same
event/reactor Salt framework.

Pool/CRUSH management:

Pool configuration and CRUSH management are mostly one-time operations.
That is, I'll make a change rarely, and when I do it will persist in
that new state for a long time. Given that, and the fact that I can
make the changes from one node and inject them into the cluster, I
haven't needed to automate that portion of Ceph as I've added more
nodes, at least not yet.

Replacing journals:

I haven't had to do this yet; I'd probably remove/re-add all the
affected OSDs if it happened today, but I will be reading the post you
linked.

Upgrading releases:

Change /etc/apt/sources.list.d/ceph.list to point at the new release
and push it to all the nodes with Salt. Then run

salt -N 'ceph' pkg.upgrade

to upgrade the packages on all the nodes in the ceph nodegroup. Then
use Salt to restart the monitors, and then the OSDs on each node, one
by one (sketch at the end of this message). Finally, run the following
on all nodes with Salt to verify that all monitors/OSDs are running the
new version:

for i in $(ls /var/run/ceph/ceph-*.asok); do echo $i; ceph --admin-daemon $i version; done

Node decommissioning:

I have a script which enumerates all the OSDs on a given host and
stores that list in a file. Another script (run by cron every 10
minutes) checks whether the cluster health is OK, and if so pops the
next OSD from that file and executes the steps to remove it from the
host, trickling the node out of service (sketch at the end of this
message).
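
For the curious, here are rough sketches of a few of the pieces I
mentioned above. None of this is my production code, just the general
shape. First, the journal prep step: the partition count/size, the
PROWL_API_KEY environment variable, and the helper names are made up
for illustration.

#!/usr/bin/env python
# Rough sketch only - not my production script. Assumes udev passes the new
# journal SSD as argv[1] (e.g. /dev/sdx), sgdisk and curl are installed, and
# PROWL_API_KEY is set in the environment udev runs this with.

import os
import subprocess
import sys

JOURNAL_PARTS = 4     # journal partitions to carve per SSD (illustrative)
PART_SIZE_GB = 10     # size of each journal partition (illustrative)

def partition_journal(dev):
    # Carve the SSD into fixed-size journal partitions with sgdisk.
    for n in range(1, JOURNAL_PARTS + 1):
        subprocess.check_call(
            ["sgdisk", "--new={0}:0:+{1}G".format(n, PART_SIZE_GB), dev])

def prowl_alert(event, description):
    # Push a notification to my phone via the Prowl public API.
    subprocess.check_call([
        "curl", "-s", "-o", "/dev/null",
        "--data-urlencode", "apikey=" + os.environ["PROWL_API_KEY"],
        "--data-urlencode", "application=ceph-disk-add",
        "--data-urlencode", "event=" + event,
        "--data-urlencode", "description=" + description,
        "https://api.prowlapp.com/publicapi/add"])

if __name__ == "__main__":
    dev = sys.argv[1]
    try:
        partition_journal(dev)
        prowl_alert("journal ready", "Partitioned " + dev)
    except Exception as exc:
        prowl_alert("journal FAILED", dev + ": " + str(exc))
        raise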
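
The event half of the OSD step is just an event.send from the minion,
something along these lines. The tag name and data keys here are
illustrative; the reactor SLS on the master matches the tag and hands
the data to a wrapper script on the admin node, which boils down to a
plain ceph-deploy osd create host:device:journal invocation.

# Rough sketch: fire the "OSD ready" event at the salt master from the node.
# Tag name and data keys are illustrative; the reactor on the master matches
# the tag and passes the data to the ceph-deploy wrapper on the admin node.

import json
import socket
import subprocess

def fire_osd_event(osd_dev, journal_path):
    data = json.dumps({
        "osd_dev": osd_dev,          # e.g. /dev/sdx
        "journal": journal_path,     # /dev/disk/by-id/... journal partition
        "host": socket.gethostname(),
    })
    subprocess.check_call(
        ["salt-call", "event.send", "ceph/osd/ready", data])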
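
The rolling restart after an upgrade looks roughly like this: mons
first, then OSDs host by host, waiting for HEALTH_OK in between. The
hostnames are illustrative, and "restart ceph-*-all" assumes
Ubuntu/upstart; substitute whatever your init system uses.

# Rough sketch of the post-upgrade rolling restart, driven from the admin node.

import subprocess
import time

MON_NODES = ["ceph-mon01", "ceph-mon02", "ceph-mon03"]   # illustrative
OSD_NODES = ["ceph-osd%02d" % i for i in range(1, 27)]   # illustrative

def wait_for_health_ok():
    # Block until the cluster reports HEALTH_OK again.
    while not subprocess.check_output(["ceph", "health"]).startswith(b"HEALTH_OK"):
        time.sleep(30)

def restart(host, job):
    # Run the upstart restart on the target node via salt.
    subprocess.check_call(["salt", host, "cmd.run", "restart " + job])

for host in MON_NODES:
    restart(host, "ceph-mon-all")
    wait_for_health_ok()

for host in OSD_NODES:
    restart(host, "ceph-osd-all")
    wait_for_health_ok()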
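
And the decommissioning cron job is roughly this shape; the file path
and the exact removal sequence are illustrative, not lifted from my
script.

#!/usr/bin/env python
# Rough sketch of the "trickle a node out" cron job. Assumes the enumeration
# script already wrote one OSD name per line (e.g. "osd.42") to OSD_LIST.

import subprocess

OSD_LIST = "/var/lib/ceph-decom/osds.txt"

def cluster_healthy():
    # Only proceed when the previous removal has finished backfilling.
    return subprocess.check_output(["ceph", "health"]).startswith(b"HEALTH_OK")

def pop_next_osd():
    # Take the first OSD off the list file and rewrite the remainder.
    with open(OSD_LIST) as f:
        osds = [line.strip() for line in f if line.strip()]
    if not osds:
        return None
    with open(OSD_LIST, "w") as f:
        f.writelines(o + "\n" for o in osds[1:])
    return osds[0]                       # e.g. "osd.42"

def remove_osd(osd):
    osd_id = osd.split(".", 1)[1]
    subprocess.check_call(["ceph", "osd", "out", osd_id])
    # ...stop the OSD daemon on the host here (I'd do that via salt)...
    subprocess.check_call(["ceph", "osd", "crush", "remove", osd])
    subprocess.check_call(["ceph", "auth", "del", osd])
    subprocess.check_call(["ceph", "osd", "rm", osd_id])

if __name__ == "__main__":
    if cluster_healthy():
        osd = pop_next_osd()
        if osd:
            remove_osd(osd)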
On 04/17/2015 02:18 PM, Craig Lewis wrote:
--
Steve Anthony
LTS HPC Support Specialist
Lehigh University
sma310@xxxxxxxxxx