I'm curious what people managing larger Ceph clusters are doing with configuration management and orchestration to simplify their lives. We've been using ceph-deploy to manage our clusters so far, but we feel that moving the management of our clusters to standard tools would provide more consistency and help prevent some of the mistakes that have happened while using ceph-deploy. We're looking at using the same tools we use in our OpenStack environment (puppet/ansible), but I'm interested in hearing from people using chef/salt/juju as well.

Some of the cluster operation tasks I can think of, along with ideas/concerns I have:

Keyring management
Seems like hiera-eyaml is a natural fit for storing the keyrings.

ceph.conf
I believe the puppet ceph module can manage this file, but I'm wondering whether a template (erb?) might be a better way to keep it organized and properly documented.

Pool configuration
The puppet module seems to be able to manage replica counts and the number of placement groups, but I don't see support for erasure-coded pools yet. We would probably want puppet to set up the initial configuration, but not to change it on a production cluster.

CRUSH maps
Describing the infrastructure in yaml makes sense: which servers are in which rows/racks/chassis, and the type of server (model, number of HDDs, number of SSDs).

CRUSH rules
I could see puppet managing the various rules based on the backend storage (HDD, SSD, primary affinity, erasure coding, etc.).

Replacing a failed HDD
Do you automatically identify the new drive and start using it right away? I've seen people talk about using a combination of udev and the special GPT partition type GUIDs to automate this. If you have a cluster with thousands of drives, automating the replacement makes sense. How do you handle the journal partition on the SSD? Does removing the old journal partition and creating a new one leave a hole in the partition table (because the old partition is removed and the new one is created at the end of the drive)?

Replacing a failed SSD journal
Has anyone automated recreating the journal partitions using Sebastien Han's instructions, or do you have to rebuild all the OSDs as well?
http://www.sebastien-han.fr/blog/2014/11/27/ceph-recover-osds-after-ssd-journal-failure/

Adding new OSD servers
How are you adding multiple new OSD servers to the cluster? I could see an ansible playbook that sets nobackfill, noscrub, and nodeep-scrub, adds all the OSDs to the cluster, and then unsets the flags being useful.

Upgrading releases
I've found an ansible playbook for doing a rolling upgrade which looks like it would work well, but are there other methods people are using?
http://www.sebastien-han.fr/blog/2015/03/30/ceph-rolling-upgrades-with-ansible/

Decommissioning hardware
Another ansible playbook that reduces the OSD weights to zero, marks the OSDs out, stops the services, removes the OSD IDs, removes the CRUSH entries, unmounts the drives, and finally removes the server seems like the best method here. Any other ideas on how to approach this?

That's all I can think of right now. Are there any other tasks people have run into that are missing from this list? A few rough sketches of what I'm imagining for some of these are below my signature.

Thanks,
Bryan
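Rough sketches, as promised above. All of these are untested; the hiera keys, cap strings, group names, and variables are just what I'd picture, not the exact interfaces of the puppet-ceph module or any particular playbook.

Keyring management: something like this in hiera-eyaml (the ENC blobs are placeholders, and whether ceph::keys::args is the hiera key the module actually looks up is something I'd need to verify):

    # hieradata/common.eyaml -- layout is illustrative
    ceph::keys::args:
      client.admin:
        secret: ENC[PKCS7,placeholder-encrypted-blob]
        cap_mon: 'allow *'
        cap_osd: 'allow *'
        cap_mds: 'allow'
      client.openstack:
        secret: ENC[PKCS7,placeholder-encrypted-blob]
        cap_mon: 'allow r'
        cap_osd: 'allow rwx pool=volumes, allow rx pool=images'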
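Pool configuration: the kind of data I'd want puppet to consume on initial setup only (just a sketch of a layout that create_resources or similar could feed into the module's pool type; I haven't checked what it actually expects, and there's no erasure-code support that I can see):

    # hieradata/common.yaml -- illustrative layout
    ceph::pools:
      volumes:
        pg_num: 2048
        pgp_num: 2048
        size: 3
      images:
        pg_num: 512
        pgp_num: 512
        size: 3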
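CRUSH maps: for describing the infrastructure in yaml, I was picturing something along these lines (the host/rack names and the server-profile idea are made up for illustration), which a module or playbook would then translate into the actual CRUSH hierarchy:

    # illustrative inventory description, not a real CRUSH map format
    server_profiles:
      dense_hdd:
        model: R730xd
        hdds: 12
        ssds: 2
    rows:
      row1:
        racks:
          rack3:
            hosts:
              - { name: osd001, profile: dense_hdd }
              - { name: osd002, profile: dense_hdd }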
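Replacing a failed SSD journal: what I'd hope to automate from Sebastien Han's post is roughly the tasks below. The osd_ids list would have to be derived from which OSDs journal to the dead SSD, and I've left the partition creation as a comment because that's the part I'm least sure how to do safely:

    ---
    - hosts: osd_server_with_failed_journal
      tasks:
        - name: stop rebalancing while journals are rebuilt
          command: ceph osd set noout
        - name: stop the affected OSDs
          command: service ceph stop osd.{{ item }}
          with_items: "{{ osd_ids }}"
        # re-create the journal partitions on the replacement SSD here
        # (e.g. sgdisk with the same partition GUIDs/typecodes as before)
        - name: rebuild the journals
          command: ceph-osd -i {{ item }} --mkjournal
          with_items: "{{ osd_ids }}"
        - name: start the OSDs again
          command: service ceph start osd.{{ item }}
          with_items: "{{ osd_ids }}"
        - name: allow rebalancing again
          command: ceph osd unset noout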
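Adding new OSD servers: the flag handling I mentioned could be as simple as the play below (group names, the osd_disks variable, and the ceph-disk usage are assumptions; a real playbook would want health checks between steps):

    ---
    - hosts: mons[0]
      tasks:
        - name: disable backfill and scrubbing while OSDs are added
          command: ceph osd set {{ item }}
          with_items: [nobackfill, noscrub, nodeep-scrub]

    - hosts: new_osd_servers
      tasks:
        - name: prepare and activate each data disk with its journal
          command: ceph-disk prepare {{ item.data }} {{ item.journal }}
          with_items: "{{ osd_disks }}"

    - hosts: mons[0]
      tasks:
        - name: re-enable backfill and scrubbing
          command: ceph osd unset {{ item }}
          with_items: [nobackfill, noscrub, nodeep-scrub]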
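Decommissioning hardware: the sequence I described, roughly as ansible tasks (osd_ids and the group names are made up, and the drain step would really need to wait for the cluster to finish rebalancing rather than run straight through):

    ---
    - hosts: mons[0]
      tasks:
        - name: drain the OSDs by weighting them to zero
          command: ceph osd crush reweight osd.{{ item }} 0
          with_items: "{{ osd_ids }}"
        - name: mark the OSDs out
          command: ceph osd out {{ item }}
          with_items: "{{ osd_ids }}"

    - hosts: decom_server
      tasks:
        - name: stop the OSD daemons
          command: service ceph stop osd.{{ item }}
          with_items: "{{ osd_ids }}"
        - name: unmount the data partitions
          mount: name=/var/lib/ceph/osd/ceph-{{ item }} state=unmounted
          with_items: "{{ osd_ids }}"

    - hosts: mons[0]
      tasks:
        - name: remove the OSDs from CRUSH, auth, and the cluster
          shell: >
            ceph osd crush remove osd.{{ item }} &&
            ceph auth del osd.{{ item }} &&
            ceph osd rm {{ item }}
          with_items: "{{ osd_ids }}"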