Suggested best practice for Ceph node online/offline?

On Thu, Jul 10, 2014 at 9:04 AM, Joe Hewitt <joe.z.hewitt at gmail.com> wrote:
> Hi there,
> Recently I hit a problem triggered by rebooting Ceph nodes, which
> eventually forced me to rebuild the cluster from the ground up. The
> too-long-didn't-read question is: are there suggested best practices for
> taking a Ceph node online/offline?
>
> Following the official Ceph docs, I set up a 4-node Ceph (Firefly)
> cluster in our lab last week. It consists of 1 admin, 1 mon and 2 OSD
> nodes. All 4 nodes are physical servers running Ubuntu 14.04, no virtual
> machines involved. The MDS and a third OSD actually live on the monitor
> node. Everything looked okay at that point: 'ceph -w' reported HEALTH_OK
> and I could see the full storage capacity. I was happily writing Python
> code against the Ceph S3 API.
>
> This Monday I apt-get upgraded all 4 machines and then rebooted them.
> Once all 4 were back online, 'ceph -w' reported errors about the monitor
> node's IP address, plus messages like "{timestamp} 7fb6456f2500  0 --
> {monitor IP address}:0/2924 ...... pipe(0x5516270 sd=97 :0 s=1 pgs=0 cs=0
> l=1 c=0x5c54e20).fault"
> (sorry, I didn't save the exact error logs since I've rebuilt the
> cluster, my mistake :-( ).
>
> Due to the lab's policy, only DHCP is allowed, so I updated the monitor's
> IP address in /etc/ceph/ceph.conf and tried to push the config to all
> nodes, but that didn't work. Then I tried restarting the ceph service on
> those nodes; no luck. I even went for the ceph-deploy purgedata approach;
> no luck again. In the end I had to purge everything and rebuild from
> zero. Again, I'm sorry no error messages were saved; I was just too
> frustrated.
>
> Now I have a working cluster, but I don't think I can afford to redo it
> again. Hence the question above: how do I properly do maintenance work
> without breaking my Ceph cluster? Is there some procedure or set of
> commands I should run after rebooting? Thanks

The monitors (and only the monitors) require fixed IP addresses. If
you can't give them that, you're going to have a bad time. The best
suggestion off the top of my head is to maintain a cluster of them
and, every time you need to reboot one (see the command sketch after
the list):
1) stop it
2) remove it from the monitor map
3) reboot the machine
4) add it to the monitor map with the same name and new IP.
5) wait until the monitors report it's part of the quorum and nobody's syncing
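Roughly, in commands, that could look like the following. This is a
sketch only, assuming a Firefly-era Ubuntu setup with upstart, a monitor
named mon.b whose data directory survives the reboot, and placeholder
names/addresses:

  # 1) stop the monitor daemon (on the node being rebooted)
  sudo stop ceph-mon id=b

  # 2) remove it from the monitor map (from any node with an admin keyring)
  ceph mon remove b

  # 3) reboot the machine
  sudo reboot

  # 4) once it's back with its new DHCP address, re-add it under the
  #    same name with the new IP, then start the daemon on that address
  ceph mon add b {new-ip}:6789
  ceph-mon -i b --public-addr {new-ip}:6789

  # 5) confirm it's back in the quorum and nothing is syncing
  ceph quorum_status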

And just configure the clients to talk to the monitors by hostname,
rather than by specific IPs.
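In /etc/ceph/ceph.conf that's the mon host line; a minimal sketch with
placeholder hostnames that DNS must be able to resolve:

  [global]
      mon initial members = mon-a, mon-b, mon-c
      # hostnames rather than IPs, so clients survive DHCP renumbering
      mon host = mon-a.lab.example.com, mon-b.lab.example.com, mon-c.lab.example.com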

There are docs about how to do that config management, but nobody's
ever tried or tested any of this, so maybe I'm forgetting something.
And of course if you lose the IPs of a majority of them at the same
time, you'll have a bad time again and need to do manual repairs (a
rough sketch below). :/
Really, you just need to give them fixed IPs.
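The manual repair usually amounts to rewriting the monitor map offline
with monmaptool and injecting it back. A sketch only, with placeholder
monitor names and addresses, and with every ceph-mon daemon stopped
first:

  # on a surviving monitor (here mon.a), dump and inspect the map
  ceph-mon -i a --extract-monmap /tmp/monmap
  monmaptool --print /tmp/monmap

  # drop the stale entry and re-add it with its new address
  monmaptool --rm b /tmp/monmap
  monmaptool --add b {new-ip}:6789 /tmp/monmap

  # inject the fixed map into each monitor, then start them all again
  ceph-mon -i a --inject-monmap /tmp/monmap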
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

