Re: Upgrading 2K OSDs from Hammer to Jewel. Our experience

Christian Balzer <chibi@xxxxxxx> · Mon, 13 Mar 2017 09:36:40 +0900

Hello,

On Sun, 12 Mar 2017 19:54:10 +0100 Florian Haas wrote:

> On Sat, Mar 11, 2017 at 12:21 PM, <cephmailinglist@xxxxxxxxx> wrote:
> > The upgrade of our biggest cluster, nr 4, did not go without
> > problems. Since we were expecting a lot of "failed to encode map
> > e<version> with expected crc" messages, we disabled clog to monitors
> > with 'ceph tell osd.* injectargs -- --clog_to_monitors=false' so our
> > monitors would not choke on those messages. The upgrade of the
> > monitors went as expected, without any problem; the problems
> > started when we began upgrading the OSDs. In the upgrade
> > procedure, we had to change the ownership of the files from root to
> > the user ceph, and that process was taking so long on our cluster that
> > completing the upgrade would have taken more than a week. We decided to
> > keep the permissions as they were for now, so in the upstart init
> > script /etc/init/ceph-osd.conf, we changed '--setuser ceph --setgroup
> > ceph' to '--setuser root --setgroup root' and deferred the fix, OSD
> > by OSD, until after the upgrade was completely done.
> 
> For others following this thread who still have the hammer→jewel upgrade
> ahead: there is a ceph.conf option you can use here; no need to fiddle
> with the upstart scripts.
> 
> setuser match path = /var/lib/ceph/$type/$cluster-$id
>

Yes, I was thinking about mentioning this, too.
Alas in my experience with a wonky test cluster this failed with MDS,
maybe because of an odd name, maybe because nobody ever tested it.
MONs and OSDs were fine.
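
For anyone wanting to check up front what that option is going to decide
for a given daemon, looking at who owns its data directory should be
enough. A minimal sketch, assuming GNU stat and the default cluster name
"ceph"; the function name data_owner is made up for illustration and the
MDS name "a" in the usage comment is hypothetical:

```shell
#!/bin/sh
# Sketch only: print "user:group" for a daemon data directory, i.e. the
# ownership that "setuser match path" looks at when the daemon starts.
# Assumes GNU coreutils stat (the -c option).
data_owner() {
    stat -c '%U:%G' "$1"
}

# Usage (paths are examples, adjust type and id to your daemons):
#   data_owner /var/lib/ceph/osd/ceph-0
#   data_owner /var/lib/ceph/mds/ceph-a
```

If that prints root:root, the daemon should keep running as root until
the directory has been chowned.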

> What this will do is it will check which user owns files in the
> respective directories, and then start your Ceph daemons under the
> appropriate user and group IDs. In other words, if you enable this and
> you upgrade from Hammer to Jewel, and your files are still owned by
> root, your daemons will also continue to run as root:root (as they did in
> hammer). Then, you can stop your OSDs, run the recursive chown, and
> restart the OSDs one-by-one. When they come back up, they will just
> automatically switch to running as ceph:ceph.
> 
Though if you have external journals and didn't use ceph-deploy, you're
boned with the whole ceph:ceph approach.
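
For the simple case (journal inside the OSD directory), the
stop/chown/restart sequence described above could look roughly like the
sketch below. Assumptions that are mine and not from this thread: the
default cluster name "ceph", the upstart job name "ceph-osd", and the
helper names run and convert_osd, which are purely illustrative. It
defaults to a dry run that only prints the commands, and it does not
touch external journal devices at all.

```shell
#!/bin/sh
# Sketch only: switch one OSD from root:root to ceph:ceph after the
# Jewel upgrade, relying on "setuser match path" being set in ceph.conf.
# DRY_RUN=1 (the default) prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Echo the command in dry-run mode, execute it otherwise.
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

convert_osd() {
    id="$1"
    # Stop the OSD, fix ownership of its data directory, start it again.
    # Each step only runs if the previous one succeeded.
    run stop ceph-osd id="$id" &&
    run chown -R ceph:ceph "/var/lib/ceph/osd/ceph-$id" &&
    run start ceph-osd id="$id"
}

# Usage: review the dry-run output first, then rerun with DRY_RUN=0.
#   convert_osd 12
```

Doing this one OSD (or one host) at a time and waiting for the cluster
to settle in between keeps the impact of the long chown manageable.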

Christian
-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Rakuten Communications
http://www.gol.com/