On Sun, Mar 12, 2017 at 6:36 AM, Christian Theune <ct@xxxxxxxxxxxxxxx> wrote:
> Hi,
>
> thanks for that report! Glad to hear a mostly happy report. I’m still on the
> fence … ;)
>
> I have had reports that Qemu (librbd connections) will require
> updates/restarts before upgrading. What was your experience on that side?
> Did you upgrade the clients? Did you start using any of the new RBD
> features, like fast diff?

You don't need to restart qemu-kvm instances *before* upgrading, but you do
need to restart or migrate them *after* updating. The updated binaries are
only loaded into the qemu process address space at start-up, so to load the
newly installed binaries (libraries) you need to restart the instance or do a
migration to an upgraded host.

> What’s your experience with load/performance after the upgrade? Found any
> new issues that indicate shifted hotspots?
>
> Cheers and thanks again,
> Christian
>
> On Mar 11, 2017, at 12:21 PM, cephmailinglist@xxxxxxxxx wrote:
>
> Hello list,
>
> A week ago we upgraded our Ceph clusters from Hammer to Jewel, and with this
> email we want to share our experiences.
>
> We have four clusters:
>
> 1) Test cluster for all the fun things, completely virtual.
>
> 2) Test cluster for Openstack: 3 monitors and 9 OSDs, all bare metal.
>
> 3) Cluster where we store backups: 3 monitors and 153 OSDs, 554 TB storage.
>
> 4) Main cluster (used for our custom software stack and Openstack): 5
> monitors and 1917 OSDs, 8 PB storage.
>
> All the clusters are running on Ubuntu 14.04 LTS and we use the Ceph
> packages from ceph.com. On every cluster we upgraded the monitors first and
> after that the OSDs. Our backup cluster is the only cluster that also
> serves S3 via the RadosGW, and that service was upgraded at the same time
> as the OSDs in that cluster. The upgrade of clusters 1, 2 and 3 went
> without any problem, just an apt-get upgrade on every component.
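The point about qemu only picking up the new librbd at start-up can be checked from userspace. A hedged sketch (not from the original mail, host layout assumed): after installing the new packages, look for qemu processes that still map a deleted (i.e. replaced-by-upgrade) librbd; those guests still need a restart or a live migration to an upgraded host.

```shell
# Hypothetical check: which qemu processes still map the old (deleted) librbd?
for pid in $(pgrep -f qemu); do
    if grep -q 'librbd.*(deleted)' "/proc/$pid/maps" 2>/dev/null; then
        echo "PID $pid still uses the old librbd"
    fi
done
```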
We did see the
> message "failed to encode map e<version> with expected crc", but that
> message disappeared when all the OSDs were upgraded.
>
> The upgrade of our biggest cluster, nr 4, did not go without problems. Since
> we were expecting a lot of "failed to encode map e<version> with expected
> crc" messages, we disabled clog to monitors with 'ceph tell osd.* injectargs
> -- --clog_to_monitors=false' so our monitors would not choke on those
> messages. The upgrade of the monitors went as expected, without any
> problem; the problems started when we began the upgrade of the OSDs. In
> the upgrade procedure, we had to change the ownership of the files from root
> to the user ceph, and that process was taking so long on our cluster that
> completing the upgrade would take more than a week. We decided to keep the
> permissions as they were for now, so in the upstart init script
> /etc/init/ceph-osd.conf we changed '--setuser ceph --setgroup ceph' to
> '--setuser root --setgroup root', to fix that OSD by OSD after the upgrade
> was completely done.
>
> On cluster 3 (backup) we could change the permissions in a shorter time with
> the following procedure:
>
> a) apt-get -y install ceph-common
> b) mount|egrep 'on \/var.*ceph.*osd'|awk '{print $3}'|while read P; do
>    echo chown -R ceph:ceph $P \&;done > t ; bash t ; rm t
> c) (wait for all the chown's to complete)
> d) stop ceph-all
> e) find /var/lib/ceph/ ! -uid 64045 -print0|xargs -0 chown ceph:ceph
> f) start ceph-all
>
> This procedure did not work on our main cluster (4) because the load on the
> OSDs went to 100% in step b, and that resulted in blocked I/O on some
> virtual instances in the Openstack cluster. Also, at that time one of our
> pools got a lot of extra data; those files were stored with root permissions
> since we had not restarted the Ceph daemons yet, and the 'find' in step e
> found so many files that xargs (the shell) could not handle it (too many
> arguments).
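The ownership change in steps a)–f) could also be done one OSD at a time with the chown running at idle I/O priority, so it does not drive the disks to 100% load. A hedged sketch, not the poster's actual script; paths and service names assume Ubuntu 14.04 with upstart and one mount per OSD:

```shell
# Stop, chown and restart one OSD at a time, at idle I/O priority.
for osd in /var/lib/ceph/osd/ceph-*; do
    id="${osd##*-}"                       # e.g. /var/lib/ceph/osd/ceph-12 -> 12
    stop ceph-osd id="$id"
    ionice -c3 chown -R ceph:ceph "$osd"  # idle I/O class: yield to client I/O
    start ceph-osd id="$id"
done
```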
At
> that time we decided to keep the permissions on root during the upgrade
> phase.
>
> The next and biggest problem we encountered had to do with the CRC errors on
> the OSD map. On every map update, the OSDs that were not yet upgraded got
> that CRC error and asked the monitor for a full OSD map instead of just a
> delta update. At first we did not understand what exactly happened. We ran
> the upgrade per node using a script, and in that script we watch the state
> of the cluster; when the cluster is healthy again, we upgrade the next host.
> Every time we started the script (skipping the already upgraded hosts), the
> first host(s) upgraded without issues and then we got blocked I/O on the
> cluster. The blocked I/O went away within a minute or two (not measured).
> After investigation we found out that the blocked I/O happened when nodes
> were asking the monitor for a (full) OSD map, which briefly resulted in a
> fully saturated network link on our monitor.
>
> The next graph shows the statistics for one of our Ceph monitors. Our
> hosts are equipped with 10 Gbit/s NICs, and every time at the highest peaks
> the problems occurred. We could work around this problem by waiting four
> minutes between every host, and after that time (14:20) we did not have any
> issues any more. Of course the number of not-yet-upgraded OSDs decreased, so
> the number of full OSD map requests also got smaller over time.
>
> <mon0_network_hammer_to_jewel_upgrade.png>
>
> The day after the upgrade we had issues with live migrations of Openstack
> instances. We got this message: "OSError: /usr/lib/librbd.so.1: undefined
> symbol: _ZN8librados5Rados15aio_watch_flushEPNS_13AioCompletionE". This was
> resolved by restarting libvirt-bin and nova-compute on every compute node.
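The rolling upgrade with a pause between hosts can be sketched as below. This is a hedged reconstruction, not the poster's script; the host names and the ssh-based upgrade step are assumptions, and the four-minute pause is the workaround described above:

```shell
# Per-host rolling upgrade: wait for health, then pause so full-osdmap
# requests from not-yet-upgraded OSDs do not saturate the monitor link.
for host in osd-node-01 osd-node-02; do          # hypothetical host names
    ssh "$host" 'apt-get -y upgrade && restart ceph-osd-all'
    # wait until the cluster reports healthy before touching the next host
    until ceph health | grep -q HEALTH_OK; do
        sleep 10
    done
    sleep 240    # let full OSD map traffic on the monitor die down
done
```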
>
> Please note that the upgrade of our biggest cluster was not a 100%
> success, but the problems were relatively small, the cluster stayed
> online, and there were only a few virtual Openstack instances that did not
> like the blocked I/O and had to be restarted.
>
> --
>
> With regards,
>
> Richard Arends.
> Snow BV / http://snow.nl
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> --
> Christian Theune · ct@xxxxxxxxxxxxxxx · +49 345 219401 0
> Flying Circus Internet Operations GmbH · http://flyingcircus.io
> Forsterstraße 29 · 06112 Halle (Saale) · Deutschland
> HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian
> Zagrodnick

--
Cheers,
Brad