Hello,

On Sun, 12 Mar 2017 19:52:12 +1000 Brad Hubbard wrote:

> On Sun, Mar 12, 2017 at 6:36 AM, Christian Theune <ct@xxxxxxxxxxxxxxx> wrote:
> > Hi,
> >
> > thanks for that report! Glad to hear a mostly happy report. I’m still
> > on the fence … ;)
> >
> > I have had reports that Qemu (librbd connections) will require
> > updates/restarts before upgrading. What was your experience on that
> > side? Did you upgrade the clients? Did you start using any of the new
> > RBD features, like fast diff?
>
> You don't need to restart qemu-kvm instances *before* upgrading but
> you do need to restart or migrate them *after* upgrading. The updated
> binaries are only loaded into the qemu process address space at
> start-up, so to load the newly installed binaries (libraries) you need
> to restart or do a migration to an upgraded host.
>
Well, the OP wrote about live migration problems, but those were not in
the qemu part of things but libvirt/openstack related.

To wit, I did upgrade a test cluster from Hammer to Jewel and live
migration under ganeti worked fine.
I've also not seen any problems on other instances that have not been
restarted since, nor would I hope that an upgrade from one stable
version to the next would EVER require such a step (at least not
immediately).

Christian

> >
> > What’s your experience with load/performance after the upgrade? Found
> > any new issues that indicate shifted hotspots?
> >
> > Cheers and thanks again,
> > Christian
> >
> > On Mar 11, 2017, at 12:21 PM, cephmailinglist@xxxxxxxxx wrote:
> >
> > Hello list,
> >
> > A week ago we upgraded our Ceph clusters from Hammer to Jewel and with
> > this email we want to share our experiences.
> >
> > We have four clusters:
> >
> > 1) Test cluster for all the fun things, completely virtual.
> > 2) Test cluster for Openstack: 3 monitors and 9 OSDs, all baremetal.
> > 3) Cluster where we store backups: 3 monitors and 153 OSDs, 554 TB storage.
> > 4) Main cluster (used for our custom software stack and openstack):
> >    5 monitors and 1917 OSDs, 8 PB storage.
> >
> > All the clusters are running on Ubuntu 14.04 LTS and we use the Ceph
> > packages from ceph.com. On every cluster we upgraded the monitors first
> > and after that the OSDs. Our backup cluster is the only cluster that
> > also serves S3 via the RadosGW, and that service was upgraded at the
> > same time as the OSDs in that cluster. The upgrade of clusters 1, 2 and
> > 3 went without any problem, just an apt-get upgrade on every component.
> > We did see the message "failed to encode map e<version> with expected
> > crc", but that message disappeared when all the OSDs were upgraded.
> >
> > The upgrade of our biggest cluster, nr 4, did not go without problems.
> > Since we were expecting a lot of "failed to encode map e<version> with
> > expected crc" messages, we disabled clog to monitors with
> > 'ceph tell osd.* injectargs -- --clog_to_monitors=false' so our
> > monitors would not choke on those messages. The upgrade of the monitors
> > went as expected, without any problem; the problems started when we
> > began the upgrade of the OSDs. In the upgrade procedure we had to
> > change the ownership of the files from root to the user ceph, and that
> > process was taking so long on our cluster that completing the upgrade
> > would take more than a week.
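A small aside on the clog change above (my own sketch, not part of the
original report): if you inject that setting, it is worth verifying it
actually took effect and remembering to turn it back on once everything
is upgraded; injected values do not survive a daemon restart, so freshly
restarted OSDs revert to whatever is in ceph.conf. 'osd.0' is just an
example, and the 'ceph daemon' call has to run on the host where that
OSD lives (default admin socket setup assumed):

    # Confirm the runtime value on one OSD via its admin socket
    # (run this on the host that carries osd.0):
    ceph daemon osd.0 config show | grep clog_to_monitors
    # After all daemons are upgraded, re-enable logging to the monitors:
    ceph tell osd.* injectargs -- --clog_to_monitors=true
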
> > We decided to keep the permissions as they were for now: in the
> > upstart init script /etc/init/ceph-osd.conf we changed
> > '--setuser ceph --setgroup ceph' to '--setuser root --setgroup root',
> > planning to fix that OSD by OSD after the upgrade was completely done.
> >
> > On cluster 3 (backup) we could change the permissions in a shorter
> > time with the following procedure:
> >
> > a) apt-get -y install ceph-common
> > b) mount|egrep 'on \/var.*ceph.*osd'|awk '{print $3}'|while read P; do
> >    echo chown -R ceph:ceph $P \&;done > t ; bash t ; rm t
> > c) (wait for all the chowns to complete)
> > d) stop ceph-all
> > e) find /var/lib/ceph/ ! -uid 64045 -print0|xargs -0 chown ceph:ceph
> > f) start ceph-all
> >
> > This procedure did not work on our main cluster (4) because the load
> > on the OSDs went to 100% in step b, which resulted in blocked I/O on
> > some virtual instances in the Openstack cluster. Also, at that time
> > one of our pools received a lot of extra data; those files were stored
> > with root permissions since we had not restarted the Ceph daemons yet,
> > and the 'find' in step e found so many files that xargs (the shell)
> > could not handle them (too many arguments). At that point we decided
> > to keep root permissions during the upgrade phase.
> >
> > The next and biggest problem we encountered had to do with the CRC
> > errors on the OSD map. On every map update, the OSDs that were not yet
> > upgraded got that CRC error and asked the monitor for a full OSD map
> > instead of just a delta update. At first we did not understand what
> > exactly happened. We ran the upgrade per node using a script; in that
> > script we watch the state of the cluster and, when the cluster is
> > healthy again, we upgrade the next host. Every time we started the
> > script (skipping the already upgraded hosts), the first host(s)
> > upgraded without issues and then we got blocked I/O on the cluster.
> > The blocked I/O went away within a minute or two (not measured). After
> > investigation we found out that the blocked I/O happened when nodes
> > were asking the monitor for a (full) OSD map, which briefly saturated
> > the network link on our monitor.
> >
> > The next graph shows the statistics for one of our Ceph monitors. Our
> > hosts are equipped with 10 Gbit/s NICs, and every time at the highest
> > peaks the problems occurred. We could work around this problem by
> > waiting four minutes between hosts, and after that time (14:20) we did
> > not have any issues any more. Of course the number of not yet upgraded
> > OSDs decreased, so the number of full OSD map requests also got
> > smaller over time.
> >
> > <mon0_network_hammer_to_jewel_upgrade.png>
> >
> > The day after the upgrade we had issues with live migrations of
> > Openstack instances. We got this message: "OSError:
> > /usr/lib/librbd.so.1: undefined symbol:
> > _ZN8librados5Rados15aio_watch_flushEPNS_13AioCompletionE". This was
> > resolved by restarting libvirt-bin and nova-compute on every compute
> > node.
> >
> > Please note that the upgrade of our biggest cluster was not a 100%
> > success, but the problems were relatively small, the cluster stayed
> > on-line, and there were only a few virtual Openstack instances that
> > did not like the blocked I/O and had to be restarted.
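Since others will probably hit the same full-osdmap storm: here is a
rough sketch (mine, not the OP's actual script) of the throttled
per-host loop described above. Upgrade one host, wait for HEALTH_OK,
then pause a few minutes so the full OSD map requests from the
not-yet-upgraded OSDs can drain before the next batch of OSD restarts.
The host list file, the package command and the upstart job name are
placeholders for whatever your environment uses:

    #!/bin/bash
    # Throttled per-host Hammer -> Jewel OSD upgrade (sketch).
    for HOST in $(cat upgrade-hosts.txt); do   # placeholder host list
        # Placeholder upgrade/restart commands for one OSD host:
        ssh "$HOST" 'apt-get -y install ceph && restart ceph-osd-all'
        # Wait until recovery from the OSD restarts has finished ...
        until ceph health | grep -q HEALTH_OK; do
            sleep 10
        done
        # ... then give the monitor link time to drain the full osdmap
        # requests (the ~4 minute gap that worked for the OP).
        sleep 240
    done
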
> > --
> >
> > With regards,
> >
> > Richard Arends.
> > Snow BV / http://snow.nl
> >
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@xxxxxxxxxxxxxx
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >
> > --
> > Christian Theune · ct@xxxxxxxxxxxxxxx · +49 345 219401 0
> > Flying Circus Internet Operations GmbH · http://flyingcircus.io
> > Forsterstraße 29 · 06112 Halle (Saale) · Deutschland
> > HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick
> >
> >
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@xxxxxxxxxxxxxx
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >


--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
http://www.gol.com/

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com