> On 15 July 2016 at 10:48, Mart van Santen <mart@xxxxxxxxxxxx> wrote:
>
>
> Hi Wido,
>
> Thank you, we are currently in the same process, so this information is
> very useful. Can you share why you upgraded from Hammer directly to
> Jewel? Is there a reason to skip Infernalis? I wonder why you didn't
> do a hammer->infernalis->jewel upgrade, as that seems the logical path
> to me.
>

LTS to LTS upgrades, that's why. I tested it on a small scale on a few
VMs and afterwards did the production cluster.

We needed to go to Jewel for some fixes for large clusters and for RGW
features (AWS4) and fixes.

Wido

> (We did indeed see the same "Failed to encode map eXXX with
> expected crc" errors when upgrading to the latest Hammer.)
>
> Regards,
>
> Mart
>
>
> On 07/15/2016 03:08 AM, 席智勇 wrote:
> > Good job, thank you for sharing, Wido~
> > It's very useful~
> >
> > 2016-07-14 14:33 GMT+08:00 Wido den Hollander <wido@xxxxxxxx>:
> >
> > To add, the RGWs upgraded just fine as well.
> >
> > No regions in use here (yet!), so that upgraded as it should.
> >
> > Wido
> >
> > > On 13 July 2016 at 16:56, Wido den Hollander <wido@xxxxxxxx> wrote:
> > >
> > >
> > > Hello,
> > >
> > > For the last 3 days I worked at a customer with a 1800 OSD cluster
> > > which had to be upgraded from Hammer 0.94.5 to Jewel 10.2.2.
> > >
> > > The cluster in this case is 99% RGW, but also serves some RBD.
> > >
> > > I wanted to share some of the things we encountered during this
> > > upgrade.
> > >
> > > All 180 nodes are running CentOS 7.1 on an IPv6-only network.
> > >
> > > ** Hammer Upgrade **
> > > At first we upgraded from 0.94.5 to 0.94.7. This went well,
> > > except that the monitors got spammed with these kinds of messages:
> > >
> > > "Failed to encode map eXXX with expected crc"
> > >
> > > Some searching on the list brought me to:
> > >
> > > ceph tell osd.* injectargs -- --clog_to_monitors=false
> > >
> > > This reduced the load on the 5 monitors and let recovery
> > > finish smoothly.
> > >
> > > ** Monitors to Jewel **
> > > The next step was to upgrade the monitors from Hammer to Jewel.
> > >
> > > Using Salt we upgraded the packages, and afterwards it was simple:
> > >
> > > killall ceph-mon
> > > chown -R ceph:ceph /var/lib/ceph
> > > chown -R ceph:ceph /var/log/ceph
> > >
> > > Now, a systemd quirk: 'systemctl start ceph.target' does not
> > > work. I had to manually enable the monitor and start it:
> > >
> > > systemctl enable ceph-mon@srv-zmb04-05.service
> > > systemctl start ceph-mon@srv-zmb04-05.service
> > >
> > > Afterwards the monitors were running just fine.
> > >
> > > ** OSDs to Jewel **
> > > To upgrade the OSDs to Jewel we first used Salt to update the
> > > packages on all systems to 10.2.2; we then used a shell script
> > > which we ran on one node at a time.
> > >
> > > The failure domain here is 'rack', so we executed this in one
> > > rack, then the next one, and so on.
> > >
> > > The script can be found on GitHub:
> > > https://gist.github.com/wido/06eac901bd42f01ca2f4f1a1d76c49a6
> > >
> > > Be aware that the chown can take a long, long, very long time!
> > >
> > > We ran into an issue where some OSDs crashed after starting, but
> > > when trying again they would start:
> > >
> > > "void FileStore::init_temp_collections()"
> > >
> > > I reported this in the tracker as I'm not sure what is happening
> > > here: http://tracker.ceph.com/issues/16672
> > >
> > > ** New OSDs with Jewel **
> > > We also had some new nodes which we wanted to add to the Jewel
> > > cluster.
> > >
> > > Using Salt and ceph-disk we ran into a partprobe issue in
> > > combination with ceph-disk. There was already a pull request with
> > > the fix, but it was not included in Jewel 10.2.2.
> > >
> > > We manually applied the PR and it fixed our issues:
> > > https://github.com/ceph/ceph/pull/9330
> > >
> > > Hope this helps other people with their upgrades to Jewel!
> > >
> > > Wido
> > > _______________________________________________
> > > ceph-users mailing list
> > > ceph-users@xxxxxxxxxxxxxx
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> --
> Mart van Santen
> Greenhost
> E: mart@xxxxxxxxxxxx
> T: +31 20 4890444
> W: https://greenhost.nl
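For readers following along, the per-node rolling OSD upgrade described in the quoted post (stop OSDs, chown to the new 'ceph' user, restart, then move to the next rack) could be sketched roughly as below. This is an illustration only, not the actual gist: the function name, the DRY_RUN guard, and the use of the ceph-osd.target unit are my assumptions.

```shell
#!/bin/bash
# Sketch of a per-node rolling OSD upgrade from Hammer to Jewel.
# Illustrative only; see the gist linked above for the real script.
set -eu

# Default to dry-run so the sketch is safe to execute as-is.
: "${DRY_RUN:=1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

upgrade_node_osds() {
    # Stop the Hammer OSD daemons on this node.
    run killall ceph-osd
    # Jewel daemons run as the 'ceph' user, so ownership must change.
    # On large OSDs this chown can take a very, very long time!
    run chown -R ceph:ceph /var/lib/ceph
    run chown -R ceph:ceph /var/log/ceph
    # Start the OSDs again (assuming the ceph-osd.target systemd unit).
    run systemctl start ceph-osd.target
    # Before moving on to the next failure domain (rack), wait for the
    # cluster to return to HEALTH_OK, e.g. by polling 'ceph health'.
}

upgrade_node_osds
```

Run with DRY_RUN=0 to actually execute the commands; the dry-run default just prints what would happen on the node.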