Will,

This issue in the tracker has an explanation of what is going on:

http://tracker.ceph.com/issues/17386

So the encoding change caused the old OSDs to start requesting full OSDMap
updates instead of incremental ones.  I would still like to know the purpose
of changing the encoding so late in the stable release series...

Bryan

On 9/22/16, 7:32 AM, "Will.Boege" <Will.Boege@xxxxxxxxxx> wrote:

>Just went through this upgrading a ~400 OSD cluster. I was in the EXACT
>spot you were in. The faster you can get all OSDs to the same version as
>the MONs, the better. We decided to power forward and the performance got
>better for every OSD node we patched.
>
>Additionally, I discovered that your LevelDBs will start growing
>exponentially if you leave your cluster in that state for too long.
>
>Pretty sure the downrev OSDs are aggressively getting osdmaps from the
>MONs, causing some kind of spinlock condition.
>
>> On Sep 21, 2016, at 4:21 PM, Stillwell, Bryan J
>> <Bryan.Stillwell@xxxxxxxxxxx> wrote:
>>
>> While attempting to upgrade a 1200+ OSD cluster from 0.94.6 to 0.94.9
>> I've run into serious performance issues every time I restart an OSD.
>>
>> At first I thought the problem I was running into was caused by the
>> osdmap encoding bug that Dan and Wido ran into when upgrading to 0.94.7,
>> because I was seeing a ton (millions) of these messages in the logs:
>>
>> 2016-09-21 20:48:32.831040 osd.504 24.161.248.128:6810/96488 24 : cluster
>> [WRN] failed to encode map e727985 with expected crc
>>
>> Here are the links to their descriptions of the problem:
>>
>> http://www.spinics.net/lists/ceph-devel/msg30450.html
>> https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg30783.html
>>
>> I tried the suggested workaround of using the following command to stop
>> those errors from occurring:
>>
>> ceph tell osd.* injectargs '--clog_to_monitors false'
>>
>> That did get the messages to stop spamming the log files; however, it
>> didn't fix the performance issue for me.
>>
>> Using dstat on the mon nodes I was able to determine that every time the
>> osdmap is updated (by running 'ceph osd pool set data size 2' in this
>> example) it causes the outgoing network on all mon nodes to be saturated
>> for multiple seconds at a time:
>>
>> ----system---- ----total-cpu-usage---- ------memory-usage----- -net/total- -dsk/total- --io/total-
>>      time     |usr sys idl wai hiq siq| used  buff  cach  free| recv  send| read  writ| read  writ
>> 21-09 21:06:53|  1   0  99   0   0   0|11.8G  273M 18.7G  221G|2326k 9015k|   0  1348k|   0  16.0
>> 21-09 21:06:54|  1   1  98   0   0   0|11.9G  273M 18.7G  221G|  15M   10M|   0  1312k|   0  16.0
>> 21-09 21:06:55|  2   2  94   0   0   1|12.3G  273M 18.7G  220G|  14M  311M|   0    48M|   0   309
>> 21-09 21:06:56|  2   3  93   0   0   3|12.2G  273M 18.7G  220G|7745k 1190M|   0    16M|   0  93.0
>> 21-09 21:06:57|  1   2  96   0   0   1|12.0G  273M 18.7G  220G|8269k 1189M|   0  1956k|   0  10.0
>> 21-09 21:06:58|  3   1  95   0   0   1|11.8G  273M 18.7G  221G|4854k  752M|   0  4960k|   0  21.0
>> 21-09 21:06:59|  3   0  97   0   0   0|11.8G  273M 18.7G  221G|3098k   25M|   0  5036k|   0  26.0
>> 21-09 21:07:00|  1   0  98   0   0   0|11.8G  273M 18.7G  221G|2247k   25M|   0  9980k|   0  45.0
>> 21-09 21:07:01|  2   1  97   0   0   0|11.8G  273M 18.7G  221G|4149k   17M|   0    76M|   0   427
>>
>> That peak would be 1190 MiB/s (or 9.982 Gbps).
>>
>> Restarting every OSD on a node at once as part of the upgrade causes a
>> couple minutes' worth of network saturation on all three mon nodes. This
>> causes thousands of slow requests and many unhappy OpenStack users.
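>>
>> (In case anyone wants to reproduce this, something like the following
>> should work.  The dstat flags are my best guess at what produces the
>> columns above, and the grep counts assume 'ceph tell osd.* version'
>> prints one line per OSD:)
>>
>>   # Per-second view of CPU, memory, network and disk on a mon node;
>>   # the 'send' column is what spikes to ~1190 MiB/s during map updates.
>>   dstat -tcmndr 1
>>
>>   # Rough count of how many OSDs are still on the old release vs the new one.
>>   ceph tell osd.* version | grep -c 0.94.6
>>   ceph tell osd.* version | grep -c 0.94.9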
>>
>> I'm now stuck about 15% into the upgrade and haven't been able to
>> determine how to move forward (or even backward) without causing another
>> outage.
>>
>> I've attempted to run the same test on another cluster with 1300+ OSDs
>> and the outgoing network on the mon nodes didn't exceed 15 MiB/s
>> (0.126 Gbps).
>>
>> Any suggestions on how I can proceed?
>>
>> Thanks,
>> Bryan

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com