While attempting to upgrade a 1200+ OSD cluster from 0.94.6 to 0.94.9 I've run into serious performance issues every time I restart an OSD. At first I thought the problem I was running into was caused by the osdmap encoding bug that Dan and Wido ran into when upgrading to 0.94.7, because I was seeing a ton (millions) of these messages in the logs:

2016-09-21 20:48:32.831040 osd.504 24.161.248.128:6810/96488 24 : cluster [WRN] failed to encode map e727985 with expected crc

Here are the links to their descriptions of the problem:

http://www.spinics.net/lists/ceph-devel/msg30450.html
https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg30783.html

I tried the suggested workaround of running the following command to stop those errors from occurring:

ceph tell osd.* injectargs '--clog_to_monitors false'

That did get the messages to stop spamming the log files; however, it didn't fix the performance issue for me.

Using dstat on the mon nodes I was able to determine that every time the osdmap is updated (by running 'ceph osd pool set data size 2' in this example) the outgoing network on all mon nodes is saturated for multiple seconds at a time:

----system---- ----total-cpu-usage---- ------memory-usage----- -net/total- -dsk/total- --io/total-
     time     |usr sys idl wai hiq siq| used  buff  cach  free| recv  send| read  writ| read  writ
21-09 21:06:53|  1   0  99   0   0   0|11.8G  273M 18.7G  221G|2326k 9015k|   0  1348k|   0  16.0
21-09 21:06:54|  1   1  98   0   0   0|11.9G  273M 18.7G  221G|  15M   10M|   0  1312k|   0  16.0
21-09 21:06:55|  2   2  94   0   0   1|12.3G  273M 18.7G  220G|  14M  311M|   0    48M|   0   309
21-09 21:06:56|  2   3  93   0   0   3|12.2G  273M 18.7G  220G|7745k 1190M|   0    16M|   0  93.0
21-09 21:06:57|  1   2  96   0   0   1|12.0G  273M 18.7G  220G|8269k 1189M|   0  1956k|   0  10.0
21-09 21:06:58|  3   1  95   0   0   1|11.8G  273M 18.7G  221G|4854k  752M|   0  4960k|   0  21.0
21-09 21:06:59|  3   0  97   0   0   0|11.8G  273M 18.7G  221G|3098k   25M|   0  5036k|   0  26.0
21-09 21:07:00|  1   0  98   0   0   0|11.8G  273M 18.7G  221G|2247k   25M|   0  9980k|   0  45.0
21-09 21:07:01|  2   1  97   0   0   0|11.8G  273M 18.7G  221G|4149k   17M|   0    76M|   0   427

That would be 1190 MiB/s (or 9.982 Gbps). Restarting every OSD on a node at once as part of the upgrade causes a couple of minutes' worth of network saturation on all three mon nodes. This causes thousands of slow requests and many unhappy OpenStack users.

I'm now stuck about 15% into the upgrade and haven't been able to determine how to move forward (or even backward) without causing another outage. I've attempted to run the same test on another cluster with 1300+ OSDs, and the outgoing network on the mon nodes there didn't exceed 15 MiB/s (0.126 Gbps).

Any suggestions on how I can proceed?

Thanks,
Bryan
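
For reference, the dstat capture above was taken with something close to the following; the exact flags and 1-second interval are an approximation of what I ran on the mon nodes, not a verbatim copy:

    # On a mon node: time, cpu, memory, network, disk, and io columns, 1s interval
    dstat -tcmnd --io 1

    # In another shell, trigger an osdmap update the same way as in the test above
    ceph osd pool set data size 2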
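
To put the 1190 MiB/s burst in context, a rough sanity check is to compare the size of a full osdmap against the number of OSDs that might request full maps after a restart. This is only a back-of-the-envelope sketch; the output path is arbitrary:

    # Grab the current full osdmap from the mons and check its size
    ceph osd getmap -o /tmp/osdmap.current
    ls -lh /tmp/osdmap.current

    # Number of OSDs that could be asking the mons for maps at the same time
    ceph osd stat

    # Rough upper bound on the mon-side burst:
    #   (full osdmap size) x (number of OSDs pulling full maps)
    # e.g. a 1 MiB map pulled by 1200 OSDs is on the order of 1.2 GiB of outgoing traffic.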
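
Also for completeness: since OSDs come back up with their default settings after a restart, I've been pairing the injectargs workaround with a ceph.conf entry so the setting survives restarts. The section placement and spelling below are my assumption for a Hammer-era config, so double-check before relying on it:

    # Runtime change on all running OSDs (as above)
    ceph tell osd.* injectargs '--clog_to_monitors false'

    # Assumed persistent equivalent in /etc/ceph/ceph.conf:
    # [osd]
    #     clog to monitors = false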