While attempting to upgrade a 1200+ OSD cluster from 0.94.6 to 0.94.9 I've run into serious performance issues every time I restart an OSD. At first I thought the problem I was running into was caused by the osdmap encoding bug that Dan and Wido ran into when upgrading to 0.94.7, because I was seeing a ton (millions) of these messages in the logs:

2016-09-21 20:48:32.831040 osd.504 24.161.248.128:6810/96488 24 : cluster [WRN] failed to encode map e727985 with expected crc

Here are the links to their descriptions of the problem:

http://www.spinics.net/lists/ceph-devel/msg30450.html
https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg30783.html

I tried the suggested workaround of running the following command to stop those errors from occurring:

ceph tell osd.* injectargs '--clog_to_monitors false'

That did get the messages to stop spamming the log files; however, it didn't fix the performance issue for me.

Using dstat on the mon nodes I was able to determine that every time the osdmap is updated (by running 'ceph osd pool set data size 2' in this example) the outgoing network on all mon nodes is saturated for multiple seconds at a time:

----system---- ----total-cpu-usage---- ------memory-usage----- -net/total- -dsk/total- --io/total-
     time     |usr sys idl wai hiq siq| used  buff  cach  free| recv  send| read  writ| read  writ
21-09 21:06:53|  1   0  99   0   0   0|11.8G  273M 18.7G  221G|2326k 9015k|   0  1348k|   0  16.0
21-09 21:06:54|  1   1  98   0   0   0|11.9G  273M 18.7G  221G|  15M   10M|   0  1312k|   0  16.0
21-09 21:06:55|  2   2  94   0   0   1|12.3G  273M 18.7G  220G|  14M  311M|   0    48M|   0   309
21-09 21:06:56|  2   3  93   0   0   3|12.2G  273M 18.7G  220G|7745k 1190M|   0    16M|   0  93.0
21-09 21:06:57|  1   2  96   0   0   1|12.0G  273M 18.7G  220G|8269k 1189M|   0  1956k|   0  10.0
21-09 21:06:58|  3   1  95   0   0   1|11.8G  273M 18.7G  221G|4854k  752M|   0  4960k|   0  21.0
21-09 21:06:59|  3   0  97   0   0   0|11.8G  273M 18.7G  221G|3098k   25M|   0  5036k|   0  26.0
21-09 21:07:00|  1   0  98   0   0   0|11.8G  273M 18.7G  221G|2247k   25M|   0  9980k|   0  45.0
21-09 21:07:01|  2   1  97   0   0   0|11.8G  273M 18.7G  221G|4149k   17M|   0    76M|   0   427

That would be 1190 MiB/s (or 9.982 Gbps). Restarting every OSD on a node at once as part of the upgrade causes a couple of minutes' worth of network saturation on all three mon nodes. This causes thousands of slow requests and many unhappy OpenStack users.

I'm now stuck about 15% into the upgrade and haven't been able to determine how to move forward (or even backward) without causing another outage. I've attempted to run the same test on another cluster with 1300+ OSDs, and the outgoing network on the mon nodes there didn't exceed 15 MiB/s (0.126 Gbps).

Any suggestions on how I can proceed?

Thanks,
Bryan
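
For reference, the dstat capture above was taken with something close to the following; the exact flags and 1-second interval are an approximation of what I ran on the mon nodes, not a verbatim copy:

    # On a mon node: time, cpu, memory, network, disk, and io columns, 1s interval
    dstat -tcmnd --io 1

    # In another shell, trigger an osdmap update the same way as in the test above
    ceph osd pool set data size 2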
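
To put the 1190 MiB/s burst in context, a rough sanity check is to compare the size of a full osdmap against the number of OSDs that might request full maps after a restart. This is only a back-of-the-envelope sketch; the output path is arbitrary:

    # Grab the current full osdmap from the mons and check its size
    ceph osd getmap -o /tmp/osdmap.current
    ls -lh /tmp/osdmap.current

    # Number of OSDs that could be asking the mons for maps at the same time
    ceph osd stat

    # Rough upper bound on the mon-side burst:
    #   (full osdmap size) x (number of OSDs pulling full maps)
    # e.g. a 1 MiB map pulled by 1200 OSDs is on the order of 1.2 GiB of outgoing traffic.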
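
Also for completeness: since OSDs come back up with their default settings after a restart, I've been pairing the injectargs workaround with a ceph.conf entry so the setting survives restarts. The section placement and spelling below are my assumption for a Hammer-era config, so double-check before relying on it:

    # Runtime change on all running OSDs (as above)
    ceph tell osd.* injectargs '--clog_to_monitors false'

    # Assumed persistent equivalent in /etc/ceph/ceph.conf:
    # [osd]
    #     clog to monitors = false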