Will,

This issue in the tracker has an explanation of what is going on:

http://tracker.ceph.com/issues/17386

So the encoding change caused the old OSDs to start requesting full OSDMap
updates instead of incremental ones.  I would still like to know the purpose
of changing the encoding so late in the stable release series...

Bryan

On 9/22/16, 7:32 AM, "Will.Boege" <Will.Boege@xxxxxxxxxx> wrote:

>Just went through this upgrading a ~400 OSD cluster. I was in the EXACT
>spot you were in. The faster you can get all OSDs to the same version as
>the MONs, the better. We decided to power forward and the performance got
>better for every OSD node we patched.
>
>Additionally, I discovered that your LevelDBs will start growing
>exponentially if you leave your cluster in that state for too long.
>
>Pretty sure the downrev OSDs are aggressively getting osdmaps from the
>MONs, causing some kind of spinlock condition.
>
>> On Sep 21, 2016, at 4:21 PM, Stillwell, Bryan J
>> <Bryan.Stillwell@xxxxxxxxxxx> wrote:
>>
>> While attempting to upgrade a 1200+ OSD cluster from 0.94.6 to 0.94.9
>> I've run into serious performance issues every time I restart an OSD.
>>
>> At first I thought the problem I was running into was caused by the
>> osdmap encoding bug that Dan and Wido ran into when upgrading to 0.94.7,
>> because I was seeing a ton (millions) of these messages in the logs:
>>
>> 2016-09-21 20:48:32.831040 osd.504 24.161.248.128:6810/96488 24 : cluster
>> [WRN] failed to encode map e727985 with expected crc
>>
>> Here are the links to their descriptions of the problem:
>>
>> http://www.spinics.net/lists/ceph-devel/msg30450.html
>> https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg30783.html
>>
>> I tried the suggested workaround of using the following command to stop
>> those errors from occurring:
>>
>> ceph tell osd.* injectargs '--clog_to_monitors false'
>>
>> That did get the messages to stop spamming the log files; however, it
>> didn't fix the performance issue for me.
>>
>> Using dstat on the mon nodes I was able to determine that every time the
>> osdmap is updated (by running 'ceph osd pool set data size 2' in this
>> example) it causes the outgoing network on all mon nodes to be saturated
>> for multiple seconds at a time:
>>
>> ----system---- ----total-cpu-usage---- ------memory-usage----- -net/total- -dsk/total- --io/total-
>>      time     |usr sys idl wai hiq siq| used  buff  cach  free| recv  send| read  writ| read  writ
>> 21-09 21:06:53|  1   0  99   0   0   0|11.8G  273M 18.7G  221G|2326k 9015k|   0  1348k|   0  16.0
>> 21-09 21:06:54|  1   1  98   0   0   0|11.9G  273M 18.7G  221G|  15M   10M|   0  1312k|   0  16.0
>> 21-09 21:06:55|  2   2  94   0   0   1|12.3G  273M 18.7G  220G|  14M  311M|   0    48M|   0   309
>> 21-09 21:06:56|  2   3  93   0   0   3|12.2G  273M 18.7G  220G|7745k 1190M|   0    16M|   0  93.0
>> 21-09 21:06:57|  1   2  96   0   0   1|12.0G  273M 18.7G  220G|8269k 1189M|   0  1956k|   0  10.0
>> 21-09 21:06:58|  3   1  95   0   0   1|11.8G  273M 18.7G  221G|4854k  752M|   0  4960k|   0  21.0
>> 21-09 21:06:59|  3   0  97   0   0   0|11.8G  273M 18.7G  221G|3098k   25M|   0  5036k|   0  26.0
>> 21-09 21:07:00|  1   0  98   0   0   0|11.8G  273M 18.7G  221G|2247k   25M|   0  9980k|   0  45.0
>> 21-09 21:07:01|  2   1  97   0   0   0|11.8G  273M 18.7G  221G|4149k   17M|   0    76M|   0   427
>>
>> That peak would be 1190 MiB/s (or 9.982 Gbps).
>>
>> Restarting every OSD on a node at once as part of the upgrade causes a
>> couple minutes' worth of network saturation on all three mon nodes. This
>> causes thousands of slow requests and many unhappy OpenStack users.
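>>
>> (In case anyone wants to reproduce this, something like the following
>> should work.  The dstat flags are my best guess at what produces the
>> columns above, and the grep counts assume 'ceph tell osd.* version'
>> prints one line per OSD:)
>>
>>   # Per-second view of CPU, memory, network and disk on a mon node;
>>   # the 'send' column is what spikes to ~1190 MiB/s during map updates.
>>   dstat -tcmndr 1
>>
>>   # Rough count of how many OSDs are still on the old release vs the new one.
>>   ceph tell osd.* version | grep -c 0.94.6
>>   ceph tell osd.* version | grep -c 0.94.9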
>>
>> I'm now stuck about 15% into the upgrade and haven't been able to
>> determine how to move forward (or even backward) without causing another
>> outage.
>>
>> I've attempted to run the same test on another cluster with 1300+ OSDs
>> and the outgoing network on the mon nodes didn't exceed 15 MiB/s
>> (0.126 Gbps).
>>
>> Any suggestions on how I can proceed?
>>
>> Thanks,
>> Bryan

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com