Re: kraken-bluestore 11.2.0 memory leak issue

I think it could be because of this:
http://tracker.ceph.com/issues/19407

The clients were meant to stop trying to send reports to the mgr when
it goes offline, but the monitor may not have been correctly updating
the mgr map to inform clients that the active mgr had gone offline.
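
A quick way to check whether you are in that state while ceph-mgr is stopped
(ceph -s appears in your output below; "ceph mgr dump", which prints the
current MgrMap, is assumed to exist in your kraken build):

  ceph -s | grep mgr
  ceph mgr dump

If the monitors still report the stopped daemon as the active mgr, the OSDs
will keep queueing reports for it, which would match the memory growth you
describe.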

John

On Wed, Mar 29, 2017 at 8:12 AM, nokia ceph <nokiacephusers@xxxxxxxxx> wrote:
> Hello,
>
> We fixed the issue manually; our analysis is below.
>
> Due to high CPU utilisation we stopped ceph-mgr on all our clusters.
> On one of the clusters we then saw high memory usage by the OSDs, some greater
> than 5 GB, causing OOM and resulting in processes being killed.
>
> The memory was released immediately when ceph-mgr was started, so this
> issue is clearly a side effect of stopping the ceph-mgr process. What we
> don't understand is why all OSDs report an issue against a single OSD and
> hold on to so much memory until ceph-mgr is started.
>
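> (For reference, the per-OSD growth over time can be watched with a plain
> procps loop, nothing ceph-specific:
>
> while true; do ps -C ceph-osd -o pid,rss --sort=-rss | head -6; sleep 60; done
> )
>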
> 1. Ceph status highlighting an issue ("wrong node") with one of the OSDs on
> the 5th node:
>
> cn1.vn1ldv1c1.cdn ~# ceph status
> 2017-03-28 05:54:52.210450 7f8108a84700 1 WARNING: the following dangerous
> and experimental features are enabled: bluestore,rocksdb
> 2017-03-28 05:54:52.231551 7f8108a84700 -1 WARNING: the following dangerous
> and experimental features are enabled: bluestore,rocksdb
> 2017-03-28 05:54:52.400565 7f8101ac6700 0 - 10.139.4.81:0/2856869581 >>
> 10.139.4.85:6800/44958 conn(0x7f80e8002bc0 :-1
> s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0
> l=1)._process_connection connect claims to be 10.139.4.85:6800/273761 not
> 10.139.4.85:6800/44958 - wrong node!
> cluster 71a32568-96f3-4998-89dd-e2e7d77a6824
> health HEALTH_OK
> monmap e3: 5 mons at
> {cn1=10.139.4.81:6789/0,cn2=10.139.4.82:6789/0,cn3=10.139.4.83:6789/0,cn4=10.139.4.84:6789/0,cn5=10.139.4.85:6789/0}
> election epoch 24, quorum 0,1,2,3,4 cn1,cn2,cn3,cn4,cn5
> mgr active: cn5
> osdmap e2010: 335 osds: 335 up, 335 in
> flags sortbitwise,require_jewel_osds,require_kraken_osds
> pgmap v561117: 8192 pgs, 1 pools, 28323 GB data, 12400 kobjects
> 37667 GB used, 1182 TB / 1218 TB avail
> 8192 active+clean
> client io 31732 kB/s rd, 57763 kB/s wr, 59 op/s rd, 479 op/s wr
>
> 2. numastat for ceph shows a total of about 275 GB consumed, with most of
> the OSDs using more than 5 GB each.
>
> cn1.vn1ldv1c1.cdn /var/log/cassandra# numastat -s ceph
>
> Per-node process memory usage (in MBs)
> PID Node 0 Node 1 Total
> ----------------- --------------- --------------- ---------------
> 372602 (ceph-osd) 5418.34 2.84 5421.18
> 491602 (ceph-osd) 5351.95 2.83 5354.78
> 417717 (ceph-osd) 5175.98 2.83 5178.81
> 273980 (ceph-osd) 5167.83 2.82 5170.65
> 311956 (ceph-osd) 5167.04 2.84 5169.88
> 440537 (ceph-osd) 5161.57 2.84 5164.41
> 368422 (ceph-osd) 5157.87 2.83 5160.70
> 292227 (ceph-osd) 5156.42 2.83 5159.25
> 360749 (ceph-osd) 5129.43 2.83 5132.26
> 516040 (ceph-osd) 5112.53 2.84 5115.37
> 526274 (ceph-osd) 5108.76 2.83 5111.59
> 300197 (ceph-osd) 5096.64 2.82 5099.46
> 487087 (ceph-osd) 5081.70 2.82 5084.52
> 396562 (ceph-osd) 5060.55 2.84 5063.38
> 409201 (ceph-osd) 5058.06 2.83 5060.89
> 284767 (ceph-osd) 5027.94 2.82 5030.76
> 520653 (ceph-osd) 4997.16 2.82 4999.98
> 302873 (ceph-osd) 4988.78 2.82 4991.60
> 364601 (ceph-osd) 4884.61 2.83 4887.43
> 426998 (ceph-osd) 4865.89 2.82 4868.72
> 294856 (ceph-osd) 4851.96 2.82 4854.78
> 306064 (ceph-osd) 4780.84 2.85 4783.68
> 449676 (ceph-osd) 4764.82 2.84 4767.66
> 376469 (ceph-osd) 4750.47 2.82 4753.29
> 482502 (ceph-osd) 4729.85 2.84 4732.69
> 357126 (ceph-osd) 4706.88 2.82 4709.71
> 346001 (ceph-osd) 4693.43 2.84 4696.27
> 511640 (ceph-osd) 4668.58 2.82 4671.41
> 282682 (ceph-osd) 4614.66 2.84 4617.50
> 287330 (ceph-osd) 4613.75 2.82 4616.57
> 506197 (ceph-osd) 4604.59 2.84 4607.43
> 332253 (ceph-osd) 4587.28 2.82 4590.11
> 413603 (ceph-osd) 4579.29 2.84 4582.12
> 297473 (ceph-osd) 4569.20 2.84 4572.04
> 431396 (ceph-osd) 4537.83 2.84 4540.66
> 501424 (ceph-osd) 4533.19 2.84 4536.03
> 477729 (ceph-osd) 4505.37 2.83 4508.20
> 392520 (ceph-osd) 4439.75 2.84 4442.59
> 280349 (ceph-osd) 4389.06 2.82 4391.88
> 321805 (ceph-osd) 4385.80 2.82 4388.62
> 463759 (ceph-osd) 4369.09 2.82 4371.91
> 328971 (ceph-osd) 4349.35 2.82 4352.18
> 530916 (ceph-osd) 4330.21 2.82 4333.03
> 468626 (ceph-osd) 4272.68 2.83 4275.51
> 353261 (ceph-osd) 4266.01 2.82 4268.83
> 339729 (ceph-osd) 4194.93 2.82 4197.75
> 422844 (ceph-osd) 4157.31 2.82 4160.14
> 400631 (ceph-osd) 4155.34 2.82 4158.16
> 325467 (ceph-osd) 4144.66 2.84 4147.50
> 380309 (ceph-osd) 4119.42 2.82 4122.24
> 454764 (ceph-osd) 4007.09 2.82 4009.92
> 336089 (ceph-osd) 4003.25 2.82 4006.07
> 349613 (ceph-osd) 3953.32 2.84 3956.15
> 473107 (ceph-osd) 3833.75 2.83 3836.59
> 388421 (ceph-osd) 3776.79 2.83 3779.62
> 308957 (ceph-osd) 3758.94 2.82 3761.76
> 315430 (ceph-osd) 3677.42 2.82 3680.24
> 445064 (ceph-osd) 3669.27 2.82 3672.09
> 977162 (ceph-osd) 1508.02 3.40 1511.43
> 166155 (ceph-osd) 1411.64 3.42 1415.06
> 228123 (ceph-osd) 1399.20 3.41 1402.60
> 39367 (ceph-osd) 1397.44 3.41 1400.85
> 228124 (ceph-osd) 1227.50 3.41 1230.91
> 284384 (ceph-osd) 1204.96 3.41 1208.37
> 339890 (ceph-osd) 1139.69 3.41 1143.10
> 467652 (ceph-osd) 1016.18 3.41 1019.59
> 597584 (ceph-osd) 901.18 3.41 904.58
> 934986 (ceph-mon) 0.02 184.65 184.67
> ----------------- --------------- --------------- ---------------
> Total 278720.27 379.42 279099.69
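>
> (The total above can be cross-checked without numastat using a plain ps/awk
> one-liner; it will not match to the MB, since numastat and RSS account pages
> slightly differently, but it should be in the same ballpark:
>
> ps -C ceph-osd -o rss= | awk '{sum+=$1} END {printf "%.1f GB\n", sum/1024/1024}'
> )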
>
> 3. The OOM killer is invoked against the ceph-osd process:
>
> Mar 27 01:57:05 cn1 kernel: ceph-osd invoked oom-killer: gfp_mask=0x280da,
> order=0, oom_score_adj=0
> Mar 27 01:57:05 cn1 kernel: ceph-osd cpuset=/ mems_allowed=0-1
> Mar 27 01:57:05 cn1 kernel: CPU: 0 PID: 422861 Comm: ceph-osd Not tainted
> 3.10.0-327.el7.x86_64 #1
> Mar 27 01:57:05 cn1 kernel: Hardware name: HP ProLiant XL450 Gen9
> Server/ProLiant XL450 Gen9 Server, BIOS U21 09/12/2016
> Mar 27 01:57:05 cn1 kernel: ffff884546751700 00000000275e2e50
> ffff88454137b6f0 ffffffff816351f1
> Mar 27 01:57:05 cn1 kernel: ffff88454137b780 ffffffff81630191
> 000000000487ffff ffff8846f0665590
> Mar 27 01:57:05 cn1 kernel: ffff8845411b3ad8 ffff88454137b7d0
> ffffffffffffffd5 0000000000000001
> Mar 27 01:57:05 cn1 kernel: Call Trace:
> Mar 27 01:57:05 cn1 kernel: [<ffffffff816351f1>] dump_stack+0x19/0x1b
> Mar 27 01:57:05 cn1 kernel: [<ffffffff81630191>] dump_header+0x8e/0x214
> Mar 27 01:57:05 cn1 kernel: [<ffffffff8116cdee>]
> oom_kill_process+0x24e/0x3b0
> Mar 27 01:57:05 cn1 kernel: [<ffffffff8116c956>] ?
> find_lock_task_mm+0x56/0xc0
> Mar 27 01:57:05 cn1 kernel: [<ffffffff8116d616>] out_of_memory+0x4b6/0x4f0
> Mar 27 01:57:05 cn1 kernel: [<ffffffff811737f5>]
> __alloc_pages_nodemask+0xa95/0xb90
> Mar 27 01:57:05 cn1 kernel: [<ffffffff811b78ca>] alloc_pages_vma+0x9a/0x140
> Mar 27 01:57:05 cn1 kernel: [<ffffffff81197655>] handle_mm_fault+0xb85/0xf50
> Mar 27 01:57:05 cn1 kernel: [<ffffffffa04f5b22>] ?
> xfs_perag_get_tag+0x42/0xe0 [xfs]
> Mar 27 01:57:05 cn1 kernel: [<ffffffff81640e22>] __do_page_fault+0x152/0x420
> Mar 27 01:57:05 cn1 kernel: [<ffffffff81641113>] do_page_fault+0x23/0x80
> Mar 27 01:57:05 cn1 kernel: [<ffffffff8163d408>] page_fault+0x28/0x30
> Mar 27 01:57:05 cn1 kernel: [<ffffffff813000c9>] ?
> copy_user_enhanced_fast_string+0x9/0x20
> Mar 27 01:57:05 cn1 kernel: [<ffffffff8130600a>] ? memcpy_toiovec+0x4a/0x90
> Mar 27 01:57:05 cn1 kernel: [<ffffffff8151f91f>]
> skb_copy_datagram_iovec+0x12f/0x2a0
> Mar 27 01:57:05 cn1 kernel: [<ffffffff81574418>] tcp_recvmsg+0x248/0xbc0
> Mar 27 01:57:05 cn1 kernel: [<ffffffff810bb685>] ? sched_clock_cpu+0x85/0xc0
> Mar 27 01:57:05 cn1 kernel: [<ffffffff815a10eb>] inet_recvmsg+0x7b/0xa0
> Mar 27 01:57:05 cn1 kernel: [<ffffffff8150ffb6>]
> sock_aio_read.part.7+0x146/0x160
> Mar 27 01:57:05 cn1 kernel: [<ffffffff8150fff1>] sock_aio_read+0x21/0x30
> Mar 27 01:57:05 cn1 kernel: [<ffffffff811ddcdd>] do_sync_read+0x8d/0xd0
> Mar 27 01:57:05 cn1 kernel: [<ffffffff811de4e5>] vfs_read+0x145/0x170
> Mar 27 01:57:05 cn1 kernel: [<ffffffff811def8f>] SyS_read+0x7f/0xe0
> Mar 27 01:57:05 cn1 kernel: [<ffffffff81645909>]
> system_call_fastpath+0x16/0x1b
> Mar 27 01:57:05 cn1 kernel: Mem-Info:
> Mar 27 01:57:05 cn1 kernel: Node 0 DMA per-cpu:
>
> 4. All OSDs flood their logs with the error below:
> 2017-03-28 12:51:28.889658 7f82dd053700 0 -- 10.139.4.83:6850/121122 >>
> 10.139.4.85:6800/44958 conn(0x7f82eeeea000 :-1
> s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=1164657 cs=2
> l=0)._process_connection connect claims to be 10.139.4.85:6800/273761 not
> 10.139.4.85:6800/44958 - wrong node!
>
> On the affected OSD's logs:
> 2017-03-28 12:51:29.191346 7f8a775b6700 0 -- 10.139.4.85:6800/273761 >> -
> conn(0x7f8aad6c0000 :6800 s=STATE_ACCEPTING_WAIT_BANNER_ADDR pgs=0 cs=0
> l=0).fault with nothing to send and in the half accept state just closed
> 2017-03-28 12:51:29.249841 7f8a775b6700 0 -- 10.139.4.85:6800/273761 >> -
> conn(0x7f8aacc34800 :6800 s=STATE_ACCEPTING_WAIT_BANNER_ADDR pgs=0 cs=0
> l=0).fault with nothing to send and in the half accept state just closed
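>
> (The "wrong node" message just means that the process now answering on
> 10.139.4.85:6800 identifies itself with a different nonce - 273761 instead of
> the 44958 the peers still have cached - i.e. a different daemon instance now
> owns that address. Which PID is actually listening there can be checked with
> something like:
>
> ss -tlnp | grep ':6800 '
> )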
>
> 5. Once ceph-mgr is started on the affected OSD node, all OSDs move to the
> reconnect state:
>
> 2017-03-28 12:52:37.468455 7f8a77db7700 0 -- 10.139.4.85:6800/273761 >>
> 10.139.4.85:6928/185705 conn(0x7f8aacc34800 :-1 s=STATE_OPEN pgs=32 cs=1
> l=0).*fault initiating reconnect*
>
> 2017-03-28 12:52:37.468502 7f2bbcd5b700 0 -- 10.139.4.85:6897/154091 >>
> 10.139.4.85:6928/185705 conn(0x7f2c63448800 :-1 s=STATE_OPEN pgs=301 cs=1
> l=0).fault initiating reconnect
>
> 2017-03-28 12:52:37.469503 7fd36f161700 0 -- 10.139.4.84:6822/95096 >>
> 10.139.4.85:6928/185705 conn(0x7fd412f69800 :-1 s=STATE_OPEN pgs=173 cs=1
> l=0).*fault initiating reconnect*
>
> 2017-03-28 12:52:37.463913 7f82dd053700 0 -- 10.139.4.83:6850/121122 >>
> 10.139.4.85:6928/185705 conn(0x7f83da64b800 :-1 s=STATE_OPEN pgs=154 cs=1
> l=0).*fault initiating reconnect*
>
> 2017-03-28 12:52:37.468406 7fea1fc05700 0 -- 10.139.4.82:6816/97331 >>
> 10.139.4.85:6928/185705 conn(0x7feab70f6000 :-1 s=STATE_OPEN pgs=108 cs=1
> l=0).*fault initiating reconnect*
>
> 6. The memory usage then suddenly decreased from 275 GB to 147 GB.
>
> So, what impact is ceph-mgr having here?
>
>
> Thanks
>
>
> On Tue, Mar 28, 2017 at 2:49 PM, Jay Linux <jaylinuxgeek@xxxxxxxxx> wrote:
>>
>> Hello,
>>
>> We are still facing the same memory leak issue even after setting
>> bluestore_cache_size to 100 MB, and it has caused ceph-osd processes to be
>> killed by the OOM killer.
>>
>> Mar 27 01:57:05 cn1 kernel: ceph-osd invoked oom-killer: gfp_mask=0x280da,
>> order=0, oom_score_adj=0
>> Mar 27 01:57:05 cn1 kernel: ceph-osd cpuset=/ mems_allowed=0-1
>> Mar 27 01:57:05 cn1 kernel: CPU: 0 PID: 422861 Comm: ceph-osd Not tainted
>> 3.10.0-327.el7.x86_64 #1
>> Mar 27 01:57:05 cn1 kernel: Hardware name: HP ProLiant XL450 Gen9
>> Server/ProLiant XL450 Gen9 Server, BIOS U21 09/12/2016
>> Mar 27 01:57:05 cn1 kernel: ffff884546751700 00000000275e2e50
>> ffff88454137b6f0 ffffffff816351f1
>> Mar 27 01:57:05 cn1 kernel: ffff88454137b780 ffffffff81630191
>> 000000000487ffff ffff8846f0665590
>> Mar 27 01:57:05 cn1 kernel: ffff8845411b3ad8 ffff88454137b7d0
>> ffffffffffffffd5 0000000000000001
>> Mar 27 01:57:05 cn1 kernel: Call Trace:
>> Mar 27 01:57:05 cn1 kernel: [<ffffffff816351f1>] dump_stack+0x19/0x1b
>> Mar 27 01:57:05 cn1 kernel: [<ffffffff81630191>] dump_header+0x8e/0x214
>> Mar 27 01:57:05 cn1 kernel: [<ffffffff8116cdee>]
>> oom_kill_process+0x24e/0x3b0
>> Mar 27 01:57:05 cn1 kernel: [<ffffffff8116c956>] ?
>> find_lock_task_mm+0x56/0xc0
>> Mar 27 01:57:05 cn1 kernel: [<ffffffff8116d616>] out_of_memory+0x4b6/0x4f0
>> Mar 27 01:57:05 cn1 kernel: [<ffffffff811737f5>]
>> __alloc_pages_nodemask+0xa95/0xb90
>> Mar 27 01:57:05 cn1 kernel: [<ffffffff811b78ca>]
>> alloc_pages_vma+0x9a/0x140
>> Mar 27 01:57:05 cn1 kernel: [<ffffffff81197655>]
>> handle_mm_fault+0xb85/0xf50
>> Mar 27 01:57:05 cn1 kernel: [<ffffffffa04f5b22>] ?
>> xfs_perag_get_tag+0x42/0xe0 [xfs]
>> Mar 27 01:57:05 cn1 kernel: [<ffffffff81640e22>]
>> __do_page_fault+0x152/0x420
>> Mar 27 01:57:05 cn1 kernel: [<ffffffff81641113>] do_page_fault+0x23/0x80
>> Mar 27 01:57:05 cn1 kernel: [<ffffffff8163d408>] page_fault+0x28/0x30
>> Mar 27 01:57:05 cn1 kernel: [<ffffffff813000c9>] ?
>> copy_user_enhanced_fast_string+0x9/0x20
>> Mar 27 01:57:05 cn1 kernel: [<ffffffff8130600a>] ?
>> memcpy_toiovec+0x4a/0x90
>> Mar 27 01:57:05 cn1 kernel: [<ffffffff8151f91f>]
>> skb_copy_datagram_iovec+0x12f/0x2a0
>> Mar 27 01:57:05 cn1 kernel: [<ffffffff81574418>] tcp_recvmsg+0x248/0xbc0
>> Mar 27 01:57:05 cn1 kernel: [<ffffffff810bb685>] ?
>> sched_clock_cpu+0x85/0xc0
>> Mar 27 01:57:05 cn1 kernel: [<ffffffff815a10eb>] inet_recvmsg+0x7b/0xa0
>> Mar 27 01:57:05 cn1 kernel: [<ffffffff8150ffb6>]
>> sock_aio_read.part.7+0x146/0x160
>> Mar 27 01:57:05 cn1 kernel: [<ffffffff8150fff1>] sock_aio_read+0x21/0x30
>> Mar 27 01:57:05 cn1 kernel: [<ffffffff811ddcdd>] do_sync_read+0x8d/0xd0
>> Mar 27 01:57:05 cn1 kernel: [<ffffffff811de4e5>] vfs_read+0x145/0x170
>> Mar 27 01:57:05 cn1 kernel: [<ffffffff811def8f>] SyS_read+0x7f/0xe0
>> Mar 27 01:57:05 cn1 kernel: [<ffffffff81645909>]
>> system_call_fastpath+0x16/0x1b
>>
>> There have been several occurrences of the OOM event:
>>
>> #dmesg -T | grep -i memory
>> [Mon Mar 27 02:51:25 2017] Out of memory: Kill process 459076 (ceph-osd)
>> score 18 or sacrifice child
>> [Mon Mar 27 06:41:16 2017]  [<ffffffff8116d616>] out_of_memory+0x4b6/0x4f0
>> [Mon Mar 27 06:41:16 2017] Out of memory: Kill process 976901 (java) score
>> 31 or sacrifice child
>> [Mon Mar 27 06:43:55 2017]  [<ffffffff8116d616>] out_of_memory+0x4b6/0x4f0
>> [Mon Mar 27 06:43:55 2017] Out of memory: Kill process 37351 (java) score
>> 31 or sacrifice child
>> [Mon Mar 27 06:43:55 2017]  [<ffffffff8116d616>] out_of_memory+0x4b6/0x4f0
>> [Mon Mar 27 06:43:55 2017] Out of memory: Kill process 435981 (ceph-osd)
>> score 17 or sacrifice child
>> [Mon Mar 27 11:06:07 2017]  [<ffffffff8116d616>] out_of_memory+0x4b6/0x4f0
>>
>>
>> # numactl -H
>> available: 2 nodes (0-1)
>> node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
>> node 0 size: 294786 MB
>> node 0 free: 3447 MB  ===>>> Used almost 98%
>>
>>
>> While analysing the numastat results, you can see that each of these OSDs
>> consumes more than 5 GB:
>>
>> ====
>> # numastat -s ceph
>>
>> Per-node process memory usage (in MBs)
>> PID                         Node 0          Node 1           Total
>> -----------------  --------------- --------------- ---------------
>> 372602 (ceph-osd)          5418.34            2.84         5421.18
>> 491602 (ceph-osd)          5351.95            2.83         5354.78
>> 417717 (ceph-osd)          5175.98            2.83         5178.81
>> 273980 (ceph-osd)          5167.83            2.82         5170.65
>> 311956 (ceph-osd)          5167.04            2.84         5169.88
>> 440537 (ceph-osd)          5161.57            2.84         5164.41
>> 368422 (ceph-osd)          5157.87            2.83         5160.70
>> 292227 (ceph-osd)          5156.42            2.83         5159.25
>> ====
>>
>> Is there any way to fix the memory leak? Awaiting your comments.
>>
>> ---
>> bluestore_cache_size = 107374182
>> bluefs_buffered_io = true
>> ---
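>>
>> (Whether the value actually took effect on a running OSD can be verified via
>> the admin socket, assuming the default socket path:
>>
>> ceph daemon osd.<id> config get bluestore_cache_size
>>
>> 107374182 bytes is only about 102 MiB, so the BlueStore cache by itself
>> should not account for multi-GB RSS figures.)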
>>
>> Env: RHEL 7.2
>>      v11.2.0 kraken, EC 4+1
>>
>> FYI - we already raised a tracker for this issue:
>> http://tracker.ceph.com/issues/18924
>>
>> Thanks
>>
>>
>> On Mon, Feb 20, 2017 at 11:18 AM, Jay Linux <jaylinuxgeek@xxxxxxxxx>
>> wrote:
>>>
>>> Hello Shinobu,
>>>
>>> We already raised a ticket for this issue. FYI -
>>> http://tracker.ceph.com/issues/18924
>>>
>>> Thanks
>>> Jayaram
>>>
>>>
>>> On Mon, Feb 20, 2017 at 12:36 AM, Shinobu Kinjo <skinjo@xxxxxxxxxx>
>>> wrote:
>>>>
>>>> Please open a ticket at http://tracker.ceph.com, if you haven't yet.
>>>>
>>>> On Thu, Feb 16, 2017 at 6:07 PM, Muthusamy Muthiah
>>>> <muthiah.muthusamy@xxxxxxxxx> wrote:
>>>> > Hi Wido,
>>>> >
>>>> > Thanks for the information; please let us know if this is a bug.
>>>> > As a workaround we will go with a small bluestore_cache_size of 100 MB.
>>>> >
>>>> > Thanks,
>>>> > Muthu
>>>> >
>>>> > On 16 February 2017 at 14:04, Wido den Hollander <wido@xxxxxxxx>
>>>> > wrote:
>>>> >>
>>>> >>
>>>> >> > Op 16 februari 2017 om 7:19 schreef Muthusamy Muthiah
>>>> >> > <muthiah.muthusamy@xxxxxxxxx>:
>>>> >> >
>>>> >> >
>>>> >> > Thanks Ilya Letkowski for the information; we will change this value
>>>> >> > accordingly.
>>>> >> >
>>>> >>
>>>> >> What I understand from yesterday's performance meeting is that this seems
>>>> >> like a bug. Lowering this buffer reduces memory, but the root cause seems to
>>>> >> be memory not being freed: a few bytes of a larger allocation remain in use,
>>>> >> which prevents the whole buffer from being freed.
>>>> >>
>>>> >> Tried:
>>>> >>
>>>> >> debug_mempools = true
>>>> >>
>>>> >> $ ceph daemon osd.X dump_mempools
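>>>> >>
>>>> >> The dump comes back as JSON over the admin socket; if your build does not
>>>> >> already pretty-print it, something like this makes it easier to scan:
>>>> >>
>>>> >> $ ceph daemon osd.X dump_mempools | python -m json.tool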
>>>> >>
>>>> >> You might want to watch yesterday's video once it is up on YouTube:
>>>> >> https://www.youtube.com/channel/UCno-Fry25FJ7B4RycCxOtfw/videos
>>>> >>
>>>> >> Wido
>>>> >>
>>>> >> > Thanks,
>>>> >> > Muthu
>>>> >> >
>>>> >> > On 15 February 2017 at 17:03, Ilya Letkowski
>>>> >> > <mj12.svetzari@xxxxxxxxx>
>>>> >> > wrote:
>>>> >> >
>>>> >> > > Hi, Muthusamy Muthiah
>>>> >> > >
>>>> >> > > I'm not totally sure that this is a memory leak.
>>>> >> > > We had the same problems with bluestore on ceph v11.2.0.
>>>> >> > > Reducing the bluestore cache helped us solve it and stabilise OSD memory
>>>> >> > > consumption at around the 3 GB level.
>>>> >> > >
>>>> >> > > Perhaps this will help you:
>>>> >> > >
>>>> >> > > bluestore_cache_size = 104857600
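>>>> >> > >
>>>> >> > > (that is 100 * 1024 * 1024 bytes, i.e. a 100 MiB cache)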
>>>> >> > >
>>>> >> > >
>>>> >> > >
>>>> >> > > On Tue, Feb 14, 2017 at 11:52 AM, Muthusamy Muthiah <
>>>> >> > > muthiah.muthusamy@xxxxxxxxx> wrote:
>>>> >> > >
>>>> >> > >> Hi All,
>>>> >> > >>
>>>> >> > >> On all of our 5-node clusters with ceph 11.2.0 we encounter memory leak
>>>> >> > >> issues.
>>>> >> > >>
>>>> >> > >> Cluster details: 5 nodes with 24/68 disks per node, EC 4+1, RHEL 7.2.
>>>> >> > >>
>>>> >> > >> Some traces using sar are below, and the memory utilisation graph is
>>>> >> > >> attached.
>>>> >> > >>
>>>> >> > >> (16:54:42)[cn2.c1 sa] # sar -r
>>>> >> > >> 07:50:01   kbmemfree   kbmemused  %memused  kbbuffers  kbcached  kbcommit  %commit  kbactive   kbinact  kbdirty
>>>> >> > >> 10:20:01    32077264   132754368     80.54      16176   3040244  77767024    47.18  51991692   2676468      260
>>>> >> > >> 10:30:01    32208384   132623248     80.46      16176   3048536  77832312    47.22  51851512   2684552       12
>>>> >> > >> 10:40:01    32067244   132764388     80.55      16176   3059076  77832316    47.22  51983332   2694708      264
>>>> >> > >> 10:50:01    30626144   134205488     81.42      16176   3064340  78177232    47.43  53414144   2693712        4
>>>> >> > >> 11:00:01    28927656   135903976     82.45      16176   3074064  78958568    47.90  55114284   2702892       12
>>>> >> > >> 11:10:01    27158548   137673084     83.52      16176   3080600  80553936    48.87  56873664   2708904       12
>>>> >> > >> 11:20:01    26455556   138376076     83.95      16176   3080436  81991036    49.74  57570280   2708500        8
>>>> >> > >> 11:30:01    26002252   138829380     84.22      16176   3090556  82223840    49.88  58015048   2718036       16
>>>> >> > >> 11:40:01    25965924   138865708     84.25      16176   3089708  83734584    50.80  58049980   2716740       12
>>>> >> > >> 11:50:01    26142888   138688744     84.14      16176   3089544  83800100    50.84  57869628   2715400       16
>>>> >> > >>
>>>> >> > >> ...
>>>> >> > >> ...
>>>> >> > >>
>>>> >> > >> In the attached graph there is an increase in memory utilisation by
>>>> >> > >> ceph-osd during the soak test. When it gets close to the system limit of
>>>> >> > >> 128 GB RAM we see the dmesg logs below about running out of memory:
>>>> >> > >> osd.3 is killed due to out-of-memory and then started again.
>>>> >> > >>
>>>> >> > >> [Tue Feb 14 03:51:02 2017] *tp_osd_tp invoked oom-killer:
>>>> >> > >> gfp_mask=0x280da, order=0, oom_score_adj=0*
>>>> >> > >> [Tue Feb 14 03:51:02 2017] tp_osd_tp cpuset=/ mems_allowed=0-1
>>>> >> > >> [Tue Feb 14 03:51:02 2017] CPU: 20 PID: 11864 Comm: tp_osd_tp
>>>> >> > >> Not
>>>> >> > >> tainted
>>>> >> > >> 3.10.0-327.13.1.el7.x86_64 #1
>>>> >> > >> [Tue Feb 14 03:51:02 2017] Hardware name: HP ProLiant XL420
>>>> >> > >> Gen9/ProLiant
>>>> >> > >> XL420 Gen9, BIOS U19 09/12/2016
>>>> >> > >> [Tue Feb 14 03:51:02 2017]  ffff8819ccd7a280 0000000030e84036
>>>> >> > >> ffff881fa58f7528 ffffffff816356f4
>>>> >> > >> [Tue Feb 14 03:51:02 2017]  ffff881fa58f75b8 ffffffff8163068f
>>>> >> > >> ffff881fa3478360 ffff881fa3478378
>>>> >> > >> [Tue Feb 14 03:51:02 2017]  ffff881fa58f75e8 ffff8819ccd7a280
>>>> >> > >> 0000000000000001 000000000001f65f
>>>> >> > >> [Tue Feb 14 03:51:02 2017] Call Trace:
>>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff816356f4>]
>>>> >> > >> dump_stack+0x19/0x1b
>>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff8163068f>]
>>>> >> > >> dump_header+0x8e/0x214
>>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff8116ce7e>]
>>>> >> > >> oom_kill_process+0x24e/0x3b0
>>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff8116c9e6>] ?
>>>> >> > >> find_lock_task_mm+0x56/0xc0
>>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff8116d6a6>]
>>>> >> > >> *out_of_memory+0x4b6/0x4f0*
>>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff81173885>]
>>>> >> > >> __alloc_pages_nodemask+0xa95/0xb90
>>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff811b792a>]
>>>> >> > >> alloc_pages_vma+0x9a/0x140
>>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff811976c5>]
>>>> >> > >> handle_mm_fault+0xb85/0xf50
>>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff811957fb>] ?
>>>> >> > >> follow_page_mask+0xbb/0x5c0
>>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff81197c2b>]
>>>> >> > >> __get_user_pages+0x19b/0x640
>>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff8119843d>]
>>>> >> > >> get_user_pages_unlocked+0x15d/0x1f0
>>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff8106544f>]
>>>> >> > >> get_user_pages_fast+0x9f/0x1a0
>>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff8121de78>]
>>>> >> > >> do_blockdev_direct_IO+0x1a78/0x2610
>>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff81218c40>] ?
>>>> >> > >> I_BDEV+0x10/0x10
>>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff8121ea65>]
>>>> >> > >> __blockdev_direct_IO+0x55/0x60
>>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff81218c40>] ?
>>>> >> > >> I_BDEV+0x10/0x10
>>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff81219297>]
>>>> >> > >> blkdev_direct_IO+0x57/0x60
>>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff81218c40>] ?
>>>> >> > >> I_BDEV+0x10/0x10
>>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff8116af63>]
>>>> >> > >> generic_file_aio_read+0x6d3/0x750
>>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffffa038ad5c>] ?
>>>> >> > >> xfs_iunlock+0x11c/0x130 [xfs]
>>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff811690db>] ?
>>>> >> > >> unlock_page+0x2b/0x30
>>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff81192f21>] ?
>>>> >> > >> __do_fault+0x401/0x510
>>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff8121970c>]
>>>> >> > >> blkdev_aio_read+0x4c/0x70
>>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff811ddcfd>]
>>>> >> > >> do_sync_read+0x8d/0xd0
>>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff811de45c>]
>>>> >> > >> vfs_read+0x9c/0x170
>>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff811df182>]
>>>> >> > >> SyS_pread64+0x92/0xc0
>>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff81645e89>]
>>>> >> > >> system_call_fastpath+0x16/0x1b
>>>> >> > >>
>>>> >> > >>
>>>> >> > >> Feb 14 03:51:40 fr-paris kernel: *Out of memory: Kill process
>>>> >> > >> 7657
>>>> >> > >> (ceph-osd) score 45 or sacrifice child*
>>>> >> > >> Feb 14 03:51:40 fr-paris kernel: Killed process 7657 (ceph-osd)
>>>> >> > >> total-vm:8650208kB, anon-rss:6124660kB, file-rss:1560kB
>>>> >> > >> Feb 14 03:51:41 fr-paris systemd:* ceph-osd@3.service: main
>>>> >> > >> process
>>>> >> > >> exited, code=killed, status=9/KILL*
>>>> >> > >> Feb 14 03:51:41 fr-paris systemd: Unit ceph-osd@3.service
>>>> >> > >> entered
>>>> >> > >> failed
>>>> >> > >> state.
>>>> >> > >> Feb 14 03:51:41 fr-paris systemd: *ceph-osd@3.service failed.*
>>>> >> > >> Feb 14 03:51:41 fr-paris systemd: cassandra.service: main
>>>> >> > >> process
>>>> >> > >> exited,
>>>> >> > >> code=killed, status=9/KILL
>>>> >> > >> Feb 14 03:51:41 fr-paris systemd: Unit cassandra.service entered
>>>> >> > >> failed
>>>> >> > >> state.
>>>> >> > >> Feb 14 03:51:41 fr-paris systemd: cassandra.service failed.
>>>> >> > >> Feb 14 03:51:41 fr-paris ceph-mgr: 2017-02-14 03:51:41.978878
>>>> >> > >> 7f51a3154700 -1 mgr ms_dispatch osd_map(7517..7517 src has
>>>> >> > >> 6951..7517) v3
>>>> >> > >> Feb 14 03:51:42 fr-paris systemd: Device
>>>> >> > >> dev-disk-by\x2dpartlabel-ceph\x5cx20block.device
>>>> >> > >> appeared twice with different sysfs paths
>>>> >> > >> /sys/devices/pci0000:00/0000:0
>>>> >> > >> 0:03.2/0000:03:00.0/host0/target0:0:0/0:0:0:9/block/sdj/sdj2 and
>>>> >> > >> /sys/devices/pci0000:00/0000:00:03.2/0000:03:00.0/host0/targ
>>>> >> > >> et0:0:0/0:0:0:4/block/sde/sde2
>>>> >> > >> Feb 14 03:51:42 fr-paris ceph-mgr: 2017-02-14 03:51:42.992477
>>>> >> > >> 7f51a3154700 -1 mgr ms_dispatch osd_map(7518..7518 src has
>>>> >> > >> 6951..7518) v3
>>>> >> > >> Feb 14 03:51:43 fr-paris ceph-mgr: 2017-02-14 03:51:43.508990
>>>> >> > >> 7f51a3154700 -1 mgr ms_dispatch mgrdigest v1
>>>> >> > >> Feb 14 03:51:48 fr-paris ceph-mgr: 2017-02-14 03:51:48.508970
>>>> >> > >> 7f51a3154700 -1 mgr ms_dispatch mgrdigest v1
>>>> >> > >> Feb 14 03:51:53 fr-paris ceph-mgr: 2017-02-14 03:51:53.509592
>>>> >> > >> 7f51a3154700 -1 mgr ms_dispatch mgrdigest v1
>>>> >> > >> Feb 14 03:51:58 fr-paris ceph-mgr: 2017-02-14 03:51:58.509936
>>>> >> > >> 7f51a3154700 -1 mgr ms_dispatch mgrdigest v1
>>>> >> > >> Feb 14 03:52:01 fr-paris systemd: ceph-osd@3.service holdoff
>>>> >> > >> time
>>>> >> > >> over,
>>>> >> > >> scheduling restart.
>>>> >> > >> Feb 14 03:52:02 fr-paris systemd: *Starting Ceph object storage
>>>> >> > >> daemon
>>>> >> > >> osd.3.*..
>>>> >> > >> Feb 14 03:52:02 fr-paris systemd: Started Ceph object storage
>>>> >> > >> daemon
>>>> >> > >> osd.3.
>>>> >> > >> Feb 14 03:52:02 fr-paris numactl: 2017-02-14 03:52:02.307106
>>>> >> > >> 7f1e499bb940
>>>> >> > >> -1 WARNING: the following dangerous and experimental features
>>>> >> > >> are
>>>> >> > >> enabled:
>>>> >> > >> bluestore,rocksdb
>>>> >> > >> Feb 14 03:52:02 fr-paris numactl: 2017-02-14 03:52:02.317687
>>>> >> > >> 7f1e499bb940
>>>> >> > >> -1 WARNING: the following dangerous and experimental features
>>>> >> > >> are
>>>> >> > >> enabled:
>>>> >> > >> bluestore,rocksdb
>>>> >> > >> Feb 14 03:52:02 fr-paris numactl: starting osd.3 at - osd_data
>>>> >> > >> /var/lib/ceph/osd/ceph-3 /var/lib/ceph/osd/ceph-3/journal
>>>> >> > >> Feb 14 03:52:02 fr-paris numactl: 2017-02-14 03:52:02.333522
>>>> >> > >> 7f1e499bb940
>>>> >> > >> -1 WARNING: experimental feature 'bluestore' is enabled
>>>> >> > >> Feb 14 03:52:02 fr-paris numactl: Please be aware that this
>>>> >> > >> feature
>>>> >> > >> is
>>>> >> > >> experimental, untested,
>>>> >> > >> Feb 14 03:52:02 fr-paris numactl: unsupported, and may result in
>>>> >> > >> data
>>>> >> > >> corruption, data loss,
>>>> >> > >> Feb 14 03:52:02 fr-paris numactl: and/or irreparable damage to
>>>> >> > >> your
>>>> >> > >> cluster.  Do not use
>>>> >> > >> Feb 14 03:52:02 fr-paris numactl: feature with important data.
>>>> >> > >>
>>>> >> > >> This seems to happen only in 11.2.0 and not in 11.1.x. Could you please
>>>> >> > >> help us resolve this issue, either with a config change to limit ceph-osd
>>>> >> > >> memory use, or by confirming it as a bug in the current kraken release.
>>>> >> > >>
>>>> >> > >> Thanks,
>>>> >> > >> Muthu
>>>> >> > >>
>>>> >> > >>
>>>> >> > >>
>>>> >> > >
>>>> >> > >
>>>> >> > > --
>>>> >> > > С уважением / Best regards
>>>> >> > >
>>>> >> > > Илья Летковский / Ilya Letkouski
>>>> >> > >
>>>> >> > > Phone, Viber: +375 29 3237335
>>>> >> > >
>>>> >> > > Minsk, Belarus (GMT+3)
>>>> >> > >
>>>> >
>>>> >
>>>> >
>>>> >
>>>
>>>
>>
>>
>>
>
>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



