Re: kraken-bluestore 11.2.0 memory leak issue

Hi Muthusamy,

I'm not totally sure that this is a memory leak.
We had the same problem with BlueStore on Ceph v11.2.0.
Reducing the BlueStore cache helped us solve it and stabilized OSD memory consumption at around the 3 GB level.

Perhaps this will help you:

bluestore_cache_size = 104857600
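For reference, this is roughly how we applied it; the [osd] section placement and the restart step are my assumptions here, adjust to your deployment:

    # /etc/ceph/ceph.conf on each OSD host
    [osd]
    # 104857600 bytes = 100 MB of BlueStore cache per OSD
    bluestore_cache_size = 104857600

As far as I know the value is read when the OSD starts, so restart the OSDs one at a time for it to take effect:

    systemctl restart ceph-osd@3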

On Tue, Feb 14, 2017 at 11:52 AM, Muthusamy Muthiah <muthiah.muthusamy@xxxxxxxxx> wrote:
Hi All,

On our 5-node cluster running Ceph 11.2.0 we are encountering memory leak issues on all nodes.

Cluster details: 5 nodes with 24/68 disks per node, EC 4+1, RHEL 7.2.

Some traces captured with sar are below; the memory utilisation graph is attached.

(16:54:42)[cn2.c1 sa] # sar -r
07:50:01 kbmemfree kbmemused %memused kbbuffers kbcached kbcommit %commit kbactive kbinact kbdirty
10:20:01  32077264 132754368    80.54     16176  3040244 77767024   47.18 51991692 2676468     260
10:30:01  32208384 132623248    80.46     16176  3048536 77832312   47.22 51851512 2684552      12
10:40:01  32067244 132764388    80.55     16176  3059076 77832316   47.22 51983332 2694708     264
10:50:01  30626144 134205488    81.42     16176  3064340 78177232   47.43 53414144 2693712       4
11:00:01  28927656 135903976    82.45     16176  3074064 78958568   47.90 55114284 2702892      12
11:10:01  27158548 137673084    83.52     16176  3080600 80553936   48.87 56873664 2708904      12
11:20:01  26455556 138376076    83.95     16176  3080436 81991036   49.74 57570280 2708500       8
11:30:01  26002252 138829380    84.22     16176  3090556 82223840   49.88 58015048 2718036      16
11:40:01  25965924 138865708    84.25     16176  3089708 83734584   50.80 58049980 2716740      12
11:50:01  26142888 138688744    84.14     16176  3089544 83800100   50.84 57869628 2715400      16


...
...
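A rough way to track per-process RSS alongside sar and confirm the growth is in the ceph-osd processes rather than in the page cache (just a sketch; the interval and log path are arbitrary):

    # sample ceph-osd resident set sizes every 10 minutes
    while true; do
        date >> /var/tmp/osd-rss.log
        ps -C ceph-osd -o pid,rss,cmd --sort=-rss >> /var/tmp/osd-rss.log
        sleep 600
    done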

The attached graph shows ceph-osd memory utilisation increasing during the soak test. When usage gets close to the system limit of 128 GB RAM, we see the dmesg logs below: osd.3 is killed due to out-of-memory and then started again.

[Tue Feb 14 03:51:02 2017] tp_osd_tp invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0
[Tue Feb 14 03:51:02 2017] tp_osd_tp cpuset=/ mems_allowed=0-1
[Tue Feb 14 03:51:02 2017] CPU: 20 PID: 11864 Comm: tp_osd_tp Not tainted 3.10.0-327.13.1.el7.x86_64 #1
[Tue Feb 14 03:51:02 2017] Hardware name: HP ProLiant XL420 Gen9/ProLiant XL420 Gen9, BIOS U19 09/12/2016
[Tue Feb 14 03:51:02 2017]  ffff8819ccd7a280 0000000030e84036 ffff881fa58f7528 ffffffff816356f4
[Tue Feb 14 03:51:02 2017]  ffff881fa58f75b8 ffffffff8163068f ffff881fa3478360 ffff881fa3478378
[Tue Feb 14 03:51:02 2017]  ffff881fa58f75e8 ffff8819ccd7a280 0000000000000001 000000000001f65f
[Tue Feb 14 03:51:02 2017] Call Trace:
[Tue Feb 14 03:51:02 2017]  [<ffffffff816356f4>] dump_stack+0x19/0x1b
[Tue Feb 14 03:51:02 2017]  [<ffffffff8163068f>] dump_header+0x8e/0x214
[Tue Feb 14 03:51:02 2017]  [<ffffffff8116ce7e>] oom_kill_process+0x24e/0x3b0
[Tue Feb 14 03:51:02 2017]  [<ffffffff8116c9e6>] ? find_lock_task_mm+0x56/0xc0
[Tue Feb 14 03:51:02 2017]  [<ffffffff8116d6a6>] out_of_memory+0x4b6/0x4f0
[Tue Feb 14 03:51:02 2017]  [<ffffffff81173885>] __alloc_pages_nodemask+0xa95/0xb90
[Tue Feb 14 03:51:02 2017]  [<ffffffff811b792a>] alloc_pages_vma+0x9a/0x140
[Tue Feb 14 03:51:02 2017]  [<ffffffff811976c5>] handle_mm_fault+0xb85/0xf50
[Tue Feb 14 03:51:02 2017]  [<ffffffff811957fb>] ? follow_page_mask+0xbb/0x5c0
[Tue Feb 14 03:51:02 2017]  [<ffffffff81197c2b>] __get_user_pages+0x19b/0x640
[Tue Feb 14 03:51:02 2017]  [<ffffffff8119843d>] get_user_pages_unlocked+0x15d/0x1f0
[Tue Feb 14 03:51:02 2017]  [<ffffffff8106544f>] get_user_pages_fast+0x9f/0x1a0
[Tue Feb 14 03:51:02 2017]  [<ffffffff8121de78>] do_blockdev_direct_IO+0x1a78/0x2610
[Tue Feb 14 03:51:02 2017]  [<ffffffff81218c40>] ? I_BDEV+0x10/0x10
[Tue Feb 14 03:51:02 2017]  [<ffffffff8121ea65>] __blockdev_direct_IO+0x55/0x60
[Tue Feb 14 03:51:02 2017]  [<ffffffff81218c40>] ? I_BDEV+0x10/0x10
[Tue Feb 14 03:51:02 2017]  [<ffffffff81219297>] blkdev_direct_IO+0x57/0x60
[Tue Feb 14 03:51:02 2017]  [<ffffffff81218c40>] ? I_BDEV+0x10/0x10
[Tue Feb 14 03:51:02 2017]  [<ffffffff8116af63>] generic_file_aio_read+0x6d3/0x750
[Tue Feb 14 03:51:02 2017]  [<ffffffffa038ad5c>] ? xfs_iunlock+0x11c/0x130 [xfs]
[Tue Feb 14 03:51:02 2017]  [<ffffffff811690db>] ? unlock_page+0x2b/0x30
[Tue Feb 14 03:51:02 2017]  [<ffffffff81192f21>] ? __do_fault+0x401/0x510
[Tue Feb 14 03:51:02 2017]  [<ffffffff8121970c>] blkdev_aio_read+0x4c/0x70
[Tue Feb 14 03:51:02 2017]  [<ffffffff811ddcfd>] do_sync_read+0x8d/0xd0
[Tue Feb 14 03:51:02 2017]  [<ffffffff811de45c>] vfs_read+0x9c/0x170
[Tue Feb 14 03:51:02 2017]  [<ffffffff811df182>] SyS_pread64+0x92/0xc0
[Tue Feb 14 03:51:02 2017]  [<ffffffff81645e89>] system_call_fastpath+0x16/0x1b


Feb 14 03:51:40 fr-paris kernel: Out of memory: Kill process 7657 (ceph-osd) score 45 or sacrifice child
Feb 14 03:51:40 fr-paris kernel: Killed process 7657 (ceph-osd) total-vm:8650208kB, anon-rss:6124660kB, file-rss:1560kB
Feb 14 03:51:41 fr-paris systemd: ceph-osd@3.service: main process exited, code=killed, status=9/KILL
Feb 14 03:51:41 fr-paris systemd: Unit ceph-osd@3.service entered failed state.
Feb 14 03:51:41 fr-paris systemd: ceph-osd@3.service failed.
Feb 14 03:51:41 fr-paris systemd: cassandra.service: main process exited, code=killed, status=9/KILL
Feb 14 03:51:41 fr-paris systemd: Unit cassandra.service entered failed state.
Feb 14 03:51:41 fr-paris systemd: cassandra.service failed.
Feb 14 03:51:41 fr-paris ceph-mgr: 2017-02-14 03:51:41.978878 7f51a3154700 -1 mgr ms_dispatch osd_map(7517..7517 src has 6951..7517) v3
Feb 14 03:51:42 fr-paris systemd: Device dev-disk-by\x2dpartlabel-ceph\x5cx20block.device appeared twice with different sysfs paths /sys/devices/pci0000:00/0000:00:03.2/0000:03:00.0/host0/target0:0:0/0:0:0:9/block/sdj/sdj2 and /sys/devices/pci0000:00/0000:00:03.2/0000:03:00.0/host0/target0:0:0/0:0:0:4/block/sde/sde2
Feb 14 03:51:42 fr-paris ceph-mgr: 2017-02-14 03:51:42.992477 7f51a3154700 -1 mgr ms_dispatch osd_map(7518..7518 src has 6951..7518) v3
Feb 14 03:51:43 fr-paris ceph-mgr: 2017-02-14 03:51:43.508990 7f51a3154700 -1 mgr ms_dispatch mgrdigest v1
Feb 14 03:51:48 fr-paris ceph-mgr: 2017-02-14 03:51:48.508970 7f51a3154700 -1 mgr ms_dispatch mgrdigest v1
Feb 14 03:51:53 fr-paris ceph-mgr: 2017-02-14 03:51:53.509592 7f51a3154700 -1 mgr ms_dispatch mgrdigest v1
Feb 14 03:51:58 fr-paris ceph-mgr: 2017-02-14 03:51:58.509936 7f51a3154700 -1 mgr ms_dispatch mgrdigest v1
Feb 14 03:52:01 fr-paris systemd: ceph-osd@3.service holdoff time over, scheduling restart.
Feb 14 03:52:02 fr-paris systemd: Starting Ceph object storage daemon osd.3...
Feb 14 03:52:02 fr-paris systemd: Started Ceph object storage daemon osd.3.
Feb 14 03:52:02 fr-paris numactl: 2017-02-14 03:52:02.307106 7f1e499bb940 -1 WARNING: the following dangerous and experimental features are enabled: bluestore,rocksdb
Feb 14 03:52:02 fr-paris numactl: 2017-02-14 03:52:02.317687 7f1e499bb940 -1 WARNING: the following dangerous and experimental features are enabled: bluestore,rocksdb
Feb 14 03:52:02 fr-paris numactl: starting osd.3 at - osd_data /var/lib/ceph/osd/ceph-3 /var/lib/ceph/osd/ceph-3/journal
Feb 14 03:52:02 fr-paris numactl: 2017-02-14 03:52:02.333522 7f1e499bb940 -1 WARNING: experimental feature 'bluestore' is enabled
Feb 14 03:52:02 fr-paris numactl: Please be aware that this feature is experimental, untested,
Feb 14 03:52:02 fr-paris numactl: unsupported, and may result in data corruption, data loss,
Feb 14 03:52:02 fr-paris numactl: and/or irreparable damage to your cluster.  Do not use
Feb 14 03:52:02 fr-paris numactl: this feature with important data.
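If the memory is being held by the allocator rather than truly leaked, the tcmalloc heap commands may show it (a sketch; osd.3 is just the example OSD from the logs above, and these commands assume Ceph is built with tcmalloc):

    ceph tell osd.3 heap stats      # dump tcmalloc heap usage for the OSD
    ceph tell osd.3 heap release    # ask tcmalloc to return freed pages to the OS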

This seems to happen only in 11.2.0 and not in 11.1.x. Could you please help us resolve this issue: is there a config change that would limit ceph-osd memory use, or is this a bug in the current kraken release?
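One stopgap we may try until the root cause is clear (untested on our side; MemoryLimit is the cgroup memory cap directive in RHEL 7's systemd) is to cap the OSD units so a runaway OSD is killed and restarted before it can starve the whole host, as happened with cassandra above:

    # systemctl edit ceph-osd@3   (creates a drop-in override)
    [Service]
    # the kernel OOM-kills the OSD at this limit instead of at 128 GB
    MemoryLimit=6G

This would not fix the growth, it would only make the kill bounded and predictable.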

Thanks,
Muthu





--
Best regards,

Ilya Letkouski

Phone, Viber: +375 29 3237335

Minsk, Belarus (GMT+3)


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
