Re: Memory leak in Ceph OSD?

With default memory settings, Ceph's assumed memory requirement is 1 GB of RAM per 1 TB of OSD capacity. Raising any setting above its default increases that baseline.

On Tue, Mar 27, 2018 at 1:10 AM Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx> wrote:
On Mon, Mar 26, 2018 at 3:08 PM, Igor Fedotov <ifedotov@xxxxxxx> wrote:
> Hi Alex,
>
> I can see your bug report: https://tracker.ceph.com/issues/23462
>
> If the settings from there also apply to your comment here, then you
> have the BlueStore cache size limit set to 5 GB, which totals 90 GB of RAM
> across 18 OSDs for the BlueStore cache alone.
>
> There is also additional memory overhead per OSD, so you should not expect
> much free memory, if any at all.
>
> Can you reduce the BlueStore cache size limits and check whether the
> out-of-memory issue still happens?
>

Thank you Igor, I am reducing it to 3 GB now and will advise. I did not
realize there is additional memory overhead on top of the 90 GB; the nodes
each have 128 GB.
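
For reference, the arithmetic behind this exchange can be sketched in a few lines of Python. This is only a rough illustration, not anything from Ceph itself: the function and constant names are made up, GB and GiB are treated loosely as the same unit, and the 4 KiB page size is an assumption (it matches the per-node kB totals in the quoted OOM report). The inputs are the figures from this thread: 18 OSDs, 128 GB per node, a 5 GB versus 3 GB cache limit, and the active_anon/inactive_anon page counts from the Mem-Info dump.

```python
# Rough memory budget for one node, using figures quoted in this thread.
NODE_RAM_GB = 128   # RAM per node, as stated above
NUM_OSDS = 18       # OSDs per node, as stated above


def cache_total_gb(num_osds, cache_gb_per_osd):
    """Memory reserved for BlueStore cache alone, ignoring per-OSD overhead."""
    return num_osds * cache_gb_per_osd


for cache_gb in (5, 3):
    total = cache_total_gb(NUM_OSDS, cache_gb)
    left = NODE_RAM_GB - total
    print(f"{cache_gb} GB cache x {NUM_OSDS} OSDs = {total} GB, "
          f"leaving {left} GB for OSD overhead, the kernel and everything else")

# Anonymous memory in use at the time of the OOM report.
# Page counts are copied from the Mem-Info lines; 4 KiB pages are assumed.
PAGE_KIB = 4
active_anon_pages = 30843938
inactive_anon_pages = 1403277
anon_kib = (active_anon_pages + inactive_anon_pages) * PAGE_KIB
print(f"anonymous memory in use: {anon_kib / 1024**2:.1f} GiB")
```

With a 5 GB limit the cache alone accounts for 90 GB and leaves 38 GB for everything else; at 3 GB it accounts for 54 GB and leaves 74 GB. The page counts work out to roughly 123 GiB of anonymous memory, which is consistent with the node being essentially full when the OOM killer fired.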


--
Alex Gorbachev
Storcium

>
> Thanks,
>
> Igor
>
>
>
> On 3/26/2018 5:09 PM, Alex Gorbachev wrote:
>>
>> On Wed, Mar 21, 2018 at 2:26 PM, Kjetil Joergensen <kjetil@xxxxxxxxxxxx>
>> wrote:
>>>
>>> I retract my previous statement(s).
>>>
>>> My current suspicion is that this isn't so much a leak as load-driven
>>> growth: after enough waiting, it generally seems to settle around some
>>> equilibrium. We do seem to sit at ceph-osd RSS ~ 2.4 x mempools, which
>>> is on the higher side (I see documentation alluding to expecting ~1.5x).
>>>
>>> -KJ
>>>
>>> On Mon, Mar 19, 2018 at 3:05 AM, Konstantin Shalygin <k0ste@xxxxxxxx>
>>> wrote:
>>>>
>>>>
>>>>> We don't run compression as far as I know, so that wouldn't be it. We do
>>>>> actually run a mix of bluestore & filestore - due to the rest of the
>>>>> cluster predating a stable bluestore by some amount.
>>>>
>>>>
>>>>
>>>> 12.2.2 -> 12.2.4 on 2018/03/10: I don't see any increase in memory usage.
>>>> No compression, of course.
>>>>
>>>>
>>>>
>>>>
>>>> http://storage6.static.itmages.com/i/18/0319/h_1521453809_9131482_859b1fb0a5.png
>>>>
>> I am seeing these entries under load - should be plenty of RAM on a
>> node with 128GB RAM and 18 OSDs
>>
>> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193331] winbindd
>> cpuset=/ mems_allowed=0-1
>> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193337] CPU: 3 PID:
>> 3406 Comm: winbindd Not tainted 4.14.14-041414-generic #201801201219
>> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193338] Hardware name:
>> Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2
>> 03/04/2015
>> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193339] Call Trace:
>> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193347]
>> dump_stack+0x5c/0x85
>> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193351]
>> dump_header+0x94/0x229
>> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193355]  ?
>> do_try_to_free_pages+0x2a1/0x330
>> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193357]  ?
>> get_page_from_freelist+0xa3/0xb20
>> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193359]
>> oom_kill_process+0x213/0x410
>> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193361]
>> out_of_memory+0x2af/0x4d0
>> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193363]
>> __alloc_pages_slowpath+0xab2/0xe40
>> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193366]
>> __alloc_pages_nodemask+0x261/0x280
>> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193370]
>> filemap_fault+0x33f/0x6b0
>> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193373]  ?
>> filemap_map_pages+0x18a/0x3a0
>> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193376]
>> ext4_filemap_fault+0x2c/0x40
>> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193379]
>> __do_fault+0x19/0xe0
>> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193381]
>> __handle_mm_fault+0xcd6/0x1180
>> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193383]
>> handle_mm_fault+0xaa/0x1f0
>> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193387]
>> __do_page_fault+0x25d/0x4e0
>> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193391]  ?
>> page_fault+0x36/0x60
>> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193393]
>> page_fault+0x4c/0x60
>> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193396] RIP:
>> 0033:0x56443d3d1239
>> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193397] RSP:
>> 002b:00007ffe6e44b3a0 EFLAGS: 00010246
>> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193399] Mem-Info:
>> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193407]
>> active_anon:30843938 inactive_anon:1403277 isolated_anon:0
>> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193407]
>> active_file:121 inactive_file:977 isolated_file:18
>> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193407]
>> unevictable:3203 dirty:2 writeback:0 unstable:0
>> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193407]
>> slab_reclaimable:51522 slab_unreclaimable:95924
>> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193407]  mapped:2926
>> shmem:5220 pagetables:77204 bounce:0
>> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193407]  free:328371
>> free_pcp:0 free_cma:0
>> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193411] Node 0
>> active_anon:61155956kB inactive_anon:3014752kB active_file:864kB
>> inactive_file:1432kB unevictable:10440kB isolated(anon):0kB
>> isolated(file):80kB mapped:7648kB dirty:0kB writeback:0kB
>> shmem:14460kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB
>> writeback_tmp:0kB unstable:0kB all_unreclaimable? no
>> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193414] Node 1
>> active_anon:62219796kB inactive_anon:2598356kB active_file:0kB
>> inactive_file:2476kB unevictable:2372kB isolated(anon):0kB
>> isolated(file):0kB mapped:4056kB dirty:8kB writeback:0kB shmem:6420kB
>> shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB
>> unstable:0kB all_unreclaimable? no
>> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193416] Node 0 DMA
>> free:15896kB min:124kB low:152kB high:180kB active_anon:0kB
>> inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB
>> writepending:0kB present:15980kB managed:15896kB mlocked:0kB
>> kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB
>> free_cma:0kB
>> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193420]
>> lowmem_reserve[]: 0 1889 64319 64319 64319
>> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193424] Node 0 DMA32
>> free:265308kB min:15732kB low:19664kB high:23596kB
>> active_anon:1642352kB inactive_anon:63060kB active_file:0kB
>> inactive_file:0kB unevictable:0kB writepending:0kB present:2045868kB
>> managed:1980300kB mlocked:0kB kernel_stack:48kB pagetables:832kB
>> bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
>> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193428]
>> lowmem_reserve[]: 0 0 62430 62430 62430
>> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193432] Node 0 Normal
>> free:507908kB min:507928kB low:634908kB high:761888kB
>> active_anon:59513604kB inactive_anon:2951692kB active_file:732kB
>> inactive_file:1720kB unevictable:10440kB writepending:0kB
>> present:65011712kB managed:63934936kB mlocked:10440kB
>> kernel_stack:16392kB pagetables:164944kB bounce:0kB free_pcp:0kB
>> local_pcp:0kB free_cma:0kB
>> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193436]
>> lowmem_reserve[]: 0 0 0 0 0
>> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193440] Node 1 Normal
>> free:524372kB min:524784kB low:655980kB high:787176kB
>> active_anon:62219796kB inactive_anon:2598356kB active_file:504kB
>> inactive_file:1392kB unevictable:2372kB writepending:8kB
>> present:67108864kB managed:66056740kB mlocked:2372kB
>> kernel_stack:17912kB pagetables:143040kB bounce:0kB free_pcp:0kB
>> local_pcp:0kB free_cma:0kB
>> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193444]
>> lowmem_reserve[]: 0 0 0 0 0
>> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193447] Node 0 DMA:
>> 0*4kB 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U)
>> 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15896kB
>> Mar 26 07:55:32 roc04r-sc3a085 kernel: [733474.193459] Node 0 DMA32:
>> 403*4kB (UME) 238*8kB (UME) 196*16kB (UME) 102*32kB (UME) 56*64kB
>> (UME) 24*128kB (UE) 25*256kB (UM) 11*512kB (UME) 4*1024kB (UE)
>> 6*2048kB (UM) 54*4096kB (UM) = 266172kB
>>
>>
>>>>
>>>>
>>>> k
>>>
>>>
>>>
>>>
>>> --
>>> Kjetil Joergensen <kjetil@xxxxxxxxxxxx>
>>> SRE, Medallia Inc
>>>
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users@xxxxxxxxxxxxxx
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
