Re: [Bug 207273] New: cgroup with 1.5GB limit and 100MB rss usage OOM-kills processes due to page cache usage after upgrading to kernel 5.4

(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Wed, 15 Apr 2020 01:32:12 +0000 bugzilla-daemon@xxxxxxxxxxxxxxxxxxx wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=207273
> 
>             Bug ID: 207273
>            Summary: cgroup with 1.5GB limit and 100MB rss usage OOM-kills
>                     processes due to page cache usage after upgrading to
>                     kernel 5.4
>            Product: Memory Management
>            Version: 2.5
>     Kernel Version: 5.4.20
>           Hardware: x86-64
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Page Allocator
>           Assignee: akpm@xxxxxxxxxxxxxxxxxxxx
>           Reporter: paulfurtado91@xxxxxxxxx
>         Regression: No
> 
> Upon upgrading to kernel 5.4, we see constant OOM kills in database containers
> that are restoring from backups, with nearly no RSS memory usage. It appears
> all the memory is consumed by file_dirty, with applications using minimal
> memory. On kernel 4.14.146 and 4.19.75, we do not see this problem, so it
> appears to be a new regression.

Thanks.

That's an elderly kernel.  Are you in a position to determine whether
contemporary kernels behave similarly?
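
If it reproduces readily, something like the below might help confirm
whether buffered-write dirty pagecache on its own is enough to push the
memcg over its limit on 5.4.  This is an untested sketch: the cgroup name,
limit and file path are placeholders, and it assumes cgroup v1 as in your
report.  It just mimics the 1MB buffered pwrite() pattern visible in your
register dump:

/*
 * dirty-writer.c: fill a memcg with dirty pagecache via buffered pwrite().
 *
 * Example setup (cgroup v1; names and paths are placeholders):
 *
 *   mkdir /sys/fs/cgroup/memory/oomtest
 *   echo $((1536000 * 1024)) > /sys/fs/cgroup/memory/oomtest/memory.limit_in_bytes
 *   echo $$ > /sys/fs/cgroup/memory/oomtest/cgroup.procs
 *   ./dirty-writer /path/on/xfs/testfile
 *
 * then watch file_dirty/file_writeback in memory.stat while it runs.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define CHUNK	(1 << 20)		/* 1MB per write, matching RDX=0x100000 in the dump */
#define TOTAL	(4096LL * CHUNK)	/* 4GB total, well past a 1.5GB limit */

int main(int argc, char **argv)
{
	static char buf[CHUNK];
	const char *path = argc > 1 ? argv[1] : "testfile";
	long long off;
	int fd;

	memset(buf, 'x', sizeof(buf));

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* Plain buffered writes (no O_DIRECT): each page is charged to the
	 * memcg as dirty pagecache until writeback catches up. */
	for (off = 0; off < TOTAL; off += CHUNK) {
		if (pwrite(fd, buf, CHUNK, off) != CHUNK) {
			perror("pwrite");
			return 1;
		}
	}

	close(fd);
	return 0;
}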

> The full OOM log from dmesg shows:
> 
> xtrabackup invoked oom-killer:
> gfp_mask=0x101c4a(GFP_NOFS|__GFP_HIGHMEM|__GFP_HARDWALL|__GFP_MOVABLE|__GFP_WRITE),
> order=0, oom_score_adj=993
> CPU: 9 PID: 50206 Comm: xtrabackup Tainted: G            E    
> 5.4.20-hs779.el6.x86_64 #1
> Hardware name: Amazon EC2 c5d.9xlarge/, BIOS 1.0 10/16/2017
> Call Trace:
>  dump_stack+0x66/0x8b
>  dump_header+0x4a/0x200
>  oom_kill_process+0xd7/0x110
>  out_of_memory+0x105/0x510
>  mem_cgroup_out_of_memory+0xb5/0xd0
>  try_charge+0x7b1/0x7f0
>  mem_cgroup_try_charge+0x70/0x190
>  __add_to_page_cache_locked+0x2b6/0x2f0
>  ? scan_shadow_nodes+0x30/0x30
>  add_to_page_cache_lru+0x4a/0xc0
>  pagecache_get_page+0xf5/0x210
>  grab_cache_page_write_begin+0x1f/0x40
>  iomap_write_begin.constprop.33+0x1ee/0x320
>  ? iomap_write_end+0x91/0x240
>  iomap_write_actor+0x92/0x170
>  ? iomap_dirty_actor+0x1b0/0x1b0
>  iomap_apply+0xba/0x130
>  ? iomap_dirty_actor+0x1b0/0x1b0
>  iomap_file_buffered_write+0x62/0x90
>  ? iomap_dirty_actor+0x1b0/0x1b0
>  xfs_file_buffered_aio_write+0xca/0x310 [xfs]
>  new_sync_write+0x11b/0x1b0
>  vfs_write+0xad/0x1a0
>  ksys_pwrite64+0x71/0x90
>  do_syscall_64+0x4e/0x100
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> RIP: 0033:0x7f6085b181a3
> Code: 49 89 ca b8 12 00 00 00 0f 05 48 3d 01 f0 ff ff 73 34 c3 48 83 ec 08 e8
> 8b f0 ff ff 48 89 04 24 49 89 ca b8 12 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8
> d1 f0 ff ff 48 89 d0 48 83 c4 08 48 3d 01
> RSP: 002b:00007ffd43632320 EFLAGS: 00000293 ORIG_RAX: 0000000000000012
> RAX: ffffffffffffffda RBX: 00007ffd43632400 RCX: 00007f6085b181a3
> RDX: 0000000000100000 RSI: 0000000004a54000 RDI: 0000000000000004
> RBP: 00007ffd43632590 R08: 0000000066e00000 R09: 00007ffd436325c0
> R10: 0000000066e00000 R11: 0000000000000293 R12: 0000000000100000
> R13: 0000000066e00000 R14: 0000000066e00000 R15: 0000000001acdd20
> memory: usage 1536000kB, limit 1536000kB, failcnt 0
> memory+swap: usage 1536000kB, limit 1536000kB, failcnt 490221
> kmem: usage 23164kB, limit 9007199254740988kB, failcnt 0
> Memory cgroup stats for
> /kubepods/burstable/pod6900693c-8b2c-4efe-ab52-26e4a6bd9e4c/83216944bb43baf32f0d43ef12c85ebaa2767b3f51846f5fa438bba00b4636d8:
> anon 72507392
> file 1474740224
> kernel_stack 774144
> slab 18673664
> sock 0
> shmem 0
> file_mapped 0
> file_dirty 1413857280
> file_writeback 60555264
> anon_thp 0
> inactive_anon 0
> active_anon 72585216
> inactive_file 368873472
> active_file 1106067456
> unevictable 0
> slab_reclaimable 11403264
> slab_unreclaimable 7270400
> pgfault 34848
> pgmajfault 0
> workingset_refault 0
> workingset_activate 0
> workingset_nodereclaim 0
> pgrefill 17089962
> pgscan 18425256
> pgsteal 602912
> pgactivate 17822046
> pgdeactivate 17089962
> pglazyfree 0
> pglazyfreed 0
> thp_fault_alloc 0
> thp_collapse_alloc 0
> Tasks state (memory values in pages):
> [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
> [  42046]   500 42046      257        1    32768        0           993 init
> [  43157]   500 43157   164204    18473   335872        0           993 vttablet
> [  50206]   500 50206   294931     8856   360448        0           993 xtrabackup
> oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=83216944bb43baf32f0d43ef12c85ebaa2767b3f51846f5fa438bba00b4636d8,mems_allowed=0,oom_memcg=/kubepods/burstable/pod6900693c-8b2c-4efe-ab52-26e4a6bd9e4c/83216944bb43baf32f0d43ef12c85ebaa2767b3f51846f5fa438bba00b4636d8,task_memcg=/kubepods/burstable/pod6900693c-8b2c-4efe-ab52-26e4a6bd9e4c/83216944bb43baf32f0d43ef12c85ebaa2767b3f51846f5fa438bba00b4636d8,task=vttablet,pid=43157,uid=500
> Memory cgroup out of memory: Killed process 43157 (vttablet) total-vm:656816kB,
> anon-rss:50572kB, file-rss:23320kB, shmem-rss:0kB, UID:500 pgtables:328kB
> oom_score_adj:993
> 
> -- 
> You are receiving this mail because:
> You are the assignee for the bug.



