Re: How to interpret this OOM situation?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Swap can be used only for anon pages or for tmpfs. You have a lot of
file page cache.
I guess this is leak of pages' reference counter in some filesystem,
more likely in cifs.

Try to isolate which part of workload causes this leak, for example
switch to another filesystem.

On Sun, Nov 16, 2014 at 5:11 PM, Marki <mro2@xxxxxxx> wrote:
>
> Hey there,
>
> I wouldn't know where to turn anymore, maybe you guys can help me debug this
> OOM.
>
> Questions aside from "why in the end is this happening":
> - GFP mask lower byte 0xa indicates a request for a free page in highmem.
> This is a 64-bit system and therefore has no highmem zone. So what's going on?
> - Swap is almost not used: why not use it before OOMing?
> - Pagecache is high: why not empty it before OOMing? (almost no dirty pages)
>
> Oh and it's a machine with 4G of RAM on kernel 3.0.101 (SLES11 SP3).
>
>
> <4>[598175.284914] cifsd invoked oom-killer: gfp_mask=0x200da, order=0,
> oom_adj=0, oom_score_adj=0
> <6>[598175.284919] cifsd cpuset=/ mems_allowed=0
> <4>[598175.284921] Pid: 5529, comm: cifsd Tainted: G           E X
> 3.0.101-0.35-default #1
> <4>[598175.284923] Call Trace:
> <4>[598175.284934]  [<ffffffff81004935>] dump_trace+0x75/0x310
> <4>[598175.284941]  [<ffffffff8145f2f3>] dump_stack+0x69/0x6f
> <4>[598175.284947]  [<ffffffff810fc53e>] dump_header+0x8e/0x110
> <4>[598175.284950]  [<ffffffff810fc8e6>] oom_kill_process+0xa6/0x350
> <4>[598175.284954]  [<ffffffff810fce25>] out_of_memory+0x295/0x2f0
> <4>[598175.284957]  [<ffffffff8110287e>] __alloc_pages_slowpath+0x78e/0x7d0
> <4>[598175.284960]  [<ffffffff81102aa9>] __alloc_pages_nodemask+0x1e9/0x200
> <4>[598175.284965]  [<ffffffff8113de60>] alloc_pages_vma+0xd0/0x1c0
> <4>[598175.284969]  [<ffffffff81130bcd>] read_swap_cache_async+0x10d/0x160
> <4>[598175.284972]  [<ffffffff81130c94>] swapin_readahead+0x74/0xd0
> <4>[598175.284975]  [<ffffffff81120bfa>] do_swap_page+0xea/0x5f0
> <4>[598175.284978]  [<ffffffff81121c21>] handle_pte_fault+0x1e1/0x230
> <4>[598175.284982]  [<ffffffff81465bcd>] do_page_fault+0x1fd/0x4c0
> <4>[598175.284985]  [<ffffffff814627e5>] page_fault+0x25/0x30
> <4>[598175.285002]  [<00007f65a0891078>] 0x7f65a0891077
>
> Ok, it wants to swap in sth but fails because apparently there is no more
> physical memory.
>
> <4>[598175.285003] Mem-Info:
> <4>[598175.285004] Node 0 DMA per-cpu:
> <4>[598175.285006] CPU    0: hi:    0, btch:   1 usd:   0
> <4>[598175.285007] CPU    1: hi:    0, btch:   1 usd:   0
> <4>[598175.285008] Node 0 DMA32 per-cpu:
> <4>[598175.285010] CPU    0: hi:  186, btch:  31 usd:   9
> <4>[598175.285011] CPU    1: hi:  186, btch:  31 usd:   7
> <4>[598175.285012] Node 0 Normal per-cpu:
> <4>[598175.285013] CPU    0: hi:  186, btch:  31 usd:  35
> <4>[598175.285014] CPU    1: hi:  186, btch:  31 usd:  31
> <4>[598175.285017] active_anon:218 inactive_anon:91 isolated_anon:0
> <4>[598175.285018]  active_file:187788 inactive_file:451982 isolated_file:896
> <4>[598175.285018]  unevictable:0 dirty:0 writeback:69 unstable:0
> <4>[598175.285019]  free:21841 slab_reclaimable:8417 slab_unreclaimable:132175
> <4>[598175.285020]  mapped:8168 shmem:4 pagetables:2639 bounce:0
>
> Here we see a little over 3G used although I wouldn't be able to say what
> the different entries are exactly.
>
> <4>[598175.285021] Node 0 DMA free:15880kB min:256kB low:320kB high:384kB
> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB
> unevictable:0kB isolat
> ed(anon):0kB isolated(file):0kB present:15688kB mlocked:0kB dirty:0kB
> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB
> slab_unreclaimable:0kB kernel_stack:0k
> B pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
> all_unreclaimable? yes
> <4>[598175.285027] lowmem_reserve[]: 0 3000 4010 4010
> <4>[598175.285029] Node 0 DMA32 free:54600kB min:50368kB low:62960kB
> high:75552kB active_anon:860kB inactive_anon:308kB active_file:600716kB
> inactive_file:1576184kB
>  unevictable:0kB isolated(anon):0kB isolated(file):3328kB present:3072160kB
> mlocked:0kB dirty:0kB writeback:248kB mapped:26800kB shmem:16kB
> slab_reclaimable:23552kB
>  slab_unreclaimable:412540kB kernel_stack:752kB pagetables:2412kB
> unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:4169324
> all_unreclaimable? yes
> <4>[598175.285036] lowmem_reserve[]: 0 0 1010 1010
> <4>[598175.285038] Node 0 Normal free:16884kB min:16956kB low:21192kB
> high:25432kB active_anon:12kB inactive_anon:56kB active_file:150436kB
> inactive_file:231744kB u
> nevictable:0kB isolated(anon):0kB isolated(file):384kB present:1034240kB
> mlocked:0kB dirty:0kB writeback:28kB mapped:5872kB shmem:0kB
> slab_reclaimable:10116kB slab_
> unreclaimable:116160kB kernel_stack:2848kB pagetables:8144kB unstable:0kB
> bounce:0kB writeback_tmp:0kB pages_scanned:688103 all_unreclaimable? yes
> <4>[598175.285044] lowmem_reserve[]: 0 0 0 0
> <4>[598175.285046] Node 0 DMA: 0*4kB 1*8kB 0*16kB 0*32kB 2*64kB 1*128kB
> 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15880kB
> <4>[598175.285051] Node 0 DMA32: 12620*4kB 3*8kB 0*16kB 0*32kB 0*64kB
> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 54600kB
> <4>[598175.285056] Node 0 Normal: 3195*4kB 1*8kB 0*16kB 0*32kB 0*64kB
> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 16884kB
>
> There seems to be a lot of fragmentation. But since an order 0 page (4k) was
> requested (in highmem!?), and tons of those are available, that wouldn't
> matter, would it?
>
> <4>[598175.285061] 375504 total pagecache pages
>
> That's more than 1G of pagecache. Shouldn't it first lower that cache before
> throwing OOM?
>
> <4>[598175.285062] 268 pages in swap cache
> <4>[598175.285064] Swap cache stats: add 1266107, delete 1265839, find
> 3666696/3838636
> <4>[598175.285065] Free swap  = 4641856kB
> <4>[598175.285066] Total swap = 5244924kB
>
> Almost no swap used. Shouldn't it swap before throwing OOM?
>
> <4>[598175.285066] 1030522 pages RAM
>
> Oh and FWIW here comes the process list
>
> <6>[598175.285067] [ pid ]   uid  tgid total_vm      rss cpu oom_adj
> oom_score_adj name
> <6>[598175.285071] [  485]     0   485     4223       62   0     -17
>  -1000 udevd
> <6>[598175.285073] [ 1434]     0  1434     1003       65   1       0
>      0 acpid
> <6>[598175.285075] [ 1449]   100  1449     8585      112   0       0
>      0 dbus-daemon
> <6>[598175.285077] [ 1475]     0  1475    36450      428   1       0
>      0 mono
> <6>[598175.285079] [ 1772]     0  1772    21365      298   1       0
>      0 vmtoolsd
> <6>[598175.285081] [ 1838]   101  1838    12322      180   0       0
>      0 hald
> <6>[598175.285083] [ 1842]     0  1842    41067      187   1       0
>      0 console-kit-dae
> <6>[598175.285085] [ 1843]     0  1843     4510       56   1       0
>      0 hald-runner
> <6>[598175.285087] [ 1961]     0  1961     8691       17   0       0
>      0 hald-addon-inpu
> <6>[598175.285107] [ 1984]     0  1984     8691       75   1       0
>      0 hald-addon-stor
> <6>[598175.285109] [ 1992]   101  1992     9130        7   1       0
>      0 hald-addon-acpi
> <6>[598175.285111] [ 1993]     0  1993     8691       77   0       0
>      0 hald-addon-stor
> <6>[598175.285113] [ 2562]     0  2562    47184       78   1       0
>      0 httpstkd
> <6>[598175.285115] [ 2581]     0  2581     5881      221   1       0
>      0 syslog-ng
> <6>[598175.285117] [ 2584]     0  2584     1070       63   1       0
>      0 klogd
> <6>[598175.285119] [ 2598]     0  2598    23796      104   1     -17
>  -1000 auditd
> <6>[598175.285121] [ 2600]     0  2600    19995       87   1       0
>      0 audispd
> <6>[598175.285123] [ 2621]     0  2621     2093       58   0       0
>      0 haveged
> <6>[598175.285125] [ 2641]     0  2641     4728       81   1       0
>      0 rpcbind
> <6>[598175.285127] [ 2680]     0  2680    77513      657   0       0
>      0 nsrexecd
> <6>[598175.285129] [ 2753]     0  2753     4222       52   0     -17
>  -1000 udevd
> <6>[598175.285131] [ 2832]     0  2832     2160       75   0       0
>      0 irqbalance
> <6>[598175.285133] [ 2863]     0  2863     6778       53   1       0
>      0 mcelog
> <6>[598175.285135] [ 3163]     0  3163    35027      170   1       0
>      0 gmond
> <6>[598175.285137] [ 3177] 65534  3177    56670      185   0       0
>      0 gmetad
> <6>[598175.285139] [ 3213]     0  3213    24991      107   1       0
>      0 sfcbd
> <6>[598175.285141] [ 3214]     0  3214    16795        0   1       0
>      0 sfcbd
> <6>[598175.285143] [ 3221]     0  3221    20445       78   1       0
>      0 sfcbd
> <6>[598175.285145] [ 3222]     0  3222    41992      117   1       0
>      0 sfcbd
> <6>[598175.285147] [ 3239]     0  3239    16092       58   1       0
>      0 pure-ftpd
> <6>[598175.285149] [ 3240]     2  3240     6284       82   0       0
>      0 slpd
> <6>[598175.285151] [ 3290]     0  3290    12855      120   0     -17
>  -1000 sshd
> <6>[598175.285153] [ 3316]    74  3316     8070      152   0       0
>      0 ntpd
> <6>[598175.285154] [ 3333]     0  3333    17945       90   1       0
>      0 cupsd
> <6>[598175.285156] [ 3393]     0  3393    19365       31   1       0
>      0 sfcbd
> <6>[598175.285158] [ 3395]     0  3395    21475      109   0       0
>      0 sfcbd
> <6>[598175.285160] [ 3400]     0  3400    38331      129   1       0
>      0 sfcbd
> <6>[598175.285162] [ 3479]     0  3479    38357      125   0       0
>      0 sfcbd
> <6>[598175.285164] [ 3719]     0  3655   220311     2005   0       0
>      0 ndsd
> <6>[598175.285166] [ 3893]    30  3893   177915      910   0       0
>      0 java
> <6>[598175.285168] [ 3910]     0  3910    14968       97   1       0
>      0 nscd
> <6>[598175.285170] [ 3961]     0  3961    47276      332   0       0
>      0 namcd
> <6>[598175.285172] [ 4073]     0  4073    10998      104   0       0
>      0 master
> <6>[598175.285174] [ 4099]    51  4099    14190      229   0       0
>      0 qmgr
> <6>[598175.285176] [ 4135]     0  4135    33370       99   1       0
>      0 httpd2-prefork
> <6>[598175.285178] [ 4136]    30  4136    35518       85   1       0
>      0 httpd2-prefork
> <6>[598175.285180] [ 4137]    30  4137    35523      266   0       0
>      0 httpd2-prefork
> <6>[598175.285182] [ 4138]    30  4138    35523      111   0       0
>      0 httpd2-prefork
> <6>[598175.285184] [ 4139]    30  4139    35523      137   0       0
>      0 httpd2-prefork
> <6>[598175.285186] [ 4140]    30  4140    35523      299   0       0
>      0 httpd2-prefork
> <6>[598175.285188] [ 4168]     0  4168     5751       86   0       0
>      0 cron
> <6>[598175.285190] [ 4349]     0  4349    43028      120   0       0
>      0 ndpapp
> <6>[598175.285194] [ 4548]     0  4548    17722       33   0       0
>      0 adminusd
> <6>[598175.285196] [ 4577]     0  4577    17136       26   1       0
>      0 jstcpd
> <6>[598175.285198] [ 4580]     0  4580    12511        0   1       0
>      0 jstcpd
> <6>[598175.285200] [ 4601]     0  4601    10976       42   1       0
>      0 vlrpc
> <6>[598175.285202] [ 4621]     0  4621     4222       54   1     -17
>  -1000 udevd
> <6>[598175.285204] [ 4672]     0  4672    21525       70   0       0
>      0 volmnd
> <6>[598175.285206] [ 4693]     0  4693    48377      195   0       0
>      0 ncp2nss
> <6>[598175.285208] [ 4942]    81  4942    40049       32   0       0
>      0 novell-xregd
> <6>[598175.285210] [ 5195]     0  5195    90312      479   0       0
>      0 cifsd
> <6>[598175.285212] [ 5240]     0  5240     9586        9   1       0
>      0 smdrd
> <6>[598175.285214] [ 5279]     0  5279    55127      172   0       0
>      0 novfsd
> <6>[598175.285216] [ 5327]   104  5327     9431       72   0       0
>      0 nrpe
> <6>[598175.285218] [ 5337]     0  5337     3177       78   0       0
>      0 mingetty
> <6>[598175.285219] [ 5338]     0  5338     3177       78   1       0
>      0 mingetty
> <6>[598175.285221] [ 5339]     0  5339     3177       78   0       0
>      0 mingetty
> <6>[598175.285223] [ 5340]     0  5340     3177       78   1       0
>      0 mingetty
> <6>[598175.285225] [ 5341]     0  5341     3177       78   0       0
>      0 mingetty
> <6>[598175.285227] [ 5342]     0  5342     3177       78   1       0
>      0 mingetty
> <6>[598175.285229] [ 5520]     0  5520    67658       99   0       0
>      0 cifsd
> <6>[598175.285231] [25139]     0 25139    17698      836   0       0
>      0 snmpd
> <6>[598175.285233] [ 4842]    51  4842    14147      511   0       0
>      0 pickup
> <6>[598175.285235] [ 7917]     0  7917    21027     2460   1       0
>      0 savepnpc
> <3>[598175.285237] Out of memory: Kill process 3719 (ndsd) score 19 or
> sacrifice child
> <3>[598175.285239] Killed process 3719 (ndsd) total-vm:881244kB,
> anon-rss:0kB, file-rss:8020kB
>
>
>
> Thanks
>
> Marki
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]