Re: Still OOM problems with 4.9er kernels

Gerhard Wiesinger <lists@xxxxxxxxxxxxx> · Sat, 10 Dec 2016 14:50:34 +0100

On 09.12.2016 22:42, Vlastimil Babka wrote:
On 12/09/2016 07:01 PM, Gerhard Wiesinger wrote:
On 09.12.2016 18:30, Michal Hocko wrote:
On Fri 09-12-16 17:58:14, Gerhard Wiesinger wrote:
On 09.12.2016 17:09, Michal Hocko wrote:
[...]
[97883.882611] Mem-Info:
[97883.883747] active_anon:2915 inactive_anon:3376 isolated_anon:0
                   active_file:3902 inactive_file:3639 isolated_file:0
                   unevictable:0 dirty:205 writeback:0 unstable:0
                   slab_reclaimable:9856 slab_unreclaimable:9682
                   mapped:3722 shmem:59 pagetables:2080 bounce:0
                   free:748 free_pcp:15 free_cma:0
there is still some page cache which doesn't seem to be neither dirty
nor under writeback. So it should be theoretically reclaimable but for
some reason we cannot seem to reclaim that memory.
There is still some anonymous memory and free swap so we could reclaim
it as well but it all seems pretty down and the memory pressure is
really large
Yes, it might be large on the update situation, but that should be handled
by a virtual memory system by the kernel, right?
Well this is what we try and call it memory reclaim. But if we are not
able to reclaim anything then we eventually have to give up and trigger
the OOM killer.
I'm not familiar with the Linux implementation of the VM system in
detail. But can't you reserve as much memory for the kernel (non
pageable) at least that you can swap everything out (even without
killing a process at least as long there is enough swap available, which
should be in all of my cases)?
We don't have such bulletproof reserves. In this case the amount of
anonymous memory that can be swapped out is relatively low, and either
something is pinning it in memory, or it's being swapped back in quickly.

   Now the information that 4.4 made a difference is
interesting. I do not really see any major differences in the reclaim
between 4.3 and 4.4 kernels. The reason might be somewhere else as well.
E.g. some of the subsystem consumes much more memory than before.

Just curious, what kind of filesystem are you using?
I'm using ext4 only with virt-* drivers (storage, network). But it is
definitly a virtual memory allocation/swap usage issue.

   Could you try some
additional debugging. Enabling reclaim related tracepoints might tell us
more. The following should tell us more
mount -t tracefs none /trace
echo 1 > /trace/events/vmscan/enable
echo 1 > /trace/events/writeback/writeback_congestion_wait/enable
cat /trace/trace_pipe > trace.log

Collecting /proc/vmstat over time might be helpful as well
mkdir logs
while true
do
	cp /proc/vmstat vmstat.$(date +%s)
	sleep 1s
done
Activated it. But I think it should be very easy to trigger also on your
side. A very small configured VM with a program running RAM
allocations/writes (I guess you have some testing programs already)
should be sufficient to trigger it. You can also use the attached
program which I used to trigger such situations some years ago. If it
doesn't help try to reduce the available CPU for the VM and also I/O
(e.g. use all CPU/IO on the host or other VMs).
Well it's not really a surprise that if the VM is small enough and
workload large enough, OOM killer will kick in. The exact threshold
might have changed between kernel versions for a number of possible reasons.

IMHO: The OOM killer should NOT kick in even on the highest workloads if 
there is swap available.

https://www.spinics.net/lists/linux-mm/msg113665.html

Yeah, but I do think that "oom when you have 156MB free and 7GB
reclaimable, and haven't even tried swapping" counts as obviously
wrong.

So Linus also thinks that trying swapping is a must have. And there always was enough swap available in my cases. Then it should swap out/swapin all the time (which worked well in kernel 2.4/2.6 times).

Another topic: Why does the kernel prefer to swap in/swap out instead of 
use cache pages/buffers (see vmstat 1 output below)?

BTW: Don't know if you have seen also my original message on the kernel
mailinglist only:

Linus had also OOM problems with 1kB RAM requests and a lot of free RAM
(use a translation service for the german page):
https://lkml.org/lkml/2016/11/30/64
https://marius.bloggt-in-braunschweig.de/2016/11/17/linuxkernel-4-74-8-und-der-oom-killer/
https://www.spinics.net/lists/linux-mm/msg113661.html
Yeah we were involved in the last one. The regressions were about
high-order allocations
though (the 1kB premise turned out to be misinterpretation) and there
were regressions
for those in 4.7/4.8. But yours are order-0.

With kernel 4.7./4.8 it was really reaproduceable at every dnf update. 
With 4.9rc8 it has been much much better. So something must have 
changed, too.

As far as I understood it the order is 2^order kB pagesize. I don't 
think it makes a difference when swap is not used which order the memory 
allocation request is.

BTW: What were the commit that introduced the regression anf fixed it in 
4.9?

Thnx.

Ciao,

Gerhard

procs -----------memory---------- ---swap-- -----io---- -system-- 
------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo in   cs us sy 
id wa st
 3  0  45232   9252   1956 109644  428  232  3536   416 4310 4228 38 36 
14  7  6
 2  0  45124  10524   1960 110192  124    0   528    96 2478 2243 45 29 
20  5  1
 4  1  45136   3896   1968 114388   84   64  4824   260 2689 2655 38 31 
15 12  4
 1  1  45484  10648    288 114032   88  356 20028  1132 5078 5122 24 
45  4 21  5
 2  0  44700   8092   1240 115204  728    0  2624   536 4204 4413 38 38 
18  3  4
 2  0  44852  10272   1240 111324   52  212  2736  1548 3311 2970 41 36 
12  9  2
 4  0  44844  10716   1240 111216    8    0     8    72 3067 3287 42 30 
18  7  3
 3  0  44828  10268   1248 111280   16    0    16    60 2139 1610 43 29 
11  1 17
 1  0  44828  11644   1248 111192    0    0     0     0 2367 1911 50 32 
14  0  3
 4  0  44820   9004   1248 111284    8    0     8     0 2207 1867 55 31 
14  0  1
 7  0  45664   6360   1816 109264   20  868  3076   968 4122 3783 43 37 
17  0  3
 4  4  46880   6732   1092 101960  244 1332  7968  3352 5836 6431 17 
51  1 27  4
10  2  47064   6940   1364  96340   20  196 25708  1720 7346 6447 13 70  
0 18  1
15  3  47572   3672   2156  92604   68  580 29244  1692 5640 5102  5 57  
0 37  2
12  4  48300   6740    352  87924   80  948 36208  2948 7287 7955  7 73  
0 18  2
12  9  50796   4832    584  88372    0 2496 16064  3312 3425 4185  2 30  
0 66  1
10  9  52636   3608   2068  90132   56 1840 24552  2836 4123 4099  3 43  
0 52  1
 7 11  56740  10376    424  86204  184 4152 33116  5628 7949 7952  4 
67  0 23  6
10  4  61384   8000    776  86956  644 4784 28380  5484 7965 9935  7 64  
0 26  2
11  4  68052   5260   1028  87268 1244 7164 23380  8684 10715 10863  8 
71  0 20  1
11  2  72244   3924   1052  85160  980 4264 23756  4940 7231 7930  8 62  
0 29  1
 6  1  76388   5352   4948  86204 1292 4640 27380  5244 7816 8714 10 
63  0 22  5
 8  5  77376   4168   1944  86528 3064 3684 19876  4104 9325 9076  9 
64  1 22  4
 5  4  75464   7272   1240  81684 3912 3188 25656  4100 9973 10515 11 
65  0 20  4
 5  2  77364   4440   1852  84744  528 2304 28588  3304 6605 6311  7 
61  8 18  4
 9  2  81648   3760   3188  86012  440 4588 17928  5368 6377 6320  8 
48  2 40  4
 6  2  82404   6608    668  86092 2016 2084 24396  3564 7440 7510  8 
66  1 20  4
 4  4  81728   3796   2260  87764 1392  984 18512  1684 5196 4652  6 
48  0 42  4
 8  4  84700   6436   1428  85744 1188 3708 20256  4364 6405 5998  9 
63  0 24  4
 3  1  86360   4836    924  87700 1388 2692 19460  3504 5498 6117  8 
48  0 34  9
 4  4  87916   3768    176  86592 2788 3220 19664  4032 7285 8342 19 
63  0 10  9
 4  4  89612   4952    180  88076 1516 2988 17560  3936 5737 5794  7 
46  0 37 10
 7  5  87768  12244    196  87856 3344 2544 22248  3348 6934 7497  8 
59  0 22 10
10  1  83436   4768    840  96452 4096  836 20100  1160 6191 6614 21 52  
0 13 14
 0  6  82868   6972    348  91020 1108  520  4896   568 3274 4214 11 26 
29 30  4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>