Re: [PATCH RESEND v2 1/1] fix a dead loop when in heavy low memory

On Sun, 27 Dec 2015, Figo Zhang wrote:

> The Android System UI hangs when running a heavy monkey stress test.
> 
> V2: add more detail about how to reproduce this issue; the
> important part is to install more than 100 apps/games.
> 
> Steps to reproduce:
> Run this monkey stress test script with more than 100
> apps/games installed:
> 
> adb shell "monkey --ignore-crashes --ignore-timeouts
> --kill-process-after-error --ignore-security-exceptions
> --throttle 200 -v 20000000"
> 
> kernel log:
> [ 1526.272125] lowmem_scan start: 128, 213da, ofree -9849 34419, ma 529
> [ 1526.272260] lowmemorykiller: select 'dTi-lm' (27289), adj 647, size 10630, to kill
> [ 1526.272299] lowmem_d_timeout=4296194081
> [ 1526.272303] Killing 'dTi-lm' (27289), adj 647,
> [ 1526.272303]    to free 42520kB on behalf of 'servicemanager' (2365) because
> [ 1526.272303]    cache 137676kB is below limit 221184kB for oom_score_adj 529
> [ 1526.272303]    Free memory is -39396kB above reserved
> [ 1526.272304] lowmem_scan end: 128, 213da, return 10630
> [ 1526.272710] lowmem_scan start: 128, 213da, ofree -9849 34373, ma 529
> [ 1526.272832] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296193081, 4296194081
> [ 1526.274450] lowmem_scan start: 128, 280da, ofree -9601 34327, ma 529
> [ 1526.274695] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296193083, 4296194081
> [ 1526.282292] lowmem_scan start: 128, 213da, ofree -9703 34327, ma 529
> [ 1526.282727] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296193090, 4296194081
> [ 1526.316888] lowmem_scan start: 128, 213da, ofree -9766 34465, ma 529
> [ 1526.317019] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296193125, 4296194081
> [ 1526.319311] lowmem_scan start: 128, 213da, ofree -9856 34419, ma 529
> [ 1526.319442] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296193125, 4296194081
> [ 1526.322026] lowmem_scan start: 128, 280da, ofree -9841 34327, ma 529
> [ 1526.360831] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296193166, 4296194081
> [ 1526.532233] lowmem_scan start: 128, 213da, ofree -9846 34511, ma 529
> [ 1526.644046] lowmem_scan start: 128, 213da, ofree -9785 34235, ma 529
> [ 1527.437578] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296194246, 4296195109
> [ 1527.442559] lowmem_scan start: 128, 213da, ofree -9850 41884, ma 529
> [ 1527.459540] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296194268, 4296195109
> [ 1527.500352] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296194309, 4296195109
> 
> When this happens, the Android System UI hangs and no process can be
> selected to kill.
> 
> I found that the value of "lowmem_deathpending_timeout" is modified
> strangely: at the last kill the value was 4296194081, but it is unclear
> why it then changed to 4296195109. This causes a dead loop in the low
> memory state, which makes the Android System UI hang because no process
> gets killed.
> 

I'm assuming that you are loading the lowmem killer as a module, since 
that's how you would modify lowmem_debug_level.  From your kernel log, 
lowmem_debug_level appears to be 2; otherwise part of the log is 
missing.

I can tell this since you have a

	[ 1526.272260] lowmemorykiller: select 'dTi-lm' (27289), adj 647, size 10630, to kill

line but not a line matching "send sigkill to %d (%s), adj %hd, size %d\n" 
with loglevel 1.

I think changing lowmem_debug_level to 1 would help to understand this 
issue better.

I think lowmem_deathpending_timeout is getting changed to 4296195109 at

	[ 1526.532233] lowmem_scan start: 128, 213da, ofree -9846 34511, ma 529
	> HERE <
	[ 1526.644046] lowmem_scan start: 128, 213da, ofree -9785 34235, ma 529

However, it appears that the same process, dTi-lm, is still chosen for oom 
kill because lowmem_deathpending_timeout has expired.
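
To make the timeout check concrete, here is a minimal userspace sketch of the comparison. It assumes a 64-bit counter and mirrors the usual wraparound-safe form of time_before_eq(); the names jiffies_before_eq() and deathpending_still_valid() are hypothetical stand-ins, not kernel functions.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace stand-in for time_before_eq(a, b): subtract as unsigned and
 * interpret the difference as signed, so the result stays correct even
 * when the counter wraps around.
 */
static bool jiffies_before_eq(uint64_t a, uint64_t b)
{
	return (int64_t)(b - a) >= 0;
}

/*
 * Hypothetical helper: true while a pending kill is still inside its
 * timeout window, i.e. while the scan would return early with 0.
 */
static bool deathpending_still_valid(uint64_t now, uint64_t timeout)
{
	return jiffies_before_eq(now, timeout);
}
```

With the values from the log, jiffies=4296193081 against a timeout of 4296194081 is still inside the window, so the scan bails out early. Any jiffies value past the timeout makes the check fail, and the TIF_MEMDIE task is then treated as an ordinary candidate again.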

So the problem appears to be that the process which keeps getting chosen 
cannot exit.  It would have been helpful to have the stack of pid 27289 
in the log to see where it was stuck.  I think this may be unrelated to 
lowmem_deathpending_timeout itself, though.  We'd be better off selecting 
a different process to kill with something like this:

diff --git a/drivers/staging/android/lowmemorykiller.c b/drivers/staging/android/lowmemorykiller.c
--- a/drivers/staging/android/lowmemorykiller.c
+++ b/drivers/staging/android/lowmemorykiller.c
@@ -128,11 +128,16 @@ static unsigned long lowmem_scan(struct shrinker *s, struct shrink_control *sc)
 		if (!p)
 			continue;
 
-		if (test_tsk_thread_flag(p, TIF_MEMDIE) &&
-		    time_before_eq(jiffies, lowmem_deathpending_timeout)) {
-			task_unlock(p);
-			rcu_read_unlock();
-			return 0;
+		if (test_tsk_thread_flag(p, TIF_MEMDIE)) {
+			if (time_before_eq(jiffies,
+					   lowmem_deathpending_timeout)) {
+				task_unlock(p);
+				rcu_read_unlock();
+				return 0;
+			}
+			/* Need to select a different process to kill */
+			task_unlock(p);
+			continue;
 		}
 		oom_score_adj = p->signal->oom_score_adj;
 		if (oom_score_adj < min_score_adj) {

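To illustrate the intended effect of skipping a victim that has not exited, here is a rough userspace simulation of the selection loop. It is only a sketch: struct sim_task and sim_select() are hypothetical, locking and RCU are left out, and the comparison order (higher adj first, then larger size) is an assumption about how the scan ranks candidates.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for what the scan inspects per task. */
struct sim_task {
	int pid;
	short oom_score_adj;
	int size;      /* task size in pages */
	bool memdie;   /* models TIF_MEMDIE: already chosen, not yet exited */
};

/*
 * Returns the index of the task to kill, or -1 if nothing is selected
 * (a pending kill is still within its timeout, or there is no candidate).
 * skip_stuck = false models the old check, true models the proposed one.
 */
static int sim_select(const struct sim_task *t, size_t n,
		      short min_score_adj, bool timeout_expired,
		      bool skip_stuck)
{
	int selected = -1;

	for (size_t i = 0; i < n; i++) {
		if (t[i].memdie) {
			if (!timeout_expired)
				return -1;   /* wait for the pending kill */
			if (skip_stuck)
				continue;    /* look for a different victim */
			/* old behaviour: fall through, it is a candidate again */
		}
		if (t[i].oom_score_adj < min_score_adj)
			continue;
		if (selected >= 0) {
			if (t[i].oom_score_adj < t[selected].oom_score_adj)
				continue;
			if (t[i].oom_score_adj == t[selected].oom_score_adj &&
			    t[i].size <= t[selected].size)
				continue;
		}
		selected = (int)i;
	}
	return selected;
}
```

With a stuck task at adj 647 and a second, made-up task at adj 600, the old check keeps returning the stuck task once the timeout has expired, while the new check moves on to the second task.
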
But we need more information.  Please make sure that lowmem_debug_level is 
1, try to get a complete kernel log, and if possible please try to capture 
the stack of the process that can't exit (use /proc/<pid>/stack) before 
trying the above patch.
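
If running a shell command on the device is inconvenient, the stack can also be captured from a small helper program. This is only a sketch: proc_stack_path() and dump_kernel_stack() are hypothetical names, and reading /proc/<pid>/stack normally requires root and a kernel built with stack trace support.

```c
#include <stdio.h>
#include <sys/types.h>

/* Builds "/proc/<pid>/stack" in buf; returns 0 on success, -1 if buf is too small. */
static int proc_stack_path(char *buf, size_t len, pid_t pid)
{
	int n = snprintf(buf, len, "/proc/%ld/stack", (long)pid);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Copies the kernel stack of pid to out; returns 0 on success, -1 on failure. */
static int dump_kernel_stack(pid_t pid, FILE *out)
{
	char path[64];
	char line[256];
	FILE *f;

	if (proc_stack_path(path, sizeof(path), pid) != 0)
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;   /* no permission, no such pid, or file not provided */

	while (fgets(line, sizeof(line), f))
		fputs(line, out);

	fclose(f);
	return 0;
}
```

Calling dump_kernel_stack(27289, stdout) while the process is stuck would print the same text as reading the file directly; a return value of -1 most likely means the process has already exited or the caller lacks permission.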