Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

Hi Andrew.

On Sat, May 21, 2011 at 10:34 PM, Andrew Lutomirski <luto@xxxxxxx> wrote:
> On Sat, May 21, 2011 at 8:04 AM, KOSAKI Motohiro
> <kosaki.motohiro@xxxxxxxxxxxxxx> wrote:
>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>> index 3f44b81..d1dabc9 100644
>>> @@ -1426,8 +1437,13 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
>>>
>>>         /* Check if we should syncronously wait for writeback */
>>>         if (should_reclaim_stall(nr_taken, nr_reclaimed, priority, sc)) {
>>> +               unsigned long nr_active, old_nr_scanned;
>>>                 set_reclaim_mode(priority, sc, true);
>>> +               nr_active = clear_active_flags(&page_list, NULL);
>>> +               count_vm_events(PGDEACTIVATE, nr_active);
>>> +               old_nr_scanned = sc->nr_scanned;
>>>                 nr_reclaimed += shrink_page_list(&page_list, zone, sc);
>>> +               sc->nr_scanned = old_nr_scanned;
>>>         }
>>>
>>>         local_irq_disable();
>>>
>>> I just tested 2.6.38.6 with the attached patch. It survived dirty_ram
>>> and test_mempressure without any problems other than slowness, but
>>> when I hit ctrl-c to stop test_mempressure, I got the attached oom.
>>
>> Minchan,
>>
>> I'm confused now.
>> If pages got SetPageActive(), should_reclaim_stall() should never return true.
>> Can you please explain which bad scenario happened?
>>
>> -----------------------------------------------------------------------------------------------------
>> static void reset_reclaim_mode(struct scan_control *sc)
>> {
>>         sc->reclaim_mode = RECLAIM_MODE_SINGLE | RECLAIM_MODE_ASYNC;
>> }
>>
>> shrink_page_list()
>> {
>>  (snip)
>>  activate_locked:
>>                 SetPageActive(page);
>>                 pgactivate++;
>>                 unlock_page(page);
>>                 reset_reclaim_mode(sc);                  /// here
>>                 list_add(&page->lru, &ret_pages);
>>         }
>> -----------------------------------------------------------------------------------------------------
>>
>>
>> -----------------------------------------------------------------------------------------------------
>> bool should_reclaim_stall()
>> {
>>  (snip)
>>
>>         /* Only stall on lumpy reclaim */
>>         if (sc->reclaim_mode & RECLAIM_MODE_SINGLE)   /// and here
>>                 return false;
>> -----------------------------------------------------------------------------------------------------
>>
>
> I did some tracing and the oops happens from the second call to
> shrink_page_list after should_reclaim_stall returns true and it hits
> the same pages in the same order that the earlier call just finished
> calling SetPageActive on. I have *not* confirmed that the two calls
> happened from the same call to shrink_inactive_list, but something's
> certainly wrong in there.
>
> This is very easy to reproduce on my laptop.

I would like to confirm this problem.
Could you show the diff between vanilla 2.6.38.6 and the 2.6.38.6 + alpha
you are currently running?
(i.e., I would like to know which patches you applied on top of vanilla
2.6.38.6 to reproduce this problem.)
I believe you applied my crappy patch below. Right?

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 292582c..69d317e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -311,7 +311,8 @@ static void set_reclaim_mode(int priority, struct scan_control *sc,
        */
       if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
               sc->reclaim_mode |= syncmode;
-       else if (sc->order && priority < DEF_PRIORITY - 2)
+       else if ((sc->order && priority < DEF_PRIORITY - 2) ||
+                               priority <= DEF_PRIORITY / 3)
               sc->reclaim_mode |= syncmode;
       else
               sc->reclaim_mode = RECLAIM_MODE_SINGLE | RECLAIM_MODE_ASYNC;
@@ -1349,10 +1350,6 @@ static inline bool should_reclaim_stall(unsigned long nr_taken,
       if (current_is_kswapd())
               return false;

-       /* Only stall on lumpy reclaim */
-       if (sc->reclaim_mode & RECLAIM_MODE_SINGLE)
-               return false;
-
       /* If we have relaimed everything on the isolated list, no stall */
       if (nr_freed == nr_taken)
               return false;
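
To make the question concrete, here is a rough user-space sketch of how the second hunk changes the stall decision. It is not kernel code. The flag values, the trimmed struct scan_control, the _vanilla/_patched function names and the stall_threshold_hit() helper are all assumptions made up for illustration, and the priority threshold is written from memory of 2.6.38, so it may not match the tree exactly. The kswapd check is left out because only direct reclaim is modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only; the real definitions live in the kernel tree. */
#define RECLAIM_MODE_SINGLE     0x01u
#define RECLAIM_MODE_ASYNC      0x02u
#define DEF_PRIORITY            12
#define PAGE_ALLOC_COSTLY_ORDER 3

/* Trimmed-down stand-in for the kernel's struct scan_control. */
struct scan_control {
	int order;
	unsigned int reclaim_mode;
};

/* Same body as the reset_reclaim_mode() quoted earlier in this thread. */
static void reset_reclaim_mode(struct scan_control *sc)
{
	sc->reclaim_mode = RECLAIM_MODE_SINGLE | RECLAIM_MODE_ASYNC;
}

/* Assumed tail of should_reclaim_stall(): stall only at low priority. */
static bool stall_threshold_hit(int priority, const struct scan_control *sc)
{
	int lumpy_stall_priority;

	assert(priority >= 0 && priority <= DEF_PRIORITY);
	if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
		lumpy_stall_priority = DEF_PRIORITY;
	else
		lumpy_stall_priority = DEF_PRIORITY / 3;
	return priority <= lumpy_stall_priority;
}

/* Vanilla behaviour: never stall while RECLAIM_MODE_SINGLE is set. */
static bool should_reclaim_stall_vanilla(unsigned long nr_taken,
					 unsigned long nr_freed,
					 int priority,
					 const struct scan_control *sc)
{
	if (sc->reclaim_mode & RECLAIM_MODE_SINGLE)
		return false;
	if (nr_freed == nr_taken)
		return false;
	return stall_threshold_hit(priority, sc);
}

/* With the hunk above applied: the RECLAIM_MODE_SINGLE check is gone. */
static bool should_reclaim_stall_patched(unsigned long nr_taken,
					 unsigned long nr_freed,
					 int priority,
					 const struct scan_control *sc)
{
	if (nr_freed == nr_taken)
		return false;
	return stall_threshold_hit(priority, sc);
}
```

In this model, once shrink_page_list() has activated a page and called reset_reclaim_mode(), the vanilla check cannot return true, while the patched check still can when priority is at or below DEF_PRIORITY / 3 and not everything isolated was freed. That would be consistent with a second shrink_page_list() call seeing pages that were just marked active, but only if the patch is really in the tested tree.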


-- 
Kind regards,
Minchan Kim
