Re: Possible deadloop in direct reclaim?

On Tue, Jul 23, 2013 at 12:58 PM, Lisa Du <cldu@xxxxxxxxxxx> wrote:
> Dear Sir:
>
> I have hit a possible dead loop in direct reclaim. After running a large
> number of applications, the system reaches a state where memory is heavily
> fragmented: only order-0 and order-1 blocks are left.
>
> One process then requested an order-2 buffer and entered an endless direct
> reclaim. My trace log shows the loop has already run over 200,000 times.
> Kswapd was woken up first and then went back to sleep because it could not
> rebalance memory at this order. But zone->all_unreclaimable remains 1.
>
> Although direct reclaim returns no pages every time, the loop repeats again
> and again because zone->all_unreclaimable = 1, even after
> zone->pages_scanned has become very large. This blocks the process for a
> long time, until some watchdog thread detects it and kills the process.
> I know this is __alloc_pages_slowpath, but it is too slow, right? It can
> take 50 seconds or even more.

You must mean zone->all_unreclaimable = 0?
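
To show why the flag has to be 0 for the allocator to keep retrying, here is
a minimal user-space sketch of the decision as I understand it. It is not
kernel code: struct demo_zone and both demo_* functions are hypothetical
names, and the model assumes direct reclaim reports "progress" whenever it
reclaimed nothing but at least one zone is not yet flagged all_unreclaimable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-in for struct zone. */
struct demo_zone {
    bool all_unreclaimable;   /* set once kswapd gives up on the zone */
};

/* Same shape as all_unreclaimable(): true only if every zone is flagged. */
static bool demo_all_unreclaimable(const struct demo_zone *zones, size_t n)
{
    assert(zones != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (!zones[i].all_unreclaimable)
            return false;
    }
    return true;
}

/*
 * Assumed model of the direct reclaim result: returns true when the
 * allocator slow path would see "progress" and retry the allocation.
 */
static bool demo_reports_progress(unsigned long nr_reclaimed,
                                  const struct demo_zone *zones, size_t n)
{
    if (nr_reclaimed > 0)
        return true;

    /* Nothing reclaimed: retry anyway unless every zone is flagged. */
    return !demo_all_unreclaimable(zones, n);
}
```

In this model, a zone whose flag stays 0 makes demo_reports_progress() return
true even with nr_reclaimed == 0, which is the endless retry you describe.
With the flag at 1 the function returns false and the loop would end.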

>
> I think this is not the expected behavior, right? Can we also add the
> check below to all_unreclaimable() to terminate this loop?
>
> @@ -2355,6 +2355,8 @@ static bool all_unreclaimable(struct zonelist *zonelist,
>                         continue;
>                 if (!zone->all_unreclaimable)
>                         return false;
> +               if (sc->nr_reclaimed == 0 && !zone_reclaimable(zone))
> +                       return true;
>

How about replacing the check in kswapd_shrink_zone() instead?

@@ -2824,7 +2824,7 @@ static bool kswapd_shrink_zone(struct zone *zone,
        /* Account for the number of pages attempted to reclaim */
        *nr_attempted += sc->nr_to_reclaim;

-       if (nr_slab == 0 && !zone_reclaimable(zone))
+       if (sc->nr_reclaimed == 0 && !zone_reclaimable(zone))
                zone->all_unreclaimable = 1;

        zone_clear_flag(zone, ZONE_WRITEBACK);


I think the current check is wrong: reclaiming a slab object doesn't mean
a page was reclaimed.
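
A small user-space sketch of the difference between the two conditions,
under stated assumptions: struct scan_zone and the scan_* functions are
hypothetical names, and scan_zone_reclaimable() assumes the heuristic
"pages_scanned < 6 * reclaimable pages", which may not match your tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-in for struct zone. */
struct scan_zone {
    unsigned long pages_scanned;
    unsigned long reclaimable_pages;
    bool all_unreclaimable;
};

/* Assumed heuristic: still reclaimable while scanned < 6 * reclaimable. */
static bool scan_zone_reclaimable(const struct scan_zone *z)
{
    assert(z != NULL);
    return z->pages_scanned < z->reclaimable_pages * 6;
}

/* Current check: keyed on the number of slab objects freed. */
static void scan_mark_current(struct scan_zone *z, unsigned long nr_slab)
{
    if (nr_slab == 0 && !scan_zone_reclaimable(z))
        z->all_unreclaimable = true;
}

/* Proposed check: keyed on the number of pages actually reclaimed. */
static void scan_mark_proposed(struct scan_zone *z, unsigned long nr_reclaimed)
{
    if (nr_reclaimed == 0 && !scan_zone_reclaimable(z))
        z->all_unreclaimable = true;
}
```

With pages_scanned = 1000, reclaimable_pages = 100, nr_slab = 5 and
nr_reclaimed = 0, scan_mark_current() leaves the flag clear because some slab
objects were freed, while scan_mark_proposed() sets it because no page came
back.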

-- 
Regards,
--Bob
