Re: OOM killer changes

Vlastimil Babka <vbabka@xxxxxxx> · Tue, 16 Aug 2016 09:44:34 +0200

On 08/16/2016 05:12 AM, Joonsoo Kim wrote:
> On Mon, Aug 15, 2016 at 11:16:36AM +0200, Vlastimil Babka wrote:
>> On 08/15/2016 06:48 AM, Ralf-Peter Rohbeck wrote:
>>> On 02.08.2016 12:25, Ralf-Peter Rohbeck wrote:
>>>>
>>> Took me a little longer than expected due to work. The failure wouldn't 
>>> happen for a while and so I started a couple of scripts and let them 
>>> run. When I checked today the server didn't respond on the network and 
>>> sure enough it had killed everything. This is with 4.7.0 with the config 
>>> based on Debian 4.7-rc7.
>>>
>>> trace_pipe got a little big (5GB) so I uploaded the logs to 
>>> https://filebin.net/box0wycfouvhl6sr/OOM_4.7.0.tar.bz2. before_btrfs is 
>>> before the btrfs filesystems were mounted.
>>> I did run a btrfs balance because it creates IO load and I needed to 
>>> balance anyway. Maybe that's what caused it?
>>
>> pgmigrate_success        46738962
>> pgmigrate_fail          135649772
>> compact_migrate_scanned 309726659
>> compact_free_scanned   9715615169
>> compact_isolated        229689596
>> compact_stall 4777
>> compact_fail 3068
>> compact_success 1709
>> compact_daemon_wake 207834
>>
>> The migration failures are quite enormous. Very quick analysis of the
>> trace seems to confirm that these are mostly "real", as opposed to result
>> of failure to isolate free pages for migration targets, although the free
>> scanner spent a lot of time:
> 
> I don't think that main reason of OOM is 'real' migration failure.
> If it is the case, compaction would find next migratable pages and
> eventually some of pages would be migrated successfully.
> 
> pagetypeinfo shows that there are too many unmovable pageblock.

Hmm, well spotted. It is also somewhat suspicious: I would expect
filesystem activity to result in reclaimable allocations, not unmovable
ones (not that it makes any difference for compaction).

Checking nr_slab_* in zoneinfo shows that it really should be mostly
reclaimable:

nr_slab_reclaimable 0
nr_slab_unreclaimable 0
nr_slab_reclaimable 32709
nr_slab_unreclaimable 2764
nr_slab_reclaimable 101525
nr_slab_unreclaimable 10852

Compared with:

Number of blocks type     Unmovable      Movable  Reclaimable   HighAtomic      Isolate 
Node 0, zone      DMA            1            7            0            0            0 
Node 0, zone    DMA32          893           72           51            0            0 
Node 0, zone   Normal         2780          155          137            0            0 

We have 188 reclaimable blocks, which is 96256 pages. The sum of
nr_slab_reclaimable is 134234, which suggests some fallbacks into unmovable
blocks. But the rest of those unmovable pageblocks must be filled by
something else... some btrfs buffers maybe?
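
For reference, the comparison above can be reproduced with a small userspace
sketch. It is not kernel code: the struct and function names are made up for
illustration, the 512 pages per pageblock is inferred from 96256 / 188, and
the nr_slab_* lines are assumed to be in DMA, DMA32, Normal order.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Pages per pageblock. Inferred from 96256 / 188 (2MB pageblocks with
 * 4KB pages); an assumption, not read from the affected machine.
 */
#define PAGES_PER_PAGEBLOCK 512UL

/* Hypothetical record pairing the two per-zone numbers being compared. */
struct zone_sample {
	const char *name;
	unsigned long reclaimable_blocks;  /* pagetypeinfo, Reclaimable column */
	unsigned long nr_slab_reclaimable; /* zoneinfo, in pages */
};

/* Values copied from the excerpts; zone order of nr_slab_* is assumed. */
static const struct zone_sample samples[] = {
	{ "DMA",      0,      0 },
	{ "DMA32",   51,  32709 },
	{ "Normal", 137, 101525 },
};
#define NR_SAMPLES (sizeof(samples) / sizeof(samples[0]))

/* Capacity of all pageblocks currently marked reclaimable, in pages. */
static unsigned long reclaimable_block_pages(void)
{
	unsigned long blocks = 0;
	size_t i;

	for (i = 0; i < NR_SAMPLES; i++)
		blocks += samples[i].reclaimable_blocks;
	return blocks * PAGES_PER_PAGEBLOCK;
}

/* Sum of nr_slab_reclaimable over all zones, in pages. */
static unsigned long slab_reclaimable_pages(void)
{
	unsigned long pages = 0;
	size_t i;

	for (i = 0; i < NR_SAMPLES; i++)
		pages += samples[i].nr_slab_reclaimable;
	return pages;
}

/*
 * Reclaimable slab pages that cannot fit into reclaimable pageblocks even
 * if those blocks were packed full. This is only a lower bound on the
 * fallback, since real pageblocks are rarely packed full.
 */
static unsigned long slab_overflow_pages(void)
{
	unsigned long capacity = reclaimable_block_pages();
	unsigned long slab = slab_reclaimable_pages();

	return slab > capacity ? slab - capacity : 0;
}

/* The same lower bound, rounded up to whole pageblocks. */
static unsigned long slab_overflow_blocks(void)
{
	return (slab_overflow_pages() + PAGES_PER_PAGEBLOCK - 1) /
	       PAGES_PER_PAGEBLOCK;
}

/* Sanity check against the two totals quoted in the text. */
static void check_quoted_totals(void)
{
	assert(reclaimable_block_pages() == 96256UL);
	assert(slab_reclaimable_pages() == 134234UL);
}
```

That gives at least 37978 pages of reclaimable slab, about 75 pageblocks'
worth, living outside reclaimable pageblocks. Against 3674 unmovable
pageblocks in total that is small, which is why something else must account
for most of them.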

> Freepage scanner don't scan those pageblocks so there is a large
> possibility that it cannot find freepages even if the system has many
> freepages. I think that this is the root cause of the problem.
> 
> It's better to check that following work-around help the problem.

Yes, this might be a good idea, at least for higher compaction priorities.
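
To put a rough number on how much the check restricts the free scanner, here
is a userspace sketch over the pagetypeinfo table above. It is a simplified
model, not kernel code: the names are hypothetical, and it assumes that with
the check only Movable pageblocks qualify as migration targets (so
Reclaimable blocks are skipped as well), which is my reading of
suitable_migration_target() and not something the table itself shows.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical row type mirroring the "Number of blocks" table. */
struct pageblock_counts {
	const char *zone;
	unsigned long unmovable;
	unsigned long movable;
	unsigned long reclaimable;
	unsigned long highatomic;
	unsigned long isolate;
};

/* Values copied from the pagetypeinfo excerpt. */
static const struct pageblock_counts pagetypeinfo[] = {
	{ "DMA",       1,   7,   0, 0, 0 },
	{ "DMA32",   893,  72,  51, 0, 0 },
	{ "Normal", 2780, 155, 137, 0, 0 },
};
#define NR_ZONES (sizeof(pagetypeinfo) / sizeof(pagetypeinfo[0]))

/* All pageblocks in one zone, regardless of migratetype. */
static unsigned long zone_blocks(const struct pageblock_counts *c)
{
	return c->unmovable + c->movable + c->reclaimable +
	       c->highatomic + c->isolate;
}

/*
 * Pageblocks the free scanner may consider as migration targets.
 * check_migratetype != 0 models the current code (Movable only, by
 * assumption); 0 models the work-around, where every block is a candidate
 * and only the remaining isolation_suitable() test can still skip it.
 */
static unsigned long candidate_blocks(int check_migratetype)
{
	unsigned long sum = 0;
	size_t i;

	for (i = 0; i < NR_ZONES; i++)
		sum += check_migratetype ? pagetypeinfo[i].movable
					 : zone_blocks(&pagetypeinfo[i]);
	return sum;
}

/* Candidates as parts per thousand of all pageblocks, rounded down. */
static unsigned long candidate_permille(int check_migratetype)
{
	unsigned long total = candidate_blocks(0);

	assert(total > 0);
	return candidate_blocks(check_migratetype) * 1000UL / total;
}
```

Under those assumptions only 234 of 4096 pageblocks (roughly 5.7%) are
candidates with the check in place, versus all 4096 with the work-around.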

Thanks.

> Thanks.
> 
> ------------>8-----------
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 9affb29..965eddd 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1082,10 +1082,6 @@ static void isolate_freepages(struct compact_control *cc)
>                 if (!page)
>                         continue;
>  
> -               /* Check the block is suitable for migration */
> -               if (!suitable_migration_target(page))
> -                       continue;
> -
>                 /* If isolation recently failed, do not retry */
>                 if (!isolation_suitable(cc, page))
>                         continue;
> 
