Re: `page allocation failure: order:0` with ixgbe under high load

Hello,

On Fri, Aug 11, 2017 at 8:36 AM, Paul Menzel <pmenzel@xxxxxxxxxxxxx> wrote:
> Dear Linux folks,
>
>
> While stress-testing a Dell PowerEdge T630 (12x E5-2603 v4 @ 1.70GHz, 96.3 GB
> RAM) running Linux 4.9.41 by writing 40 files of 100 GB each in parallel from
> different systems into an NFS-exported directory, after some time Linux logs
> the messages below.

We have seen similar OOMs with atomic allocations in the RX path.
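
The trace shows why: __napi_alloc_skb() is called from softirq context
in ixgbe_clean_rx_irq(), so it must use GFP_ATOMIC and cannot wait for
reclaim; once a zone falls below the atomic watermark the allocation
simply fails and the frame is dropped. To check whether the driver is
counting such failures, the ethtool stats are a quick look (eth0 here is
just a placeholder for your X520 port; counter names vary by driver, but
ixgbe exposes RX allocation failure counters):

```
$ ethtool -S eth0 | grep -i alloc
```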


>
>> $ dmesg -T
>> […]
>>
>> [Fri Aug 11 16:51:47 2017] swapper/0: page allocation failure: order:0,
>> mode:0x2080020(GFP_ATOMIC)
>> [Fri Aug 11 16:51:47 2017] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
>> 4.9.41.mx64.169 #1
>> [Fri Aug 11 16:51:47 2017] Hardware name: Dell Inc. PowerEdge T630/0NT78X,
>> BIOS 2.4.2 01/09/2017
>> [Fri Aug 11 16:51:47 2017]  ffff880c4fc03bd0 ffffffff813e31a6
>> ffffffff81f341e8 0000000000000000
>> [Fri Aug 11 16:51:47 2017]  ffff880c4fc03c50 ffffffff8113fedc
>> 0208002000000011 ffffffff81f341e8
>> [Fri Aug 11 16:51:47 2017]  ffff880c4fc03bf8 ffff880c00000010
>> ffff880c4fc03c60 ffff880c4fc03c10
>> [Fri Aug 11 16:51:47 2017] Call Trace:
>> [Fri Aug 11 16:51:47 2017]  <IRQ>
>> [Fri Aug 11 16:51:47 2017]  [<ffffffff813e31a6>] dump_stack+0x4d/0x67
>> [Fri Aug 11 16:51:47 2017]  [<ffffffff8113fedc>] warn_alloc+0x11c/0x140
>> [Fri Aug 11 16:51:47 2017]  [<ffffffff811403db>]
>> __alloc_pages_nodemask+0x45b/0xe70
>> [Fri Aug 11 16:51:47 2017]  [<ffffffff81140f6b>]
>> __alloc_page_frag+0x17b/0x1a0
>> [Fri Aug 11 16:51:47 2017]  [<ffffffff819dbebc>]
>> __napi_alloc_skb+0xac/0xf0
>> [Fri Aug 11 16:51:47 2017]  [<ffffffffa01f7263>]
>> ixgbe_clean_rx_irq+0xf3/0x950 [ixgbe]
>> [Fri Aug 11 16:51:47 2017]  [<ffffffff819d4df8>] ?
>> skb_release_data+0xa8/0xd0
>> [Fri Aug 11 16:51:47 2017]  [<ffffffffa01f886f>] ixgbe_poll+0x4df/0x7a0
>> [ixgbe]
>> [Fri Aug 11 16:51:47 2017]  [<ffffffff819ebfe9>] net_rx_action+0x1f9/0x340
>> [Fri Aug 11 16:51:47 2017]  [<ffffffff81a9d2e8>] __do_softirq+0x88/0x29b
>> [Fri Aug 11 16:51:47 2017]  [<ffffffff81064236>] irq_exit+0x76/0x80
>> [Fri Aug 11 16:51:47 2017]  [<ffffffff81a9d093>] do_IRQ+0x63/0xf0
>> [Fri Aug 11 16:51:47 2017]  [<ffffffff81a9b57f>]
>> common_interrupt+0x7f/0x7f
>> [Fri Aug 11 16:51:47 2017]  <EOI>
>> [Fri Aug 11 16:51:47 2017]  [<ffffffff81a9a41d>] ? mwait_idle+0x6d/0x170
>> [Fri Aug 11 16:51:47 2017]  [<ffffffff810283ef>] arch_cpu_idle+0xf/0x20
>> [Fri Aug 11 16:51:47 2017]  [<ffffffff81a9a713>]
>> default_idle_call+0x23/0x30
>> [Fri Aug 11 16:51:47 2017]  [<ffffffff8109e435>]
>> cpu_startup_entry+0x175/0x1e0
>> [Fri Aug 11 16:51:47 2017]  [<ffffffff81a94fa7>] rest_init+0x77/0x80
>> [Fri Aug 11 16:51:47 2017]  [<ffffffff8235fec3>] start_kernel+0x3b3/0x3c0
>> [Fri Aug 11 16:51:47 2017]  [<ffffffff8235f28f>]
>> x86_64_start_reservations+0x2a/0x2c
>> [Fri Aug 11 16:51:47 2017]  [<ffffffff8235f3f9>]
>> x86_64_start_kernel+0x168/0x176
>> [Fri Aug 11 16:51:47 2017] Mem-Info:
>> [Fri Aug 11 16:51:47 2017] active_anon:29369 inactive_anon:351
>> isolated_anon:0
>> [Fri Aug 11 16:51:47 2017]  active_file:22668707 inactive_file:1211442
>> isolated_file:64
>> [Fri Aug 11 16:51:47 2017]  unevictable:0 dirty:871455 writeback:10178
>> unstable:0
>> [Fri Aug 11 16:51:47 2017]  slab_reclaimable:527010
>> slab_unreclaimable:35231
>> [Fri Aug 11 16:51:47 2017]  mapped:3998 shmem:360 pagetables:1931 bounce:0
>> [Fri Aug 11 16:51:47 2017]  free:53778 free_pcp:4689 free_cma:0
>> [Fri Aug 11 16:51:47 2017] Node 0 active_anon:68492kB inactive_anon:1296kB
>> active_file:44916512kB inactive_file:2415652kB unevictable:0kB
>> isolated(anon):0kB isolated(file):128kB mapped:5448kB dirty:1750600kB
>> writeback:24708kB shmem:1304kB writeback_tmp:0kB unstable:0kB
>> pages_scanned:0 all_unreclaimable? no
>> [Fri Aug 11 16:51:47 2017] Node 1 active_anon:48984kB inactive_anon:108kB
>> active_file:45758316kB inactive_file:2430116kB unevictable:0kB
>> isolated(anon):0kB isolated(file):128kB mapped:10544kB dirty:1735220kB
>> writeback:16004kB shmem:136kB writeback_tmp:0kB unstable:0kB pages_scanned:0
>> all_unreclaimable? no
>> [Fri Aug 11 16:51:47 2017] Node 0 DMA free:15896kB min:4kB low:16kB
>> high:28kB active_anon:0kB inactive_anon:0kB active_file:0kB
>> inactive_file:0kB unevictable:0kB writepending:0kB present:15980kB
>> managed:15896kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB
>> kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB
>> free_cma:0kB
>> [Fri Aug 11 16:51:47 2017] lowmem_reserve[]: 0 1592 47927 47927
>> [Fri Aug 11 16:51:47 2017] Node 0 DMA32 free:185436kB min:656kB low:2284kB
>> high:3912kB active_anon:656kB inactive_anon:0kB active_file:1306260kB
>> inactive_file:85160kB unevictable:0kB writepending:63324kB present:1985356kB
>> managed:1640236kB mlocked:0kB slab_reclaimable:35744kB
>> slab_unreclaimable:1268kB kernel_stack:252kB pagetables:28kB bounce:0kB
>> free_pcp:6060kB local_pcp:544kB free_cma:0kB
>> [Fri Aug 11 16:51:47 2017] lowmem_reserve[]: 0 0 46335 46335
>> [Fri Aug 11 16:51:47 2017] Node 0 Normal free:6672kB min:19108kB
>> low:66552kB high:113996kB active_anon:67836kB inactive_anon:1296kB
>> active_file:43610252kB inactive_file:2330492kB unevictable:0kB
>> writepending:1712224kB present:48234496kB managed:47447216kB mlocked:0kB
>> slab_reclaimable:1085248kB slab_unreclaimable:75896kB kernel_stack:26424kB
>> pagetables:2636kB bounce:0kB free_pcp:6652kB local_pcp:728kB free_cma:0kB
>> [Fri Aug 11 16:51:47 2017] lowmem_reserve[]: 0 0 0 0
>> [Fri Aug 11 16:51:47 2017] Node 1 Normal free:7108kB min:19952kB
>> low:69492kB high:119032kB active_anon:48984kB inactive_anon:108kB
>> active_file:45758316kB inactive_file:2430116kB unevictable:0kB
>> writepending:1751708kB present:50331648kB managed:49543960kB mlocked:0kB
>> slab_reclaimable:987048kB slab_unreclaimable:63760kB kernel_stack:28988kB
>> pagetables:5060kB bounce:0kB free_pcp:6044kB local_pcp:560kB free_cma:0kB
>> [Fri Aug 11 16:51:47 2017] lowmem_reserve[]: 0 0 0 0
>> [Fri Aug 11 16:51:47 2017] Node 0 DMA: 0*4kB 1*8kB (U) 1*16kB (U) 0*32kB
>> 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M)
>> 3*4096kB (M) = 15896kB
>> [Fri Aug 11 16:51:47 2017] Node 0 DMA32: 831*4kB (UME) 895*8kB (UME)
>> 275*16kB (UME) 33*32kB (UME) 70*64kB (UME) 216*128kB (UME) 29*256kB (UM)
>> 19*512kB (UM) 3*1024kB (UM) 5*2048kB (UE) 26*4096kB (UM) = 185028kB
>> [Fri Aug 11 16:51:47 2017] Node 0 Normal: 1558*4kB (UM) 0*8kB 0*16kB
>> 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 6232kB
>> [Fri Aug 11 16:51:47 2017] Node 1 Normal: 121*4kB (ME) 768*8kB (M) 0*16kB
>> 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 6628kB
>> [Fri Aug 11 16:51:47 2017] Node 0 hugepages_total=0 hugepages_free=0
>> hugepages_surp=0 hugepages_size=2048kB
>> [Fri Aug 11 16:51:47 2017] Node 1 hugepages_total=0 hugepages_free=0
>> hugepages_surp=0 hugepages_size=2048kB
>> [Fri Aug 11 16:51:47 2017] 23880608 total pagecache pages
>> [Fri Aug 11 16:51:47 2017] 0 pages in swap cache
>> [Fri Aug 11 16:51:47 2017] Swap cache stats: add 0, delete 0, find 0/0
>> [Fri Aug 11 16:51:47 2017] Free swap  = 0kB
>> [Fri Aug 11 16:51:47 2017] Total swap = 0kB
>> [Fri Aug 11 16:51:47 2017] 25141870 pages RAM
>> [Fri Aug 11 16:51:47 2017] 0 pages HighMem/MovableOnly
>> [Fri Aug 11 16:51:47 2017] 480043 pages reserved
>
>> […]
>
> Is that a problem with the network device module ixgbe?
>
> ```
> 04:00.0 Ethernet controller: Intel Corporation Ethernet 10G 2P X520 Adapter
> (rev 01)
> 04:00.1 Ethernet controller: Intel Corporation Ethernet 10G 2P X520 Adapter
> (rev 01)
> ```
>
> Or should some parameters be tuned?
>
> ```
> $ more /proc/sys/vm/min*
> ::::::::::::::
> /proc/sys/vm/min_free_kbytes
> ::::::::::::::
> 39726


Can you try increasing this? Although it depends on your workload,
38M seems too small for a host with 96+GB of memory.
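
For example (the value and the sysctl.d file name below are only
illustrative, not a recommendation):

```
# raise the reserve so atomic RX allocations have more headroom;
# the unit is kB, so 1048576 = 1 GB
$ echo 1048576 | sudo tee /proc/sys/vm/min_free_kbytes

# or persistently across reboots:
$ echo 'vm.min_free_kbytes = 1048576' | sudo tee /etc/sysctl.d/99-minfree.conf
$ sudo sysctl --system
```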



> ::::::::::::::
> /proc/sys/vm/min_slab_ratio
> ::::::::::::::
> 5
> ::::::::::::::
> /proc/sys/vm/min_unmapped_ratio
> ::::::::::::::
> 1
> ```
>
> There is quite some information about this on the WWW [1], but some suggest
> that with recent Linux kernels this shouldn’t happen, as memory gets
> defragmented.


On the other hand, the allocation order is 0 anyway. ;)
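
Since order 0 is a single page, compaction/defragmentation cannot help;
the failure means the zone was depleted below even the lowered watermark
that atomic allocations may dip into. Your dump shows exactly that, e.g.
Node 0 Normal has free:6672kB against min:19108kB. You can watch the
per-order free lists and the zone watermarks while reproducing:

```
# per-order free page counts (order 0 is the leftmost column)
$ cat /proc/buddyinfo

# free pages vs. the min/low/high watermarks, per zone
$ grep -A3 'pages free' /proc/zoneinfo
```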


Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .


