AFAIK, CONFIG_DEFERRED_STRUCT_PAGE_INIT will just initialize a small amount of page structs, then defer the remaining page structs initialization to kernel threads (one thread per node, called pgdatinit0/1/2/3). So, if your trace buffer allocation is *before* the kernel threads finishing the page struct initialization, you may run into this case.
Yang
On Mon, Nov 13, 2017 at 9:48 AM, YASUAKI ISHIMATSU <yasu.isimatu@xxxxxxxxx> wrote:
When using trace_buf_size= boot option, memory allocation of ring buffer
for trace fails as follows:
[ ] x86: Booting SMP configuration:
[ ] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15 #16 #17 #18 #19 #20 #21 #22 #23
[ ] .... node #1, CPUs: #24 #25 #26 #27 #28 #29 #30 #31 #32 #33 #34 #35 #36 #37 #38 #39 #40 #41 #42 #43 #44 #45 #46 #47
[ ] .... node #2, CPUs: #48 #49 #50 #51 #52 #53 #54 #55 #56 #57 #58 #59 #60 #61 #62 #63 #64 #65 #66 #67 #68 #69 #70 #71
[ ] .... node #3, CPUs: #72 #73 #74 #75 #76 #77 #78 #79 #80 #81 #82 #83 #84 #85 #86 #87 #88 #89 #90 #91 #92 #93 #94 #95
[ ] .... node #4, CPUs: #96 #97 #98 #99 #100 #101 #102 #103 #104 #105 #106 #107 #108 #109 #110 #111 #112 #113 #114 #115 #116 #117 #118 #119
[ ] .... node #5, CPUs: #120 #121 #122 #123 #124 #125 #126 #127 #128 #129 #130 #131 #132 #133 #134 #135 #136 #137 #138 #139 #140 #141 #142 #143
[ ] .... node #6, CPUs: #144 #145 #146 #147 #148 #149 #150 #151 #152 #153 #154
[ ] swapper/0: page allocation failure: order:0, mode:0x16004c0(GFP_KERNEL|__GFP_RETRY_MAYFAIL|__GFP_ NOTRACK), nodemask=(null)
[ ] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.14.0-rc8+ #13
[ ] Hardware name: ...
[ ] Call Trace:
[ ] dump_stack+0x63/0x89
[ ] warn_alloc+0x114/0x1c0
[ ] ? _find_next_bit+0x60/0x60
[ ] __alloc_pages_slowpath+0x9a6/0xba7
[ ] __alloc_pages_nodemask+0x26a/0x290
[ ] new_slab+0x297/0x500
[ ] ___slab_alloc+0x335/0x4a0
[ ] ? __rb_allocate_pages+0xae/0x180
[ ] ? __rb_allocate_pages+0xae/0x180
[ ] __slab_alloc+0x40/0x66
[ ] __kmalloc_node+0xbd/0x270
[ ] __rb_allocate_pages+0xae/0x180
[ ] rb_allocate_cpu_buffer+0x204/0x2f0
[ ] trace_rb_cpu_prepare+0x7e/0xc5
[ ] cpuhp_invoke_callback+0x3ea/0x5c0
[ ] ? init_idle+0x1a7/0x1c0
[ ] ? ring_buffer_record_is_on+0x20/0x20
[ ] _cpu_up+0xbc/0x190
[ ] do_cpu_up+0x87/0xb0
[ ] cpu_up+0x13/0x20
[ ] smp_init+0x69/0xca
[ ] kernel_init_freeable+0x115/0x244
[ ] ? rest_init+0xb0/0xb0
[ ] kernel_init+0xe/0x109
[ ] ret_from_fork+0x25/0x30
[ ] Mem-Info:
[ ] active_anon:0 inactive_anon:0 isolated_anon:0
[ ] active_file:0 inactive_file:0 isolated_file:0
[ ] unevictable:0 dirty:0 writeback:0 unstable:0
[ ] slab_reclaimable:1260 slab_unreclaimable:489185
[ ] mapped:0 shmem:0 pagetables:0 bounce:0
[ ] free:46 free_pcp:1421 free_cma:0
.
[ ] failed to allocate ring buffer on CPU 155
In my server, there are 384 CPUs, 512 GB memory and 8 nodes. And
"trace_buf_size=100M" is set.
When using trace_buf_size=100M, kernel allocates 100 MB memory
per CPU before calling free_are_init_core(). Kernel tries to
allocates 38.4GB (100 MB * 384 CPU) memory. But available memory
at this time is about 16GB (2 GB * 8 nodes) due to the following commit:
3a80a7fa7989 ("mm: meminit: initialise a subset of struct pages
if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set")
So allocation failure occurs.
Thanks,
Yasuaki Ishimatsu