Re: [PATCH] mm: fix panic in __alloc_pages

On 02.11.21 12:44, Michal Hocko wrote:
> On Tue 02-11-21 12:00:57, David Hildenbrand wrote:
>> On 02.11.21 11:34, Alexey Makhalov wrote:
> [...]
>>>> The node onlining logic when onlining a CPU sounds bogus as well: Let's
>>>> take a look at try_offline_node(). It checks that:
>>>> 1) no memory is *present*
>>>> 2) no CPU is *present*
>>>>
>>>> We should online the node when adding the CPU ("present"), not when
>>>> onlining the CPU.
>>>
>>> Possible.
>>> Assuming try_online_node() was moved under add_cpu(), let's
>>> take a look at this call stack:
>>> add_cpu()
>>>   try_online_node()
>>>     __try_online_node()
>>>       hotadd_new_pgdat()
>>> At line 1190 we'll have a problem:
>>> 1183         pgdat = NODE_DATA(nid);
>>> 1184         if (!pgdat) {
>>> 1185                 pgdat = arch_alloc_nodedata(nid);
>>> 1186                 if (!pgdat)
>>> 1187                         return NULL;
>>> 1188
>>> 1189                 pgdat->per_cpu_nodestats =
>>> 1190                         alloc_percpu(struct per_cpu_nodestat);
>>> 1191                 arch_refresh_nodedata(nid, pgdat);
>>>
>>> alloc_percpu() will go for all possible CPUs and will eventually end up
>>> calling alloc_pages_node() trying to use subject nid for corresponding CPU
>>> hitting the same state #2 problem as NODE_DATA(nid) is still NULL and nid
>>> is not yet online.
>>
>> Right, we will end up calling pcpu_alloc_pages()->alloc_pages_node() for
>> each possible CPU. We use cpu_to_node() to come up with the NID.
> 
> Shouldn't this be numa_mem_id() instead? Memoryless nodes are odd little

Hm, good question. Most probably yes for offline nodes.

diff --git a/mm/percpu-vm.c b/mm/percpu-vm.c
index 2054c9213c43..c21ff5bb91dc 100644
--- a/mm/percpu-vm.c
+++ b/mm/percpu-vm.c
@@ -84,15 +84,19 @@ static int pcpu_alloc_pages(struct pcpu_chunk *chunk,
                            gfp_t gfp)
 {
        unsigned int cpu, tcpu;
-       int i;
+       int i, nid;
 
        gfp |= __GFP_HIGHMEM;
 
        for_each_possible_cpu(cpu) {
+               nid = cpu_to_node(cpu);
+
+               if (nid == NUMA_NO_NODE || !node_online(nid))
+                       nid = numa_mem_id();
                for (i = page_start; i < page_end; i++) {
                        struct page **pagep = &pages[pcpu_page_idx(cpu, i)];
 
-                       *pagep = alloc_pages_node(cpu_to_node(cpu), gfp, 0);
+                       *pagep = alloc_pages_node(nid, gfp, 0);
                        if (!*pagep)
                                goto err;
                }
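
For readers who want to poke at the fallback outside the kernel, here is a
minimal user-space sketch of the node selection done in the diff above. It is
only a model under stated assumptions: model_node_online[], model_numa_mem_id()
and pick_alloc_nid() are hypothetical stand-ins for node_online(),
numa_mem_id() and the new "nid" computation. They are not kernel APIs, and the
online map is made up.

```c
#include <stdbool.h>

#define NUMA_NO_NODE    (-1)
#define MODEL_MAX_NODES 4

/* Hypothetical online map: nodes 0 and 2 are online, 1 and 3 are not. */
static const bool model_node_online[MODEL_MAX_NODES] = {
	true, false, true, false
};

/*
 * Hypothetical stand-in for numa_mem_id(): the model simply assumes
 * that node 0 is the nearest node with memory for the calling CPU.
 */
static int model_numa_mem_id(void)
{
	return 0;
}

/*
 * Mirror of the check in the diff: use the CPU's own node only when
 * it is a valid, online node; otherwise fall back to the local
 * memory node. The range check is an addition for the model so that
 * the array lookup is always in bounds.
 */
static int pick_alloc_nid(int cpu_nid)
{
	if (cpu_nid == NUMA_NO_NODE ||
	    cpu_nid < 0 || cpu_nid >= MODEL_MAX_NODES ||
	    !model_node_online[cpu_nid])
		return model_numa_mem_id();

	return cpu_nid;
}
```

With this map, a CPU on node 2 keeps node 2, while a CPU mapped to the offline
node 1 or to NUMA_NO_NODE is redirected to node 0.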


> critters crafted into the MM code without wider considerations. From
> time to time we are struggling with some fallouts but the primary thing
> is that zonelists should be valid for all memoryless nodes.

Yes, but a zonelist cannot be correct for an offline node, where we might
not even have an allocated pgdat yet. No pgdat, no zonelist. So as soon as
we allocate the pgdat and set the node online (->hotadd_new_pgdat()), the
zonelists have to be correct. And I can spot a build_all_zonelists() call in
hotadd_new_pgdat().

I agree that any caller passing an offline NID into an allocator function
should be fixed.

Maybe __alloc_pages_bulk() and alloc_pages_node() should bail out directly
(VM_BUG_ON()) when handed an offline node, which may have no pgdat or a
stale one, as the preferred nid.
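
To make the idea concrete, here is a rough user-space sketch of such a
bail-out check. Everything in it is an assumption for illustration: struct
model_node, model_nodes[], preferred_nid_ok() and model_alloc_on_node() are
hypothetical names, and returning -1 stands in for what would be a debug
assertion in the kernel.

```c
#include <stdbool.h>

#define MODEL_MAX_NODES 4

/* Hypothetical per-node state: whether a pgdat exists, and the online bit. */
struct model_node {
	bool has_pgdat;
	bool online;
};

static const struct model_node model_nodes[MODEL_MAX_NODES] = {
	{ .has_pgdat = true,  .online = true  },	/* node 0: usable */
	{ .has_pgdat = false, .online = false },	/* node 1: never onlined, no pgdat */
	{ .has_pgdat = true,  .online = false },	/* node 2: offline, stale pgdat */
	{ .has_pgdat = true,  .online = true  },	/* node 3: usable */
};

/* A preferred nid is acceptable only if it is in range, has a pgdat and is online. */
static bool preferred_nid_ok(int nid)
{
	if (nid < 0 || nid >= MODEL_MAX_NODES)
		return false;

	return model_nodes[nid].has_pgdat && model_nodes[nid].online;
}

/*
 * Model of an allocator entry point: returns the node that would be
 * used, or -1 when the model bails out on a bad preferred nid.
 */
static int model_alloc_on_node(int nid)
{
	if (!preferred_nid_ok(nid))
		return -1;

	return nid;
}
```

In this model, nodes 1 and 2 are both rejected: one has no pgdat at all, the
other has a pgdat left over from before it went offline.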

-- 
Thanks,

David / dhildenb



