Re: [Bug 206401] kernel panic on Hyper-V after 5 minutes due to memory hot-add

On 12.02.20 08:31, Baoquan He wrote:
> On 02/11/20 at 04:41pm, Andrew Morton wrote:
>> On Tue, 11 Feb 2020 07:07:41 +0800 Wei Yang <richardw.yang@xxxxxxxxxxxxxxx> wrote:
>>
>>> On Mon, Feb 10, 2020 at 02:15:51PM +0800, Baoquan He wrote:
>>>> On 02/10/20 at 02:09pm, Baoquan He wrote:
>>>>> On 02/09/20 at 09:56pm, Andrew Morton wrote:
>>>>>> On Mon, 10 Feb 2020 13:40:27 +0800 Baoquan He <bhe@xxxxxxxxxx> wrote:
>>>>>>
>>>>>>> Hi Andrew,
>>>>>>>
>>>>>>> On 02/09/20 at 09:32pm, Andrew Morton wrote:
>>>>>>>> On Tue, 04 Feb 2020 11:25:48 +0000 bugzilla-daemon@xxxxxxxxxxxxxxxxxxx wrote:
>>>>>>>>
>>>>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=206401
>>>>>>>>>
>>>>>>>>
>>>>>>>> An oops during mem hotadd.  Could someone please take a look when
>>>>>>>> convenient?
>>>>>>>
>>>>>>> This has been addressed by Wei Yang's patch, please check it here:
>>>>>>>
>>>>>>> http://lkml.kernel.org/r/20200209104826.3385-7-bhe@xxxxxxxxxx
>>>>>>>
>>>>>>
>>>>>> hm, OK, thanks.  It's unfortunate that a 5.5 fix is buried in a
>>>>>> six-patch series which is still in progress!  Can we please merge that
>>>>>> as a standalone fix with a cc:stable, Fixes:, etc?
>>>>
>>>> Maybe the following Fixes tag can be added when merging:
>>>>
>>>> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
>>>>
>>
>> The reporter (cc'ed here) is still seeing issues:
>> https://bugzilla.kernel.org/show_bug.cgi?id=206401
>>
>> Could we please continue this investigation via emailed reply-to-all,
>> rather than via the bugzilla interface?
> 
> Yes, people prefer to discuss issues on the mailing list.
> 
> Hi T.Kabe, 
> 
> Could you provide the call trace again after the patch below is applied?
> Comment #9 in bugzilla is not very clear to me.
> 
> mm/sparsemem: pfn_to_page is not valid yet on SPARSEMEM
> http://lkml.kernel.org/r/20200209104826.3385-7-bhe@xxxxxxxxxx
> 
> Also, you said that applying the above patch and not calling
> __free_pages_core() in generic_online_page() works. I doubt it,
> because without __free_pages_core() your added pages are never handed
> to the buddy allocator.

Removing __free_pages_core() from generic_online_page() is just
plain wrong and would break memory hotplug in general. So that is
certainly not the right fix.

HV supports memory sections that are fully added, but where only parts of
them are actually backed by the hypervisor, "online" and exposed to the buddy.

When onlining memory, the guest driver will online the backed parts via
hv_online_page()->generic_online_page(). When requested to hot-add
more memory, the guest will online the remaining parts that are now
backed via handle_pg_range()->hv_bring_pgs_online().
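
To illustrate the two-stage onlining described above, here is a toy
userspace sketch in C. It is not the hv_balloon code: every toy_* name
and the 16-page section size are made up for illustration, and the
"online" flag merely stands in for handing a page to the buddy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in the kernel. */
#define TOY_SECTION_PAGES 16u

struct toy_section {
	bool backed[TOY_SECTION_PAGES];	/* hypervisor has provided memory */
	bool online[TOY_SECTION_PAGES];	/* page was handed to the allocator */
};

/* Mark pages [start, start + count) as backed by the hypervisor. */
static void toy_back_range(struct toy_section *s, size_t start, size_t count)
{
	assert(start <= TOY_SECTION_PAGES);
	assert(count <= TOY_SECTION_PAGES - start);

	for (size_t i = start; i < start + count; i++)
		s->backed[i] = true;
}

/*
 * Online every page that is backed but not yet online, and return how
 * many pages were newly onlined. Unbacked pages are skipped, so a fully
 * added section can stay partially offline.
 */
static size_t toy_online_backed(struct toy_section *s)
{
	size_t onlined = 0;

	for (size_t i = 0; i < TOY_SECTION_PAGES; i++) {
		if (s->backed[i] && !s->online[i]) {
			s->online[i] = true;
			onlined++;
		}
	}
	return onlined;
}
```

In this model, backing the first 4 pages and calling toy_online_backed()
onlines exactly those 4; backing the rest later and calling it again
onlines the remaining 12. Onlining a page that was never backed is the
kind of mistake cases 1 and 2 below would correspond to.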

So if generic_online_page() fails, it's because either

1. The HV guest driver has a bug and tries to online something it shouldn't
2. The HV hypervisor has a bug and does not back memory properly before hot-adding
3. The memory hotplug code has a bug and does not properly add the memory block/sections


Please note that we switched to using generic_online_page() in

commit 30a9c246b9f6fe0591e8afb05758a3e3b096fabe
Author: David Hildenbrand <david@xxxxxxxxxx>
Date:   Sat Nov 30 17:53:55 2019 -0800

    hv_balloon: use generic_online_page()
    
    Let's use the generic onlining function - which will now also take care
    of calling kernel_map_pages().

However, the old code ended up calling
	__online_page_free() -> __free_reserved_page() -> __free_page()
And the new one ends up calling
	__free_pages_core() -> __free_pages()
So I don't think it's related to that.


In particular, looking at the kernel messages, it looks like the kernel
crashes when adding memory, not when onlining it. So I do think there is
still something wrong in the SPARSE hot-add code if you keep seeing issues.

-- 
Thanks,

David / dhildenb





