On Mon, Jul 30, 2018 at 8:11 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 30.07.2018 14:05, Michal Hocko wrote:
> > On Mon 30-07-18 13:53:06, David Hildenbrand wrote:
> >> On 30.07.2018 13:30, Michal Hocko wrote:
> >>> On Fri 27-07-18 18:54:54, David Hildenbrand wrote:
> >>>> Right now, struct pages are inititalized when memory is onlined, not
> >>>> when it is added (since commit d0dc12e86b31 ("mm/memory_hotplug: optimize
> >>>> memory hotplug")).
> >>>>
> >>>> remove_memory() will call arch_remove_memory(). Here, we usually access
> >>>> the struct page to get the zone of the pages.
> >>>>
> >>>> So effectively, we access stale struct pages in case we remove memory that
> >>>> was never onlined. So let's simply inititalize them earlier, when the
> >>>> memory is added. We only have to take care of updating the zone once we
> >>>> know it. We can use a dummy zone for that purpose.
> >>>
> >>> I have considered something like this when I was reworking memory
> >>> hotplug to not associate struct pages with zone before onlining and I
> >>> considered this to be rather fragile. I would really not like to get
> >>> back to that again if possible.
> >>>
> >>>> So effectively, all pages will already be initialized and set to
> >>>> reserved after memory was added but before it was onlined (and even the
> >>>> memblock is added). We only inititalize pages once, to not degrade
> >>>> performance.
> >>>
> >>> To be honest, I would rather see d0dc12e86b31 reverted. It is late in
> >>> the release cycle and if the patch is buggy then it should be reverted
> >>> rather than worked around. I found the optimization not really
> >>> convincing back then and this is still the case TBH.
> >>>
> >>
> >> If I am not wrong, that's already broken in 4.17, no? What about that?
> >
> > Ohh, I thought this was merged in 4.18.
> > $ git describe --contains d0dc12e86b31 --match="v*"
> > v4.17-rc1~99^2~44
> >
> > proves me wrong.
> > This means that the fix is not so urgent as I thought.
> > If you can figure out a reasonable fix then it should be preferable to
> > the revert.
> >
> > Fake zone sounds too hackish to me though.
> >
>
> If I am not wrong, that's the same we had before d0dc12e86b31 but now it
> is explicit and only one single value for all kernel configs
> ("ZONE_NORMAL").
>
> Before d0dc12e86b31, struct pages were initialized to 0. So it was
> (depending on the config) ZONE_DMA, ZONE_DMA32 or ZONE_NORMAL.
>
> Now the value is random and might not even be a valid zone.

Hi David,

Have you figured out why we access struct pages during hot-unplug for
offlined memory? Also, a panic trace would be useful in the patch.

As I understand it, the bug may occur only when hotremove is enabled and
default onlining of added memory is disabled. Is this correct? I suspect
the reason we have not heard about this bug is that it is rare to add
memory and not online it.

Thank you,
Pavel