On 30.07.2018 16:50, Michal Hocko wrote: > On Mon 30-07-18 16:42:27, David Hildenbrand wrote: >> On 30.07.2018 16:10, Michal Hocko wrote: >>> On Mon 30-07-18 15:51:45, David Hildenbrand wrote: >>>> On 30.07.2018 15:30, Pavel Tatashin wrote: >>> [...] >>>>> Hi David, >>>>> >>>>> Have you figured out why we access struct pages during hot-unplug for >>>>> offlined memory? Also, a panic trace would be useful in the patch. >>>> >>>> __remove_pages() needs a zone as of now (e.g. to recalculate if the zone >>>> is contiguous). This zone is taken from the first page of memory to be >>>> removed. If the struct pages are uninitialized that value is random and >>>> we might even get an invalid zone. >>>> >>>> The zone is also used to locate pgdat. >>>> >>>> No stack trace available so far, I'm just reading the code and try to >>>> understand how this whole memory hotplug/unplug machinery works. >>> >>> Yes this is a mess (evolution of the code called otherwise ;) [1]. >> >> So I guess I should not feel bad if I am having problems understanding >> all the details? ;) >> >>> Functionality has been just added on top of not very well thought >>> through bases. This is a nice example of it. We are trying to get a zone >>> to 1) special case zone_device 2) recalculate zone state. The first >>> shouldn't be really needed because we should simply rely on altmap. >>> Whether it is used for zone device or not. 2) shouldn't be really needed >>> if the section is offline and we can check that trivially. >>> >> >> About 2, I am not sure if this is the case and that easy. To me it looks >> more like remove_pages() fixes up things that should be done in >> offline_pages(). Especially, if the same memory was onlined/offlined to >> different zones we might be in trouble (looking at code on a very high >> level view). > > Well, this might be possible. Hotplug remove path was on my todo list > for a long time. I didn't get that far TBH. shrink_zone_span begs for > some attention. > So i guess we agree that the right fix for this is to not touch struct pages when removing memory, correct? -- Thanks, David / dhildenb