On Mon, 2019-07-29 at 10:06 +0200, David Hildenbrand wrote:
> > > Of course, other interfaces might make sense.
> > >
> > > You can then start using these memory blocks and hinder them from
> > > getting onlined (as a safety net) via memory notifiers.
> > >
> > > That would at least avoid you having to call
> > > add_memory/remove_memory/offline_pages/device_online/modifying
> > > memblock states manually.
> >
> > I see what you're saying and that definitely sounds safer.
> >
> > We would still need to call remove_memory and add_memory from
> > memtrace as just offlining memory doesn't remove it from the linear
> > page tables (if it's still in the page tables then hardware can
> > prefetch it and if hardware tracing is using it then the box
> > checkstops).
>
> That prefetching part is interesting (and nasty as well). If we could
> at least get rid of the manual onlining/offlining, I would be able to
> sleep better at night ;) One step at a time.
>

Ok, I'll get to that soon :)

> > > (binding the memory block devices to a driver would be nicer, but
> > > the infrastructure is not really there yet - we have no such
> > > drivers in place yet)
> > >
> > > > I don't know the mm code nor how the notifiers work very well
> > > > so I can't quite see how the above would work. I'm assuming
> > > > memtrace would register a hotplug notifier and when memory is
> > > > offlined from userspace, the callback func in memtrace would be
> > > > called if the priority was high enough? But how do we know that
> > > > the memory being offlined is intended for us to touch? Is there
> > > > a way to offline memory from userspace not using sysfs or have
> > > > I missed something in the sysfs interface?
> > >
> > > The notifier would really only be used to hinder onlining as a
> > > safety net. User space prepares (offlines) the memory blocks and
> > > then tells the drivers which memory blocks to use.
> > >
> > > > On a second read, perhaps you are assuming that memtrace is
> > > > used after adding new memory at runtime? If so, that is not the
> > > > case. If not, then would you be able to clarify what I'm not
> > > > seeing?
> > >
> > > The main problem I see is that you are calling
> > > add_memory/remove_memory() on memory your device driver doesn't
> > > own. It could reside on a DIMM if I am not mistaken (or later on
> > > paravirtualized memory devices like virtio-mem if I ever get to
> > > implement them ;) ).
> >
> > This is just for baremetal/powernv so shouldn't affect virtual
> > memory devices.
>
> Good to know.
>
> > > How is it guaranteed that the memory you are allocating does not
> > > reside on a DIMM for example added via add_memory() by the ACPI
> > > driver?
> >
> > Good point. We don't have ACPI on powernv but currently this would
> > try to remove memory from any online memory node, not just the ones
> > that are backed by RAM. oops.
>
> Okay, so essentially no memory hotplug/unplug along with memtrace.
> (can we document that somewhere?). I think
> add_memory()/try_remove_memory() could be tolerable in these
> environments (as it's only boot memory).
>

Sure thing.
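
For illustration only, here is a rough userspace sketch of the "user space prepares (offlines) the memory blocks" step discussed above. It assumes the standard sysfs memory block layout (/sys/devices/system/memory/memoryN/state); the helper names memblock_state_path(), memblock_read_state() and memblock_set_offline() are made up for this example and are not part of memtrace or any existing tool. How the driver would then be told which blocks to use is not shown, since no such interface exists yet.

```c
#include <stdio.h>
#include <string.h>

/* Standard sysfs location of the memory block devices. */
#define MEMORY_SYSFS_ROOT "/sys/devices/system/memory"

/*
 * Hypothetical helper: build the path of a block's "state" file.
 * Returns 0 on success, -1 if the buffer is too small.
 */
int memblock_state_path(const char *root, unsigned long block,
			char *buf, size_t len)
{
	int n = snprintf(buf, len, "%s/memory%lu/state", root, block);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: read the current state string of a block
 * (for example "online" or "offline") with the newline stripped.
 */
int memblock_read_state(const char *root, unsigned long block,
			char *state, size_t len)
{
	char path[256];
	FILE *f;

	if (memblock_state_path(root, block, path, sizeof(path)) != 0)
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;

	if (!fgets(state, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	state[strcspn(state, "\n")] = '\0';
	return 0;
}

/*
 * Hypothetical helper: ask the kernel to offline one memory block by
 * writing "offline" to its state file. Needs root on a real system.
 */
int memblock_set_offline(const char *root, unsigned long block)
{
	char path[256];
	FILE *f;
	int ret = 0;

	if (memblock_state_path(root, block, path, sizeof(path)) != 0)
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	if (fputs("offline", f) == EOF)
		ret = -1;

	/*
	 * stdio buffers the write, so a refusal from the kernel
	 * (e.g. the block cannot be offlined) shows up when the
	 * stream is flushed on close.
	 */
	if (fclose(f) != 0)
		ret = -1;

	return ret;
}
```

A caller would pass MEMORY_SYSFS_ROOT and a block number, e.g. memblock_set_offline(MEMORY_SYSFS_ROOT, 42), and should treat -1 as "leave this block alone". Taking the root as a parameter is just a convenience so the helpers can be pointed at a scratch directory when testing.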