On 31.07.19 04:21, Rashmica Gupta wrote: > On Mon, 2019-07-29 at 10:06 +0200, David Hildenbrand wrote: >>>> Of course, other interfaces might make sense. >>>> >>>> You can then start using these memory blocks and hinder them from >>>> getting onlined (as a safety net) via memory notifiers. >>>> >>>> That would at least avoid you having to call >>>> add_memory/remove_memory/offline_pages/device_online/modifying >>>> memblock >>>> states manually. >>> >>> I see what you're saying and that definitely sounds safer. >>> >>> We would still need to call remove_memory and add_memory from >>> memtrace >>> as >>> just offlining memory doesn't remove it from the linear page tables >>> (if >>> it's still in the page tables then hardware can prefetch it and if >>> hardware tracing is using it then the box checkstops). >> >> That prefetching part is interesting (and nasty as well). If we could >> at >> least get rid of the manual onlining/offlining, I would be able to >> sleep >> better at night ;) One step at a time. >> > > What are your thoughts on adding remove to state_store in > drivers/base/memory.c? And an accompanying add? So then userspace could > do "echo remove > memory34/state"? I consider such an interface dangerous (removing random memory you don't own, especially in the context of Oscars work some blocks would suddenly not be removable again) and only of limited used. The requirement of memtrace is really special, and it only works somewhat reliably with boot memory. FWIW, we do have a "probe" interface on some architectures but decided that a "remove" interface belongs into a debug/test kernel module. The memory block device API is already messed up enough (e.g., "removable" property which doesn't indicate at all if memory can be removed, only if it could be offlined) - we decided to keep changes at a minimum for now - rather clean up than add new stuff. > Then most of the memtrace code could be moved to a userspace tool. The > only bit that we would need to keep in the kernel is setting up debugfs > files in memtrace_init_debugfs. The nice thing is, with remove_memory() you will now get an error in case all memory blocks are not offline yet. So it only boils down to calling add_memory()/remove_memory() without even checking the state of the blocks. (only nids have to be handled manually correctly) -- Thanks, David / dhildenb