On 02.07.19 08:42, Rashmica Gupta wrote: > Hi David, > > Sorry for the late reply. Hi, sorry I was on PTO :) > > On Wed, 2019-06-26 at 10:28 +0200, David Hildenbrand wrote: >> On 26.06.19 10:15, Oscar Salvador wrote: >>> On Wed, Jun 26, 2019 at 10:11:06AM +0200, David Hildenbrand wrote: >>>> Back then, I already mentioned that we might have some users that >>>> remove_memory() they never added in a granularity it wasn't >>>> added. My >>>> concerns back then were never fully sorted out. >>>> >>>> arch/powerpc/platforms/powernv/memtrace.c >>>> >>>> - Will remove memory in memory block size chunks it never added >>>> - What if that memory resides on a DIMM added via >>>> MHP_MEMMAP_DEVICE? >>>> >>>> Will it at least bail out? Or simply break? >>>> >>>> IOW: I am not yet 100% convinced that MHP_MEMMAP_DEVICE is save >>>> to be >>>> introduced. >>> >>> Uhm, I will take a closer look and see if I can clear your >>> concerns. >>> TBH, I did not try to use arch/powerpc/platforms/powernv/memtrace.c >>> yet. >>> >>> I will get back to you once I tried it out. >>> >> >> BTW, I consider the code in arch/powerpc/platforms/powernv/memtrace.c >> very ugly and dangerous. > > Yes it would be nice to clean this up. > >> We should never allow to manually >> offline/online pages / hack into memory block states. >> >> What I would want to see here is rather: >> >> 1. User space offlines the blocks to be used >> 2. memtrace installs a hotplug notifier and hinders the blocks it >> wants >> to use from getting onlined. >> 3. memory is not added/removed/onlined/offlined in memtrace code. >> > > I remember looking into doing it a similar way. I can't recall the > details but my issue was probably 'how does userspace indicate to > the kernel that this memory being offlined should be removed'? Instead of indicating a "size", indicate the offline memory blocks that the driver should use. E.g. by memory block id for each node 0:20-24,1:30-32 Of course, other interfaces might make sense. You can then start using these memory blocks and hinder them from getting onlined (as a safety net) via memory notifiers. That would at least avoid you having to call add_memory/remove_memory/offline_pages/device_online/modifying memblock states manually. (binding the memory block devices to a driver would be nicer, but the infrastructure is not really there yet - we have no such drivers in place yet) > > I don't know the mm code nor how the notifiers work very well so I > can't quite see how the above would work. I'm assuming memtrace would > register a hotplug notifier and when memory is offlined from userspace, > the callback func in memtrace would be called if the priority was high > enough? But how do we know that the memory being offlined is intended > for usto touch? Is there a way to offline memory from userspace not > using sysfs or have I missed something in the sysfs interface? The notifier would really only be used to hinder onlining as a safety net. User space prepares (offlines) the memory blocks and then tells the drivers which memory blocks to use. > > On a second read, perhaps you are assuming that memtrace is used after > adding new memory at runtime? If so, that is not the case. If not, then > would you be able to clarify what I'm not seeing? The main problem I see is that you are calling add_memory/remove_memory() on memory your device driver doesn't own. It could reside on a DIMM if I am not mistaking (or later on paravirtualized memory devices like virtio-mem if I ever get to implement them ;) ). How is it guaranteed that the memory you are allocating does not reside on a DIMM for example added via add_memory() by the ACPI driver? -- Thanks, David / dhildenb