On 20.12.18 14:08, Michal Hocko wrote: > On Thu 20-12-18 13:58:16, David Hildenbrand wrote: >> On 30.11.18 18:59, David Hildenbrand wrote: >>> This is the second approach, introducing more meaningful memory block >>> types and not changing online behavior in the kernel. It is based on >>> latest linux-next. >>> >>> As we found out during dicussion, user space should always handle onlining >>> of memory, in any case. However in order to make smart decisions in user >>> space about if and how to online memory, we have to export more information >>> about memory blocks. This way, we can formulate rules in user space. >>> >>> One such information is the type of memory block we are talking about. >>> This helps to answer some questions like: >>> - Does this memory block belong to a DIMM? >>> - Can this DIMM theoretically ever be unplugged again? >>> - Was this memory added by a balloon driver that will rely on balloon >>> inflation to remove chunks of that memory again? Which zone is advised? >>> - Is this special standby memory on s390x that is usually not automatically >>> onlined? >>> >>> And in short it helps to answer to some extend (excluding zone imbalances) >>> - Should I online this memory block? >>> - To which zone should I online this memory block? >>> ... of course special use cases will result in different anwers. But that's >>> why user space has control of onlining memory. >>> >>> More details can be found in Patch 1 and Patch 3. >>> Tested on x86 with hotplugged DIMMs. Cross-compiled for PPC and s390x. >>> >>> >>> Example: >>> $ udevadm info -q all -a /sys/devices/system/memory/memory0 >>> KERNEL=="memory0" >>> SUBSYSTEM=="memory" >>> DRIVER=="" >>> ATTR{online}=="1" >>> ATTR{phys_device}=="0" >>> ATTR{phys_index}=="00000000" >>> ATTR{removable}=="0" >>> ATTR{state}=="online" >>> ATTR{type}=="boot" >>> ATTR{valid_zones}=="none" >>> $ udevadm info -q all -a /sys/devices/system/memory/memory90 >>> KERNEL=="memory90" >>> SUBSYSTEM=="memory" >>> DRIVER=="" >>> ATTR{online}=="1" >>> ATTR{phys_device}=="0" >>> ATTR{phys_index}=="0000005a" >>> ATTR{removable}=="1" >>> ATTR{state}=="online" >>> ATTR{type}=="dimm" >>> ATTR{valid_zones}=="Normal" >>> >>> >>> RFC -> RFCv2: >>> - Now also taking care of PPC (somehow missed it :/ ) >>> - Split the series up to some degree (some ideas on how to split up patch 3 >>> would be very welcome) >>> - Introduce more memory block types. Turns out abstracting too much was >>> rather confusing and not helpful. Properly document them. >>> >>> Notes: >>> - I wanted to convert the enum of types into a named enum but this >>> provoked all kinds of different errors. For now, I am doing it just like >>> the other types (e.g. online_type) we are using in that context. >>> - The "removable" property should never have been named like that. It >>> should have been "offlinable". Can we still rename that? E.g. boot memory >>> is sometimes marked as removable ... >>> >> >> >> Any feedback regarding the suggested block types would be very much >> appreciated! > > I still do not like this much to be honest. I just didn't get to think > through this properly. My fear is that this is conflating an actual API > with the current implementation and as such will cause problems in > future. But I haven't really looked into your patches closely so I might > be wrong. Anyway I won't be able to look into it by the end of year. > I guess as long as we have memory block devices and we expect user space to make a decision we will have this API and the involved problems. I am open for alternatives, and as I said, any feedback on how to sort this out will be highly appreciated. I'll be on vacation for the next two weeks, so this can wait. Just wanted to note that I am still interested in feedback :) -- Thanks, David / dhildenb