On Tue 07-03-17 13:40:04, Igor Mammedov wrote: > On Mon, 6 Mar 2017 15:54:17 +0100 > Michal Hocko <mhocko@xxxxxxxxxx> wrote: > > > On Fri 03-03-17 18:34:22, Igor Mammedov wrote: [...] > > > in current mainline kernel it triggers following code path: > > > > > > online_pages() > > > ... > > > if (online_type == MMOP_ONLINE_KERNEL) { > > > if (!zone_can_shift(pfn, nr_pages, ZONE_NORMAL, &zone_shift)) > > > return -EINVAL; > > > > Are you sure? I would expect MMOP_ONLINE_MOVABLE here > pretty much, reproducer is above so try and see for yourself I will play with this... [...] > > > get_maintainer.pl doesn't lists linux-api for 31bc3858ea3e, > > > MAINTAINERS should be fixed if linux-api were to be CCed. > > > > user visible APIs _should_ be discussed at this mailing list regardless > > what get_maintainer.pl says. This is not about who is the maintainer but > > about getting as wide audience for things that would have to be > > maintained basically for ever. > > How would random contributor know which list to CC? This should have been brought up during the review process which was less than sufficient in this case. > > > > So unless this causes a major regression which would be hard to fix I > > > > will submit the patch for inclusion. > > > it will be a major regression due to lack of daemon that > > > could online fast and can't be killed on OOM. So this > > > clean up patch does break used feature without providing > > > a viable alternative. > > > > So let's discuss the current memory hotplug shortcomings and get rid of > > the crud which developed on top. I will start by splitting up the patch > > into 3 parts. Do the auto online thing from the HyperV and xen balloning > > drivers and dropping the config option and finally drop the sysfs knob. > > The last patch might be NAKed and I can live with that as long as the > > reasoning is proper and there is a general consensus on that. > PS: CC me on that patches too > > It's major regression if you remove auto online in kernels that > run on top of x86 kvm/vmware hypervisors, making API cleanups > while breaking useful functionality doesn't make sense. > > I would ACK config option removal if auto online keeps working > for all x86 hypervisors (hyperv/xen isn't the only who needs it) > and keep kernel CLI option to override default. > > That doesn't mean that others will agree with flipping default, > that's why config option has been added. > > Now to sum up what's been discussed on this thread, there were 2 > different issues discussed: > 1) memory hotplug: remove in kernel auto online for all > except of hyperv/xen > > - suggested RFC is not acceptable from virt point of view > as it regresses guests on top of x86 kvm/vmware which > both use ACPI based memory hotplug. > > - udev/userspace solution doesn't work in practice as it's > too slow and unreliable when system is under load which > is quite common in virt usecase. That's why auto online > has been introduced in the first place. Please try to be more specific why "too slow" is a problem. Also how much slower are we talking about? > 2) memory unplug: online memory as movable > > - doesn't work currently with udev rule due to kernel > issues https://bugzilla.redhat.com/show_bug.cgi?id=1314306#c7 These should be fixed > - could be fixed both for in kernel auto online and udev > with following patch: > https://bugzilla.redhat.com/attachment.cgi?id=1146332 > but fixing it this way exposes zone disbalance issues, > which are not present in current kernel as blocks are > onlined in Zone Normal. So this is area to work and > improve on. > > - currently if one wants to use online_movable, > one has to either > * disable auto online in kernel OR which might not just work because an unmovable allocation could have made the memblock pinned. > * remove udev rule that distro ships > AND write custom daemon that will be able to online > block in right zone/order. So currently whole > online_movable thing isn't working by default > regardless of who onlines memory. my epxperience with onlining full nodes as movable shows this works just fine (with all the limitations of the movable zones but that is a separate thing). I haven't played with configurations where movable zones are sharing the node with other zones. > I'm in favor of implementing that in kernel as it keeps > kernel internals inside kernel and doesn't need > kernel API to be involved (memory blocks in sysfs, > online_kernel, online_movable) > There would be no need in userspace which would have to > deal with kernel zoo and maintain that as well. The kernel is supposed to provide a proper API and that is sysfs currently. I am not entirely happy about it either but pulling a lot of code into the kernel is not the rigth thing to do. Especially when different usecases require different treatment. -- Michal Hocko SUSE Labs -- To unsubscribe from this list: send the line "unsubscribe linux-api" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html