On Thu, Jul 25, 2019 at 10:49 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 25.07.19 21:19, Michal Hocko wrote:
> > On Thu 25-07-19 16:35:07, David Hildenbrand wrote:
> >> On 25.07.19 15:57, Michal Hocko wrote:
> >>> On Thu 25-07-19 15:05:02, David Hildenbrand wrote:
> >>>> On 25.07.19 14:56, Michal Hocko wrote:
> >>>>> On Wed 24-07-19 16:30:17, David Hildenbrand wrote:
> >>>>>> We end up calling __add_memory() without the device hotplug lock held.
> >>>>>> (I used a local patch to assert in __add_memory() that the
> >>>>>>  device_hotplug_lock is held - I might upstream that as well soon)
> >>>>>>
> >>>>>> [   26.771684]        create_memory_block_devices+0xa4/0x140
> >>>>>> [   26.772952]        add_memory_resource+0xde/0x200
> >>>>>> [   26.773987]        __add_memory+0x6e/0xa0
> >>>>>> [   26.775161]        acpi_memory_device_add+0x149/0x2b0
> >>>>>> [   26.776263]        acpi_bus_attach+0xf1/0x1f0
> >>>>>> [   26.777247]        acpi_bus_attach+0x66/0x1f0
> >>>>>> [   26.778268]        acpi_bus_attach+0x66/0x1f0
> >>>>>> [   26.779073]        acpi_bus_attach+0x66/0x1f0
> >>>>>> [   26.780143]        acpi_bus_scan+0x3e/0x90
> >>>>>> [   26.780844]        acpi_scan_init+0x109/0x257
> >>>>>> [   26.781638]        acpi_init+0x2ab/0x30d
> >>>>>> [   26.782248]        do_one_initcall+0x58/0x2cf
> >>>>>> [   26.783181]        kernel_init_freeable+0x1bd/0x247
> >>>>>> [   26.784345]        kernel_init+0x5/0xf1
> >>>>>> [   26.785314]        ret_from_fork+0x3a/0x50
> >>>>>>
> >>>>>> So perform the locking just like in acpi_device_hotplug().
> >>>>>
> >>>>> While playing with the device_hotplug_lock, can we actually document
> >>>>> what it is protecting please? I have a bad feeling that we are adding
> >>>>> this lock just because some other code path does rather than with a good
> >>>>> idea why it is needed. This patch just confirms that. What exactly does
> >>>>> the lock protect from here in an early boot stage.
> >>>>
> >>>> We have plenty of documentation already
> >>>>
> >>>> mm/memory_hotplug.c
> >>>>
> >>>> git grep -C5 device_hotplug mm/memory_hotplug.c
> >>>>
> >>>> Also see
> >>>>
> >>>> Documentation/core-api/memory-hotplug.rst
> >>>
> >>> OK, fair enough. I was more pointing to a documentation right there
> >>> where the lock is declared because that is the place where people
> >>> usually check for documentation. The core-api documentation looks quite
> >>> nice. And based on that doc it seems that this patch is actually not
> >>> needed because neither the online/offline or cpu hotplug should be
> >>> possible that early unless I am missing something.
> >>
> >> I really prefer to stick to locking rules as outlined on the
> >> interfaces if it doesn't hurt. Why it is not needed is not clear.
> >>
> >>>
> >>>> Regarding the early stage: primarily lockdep as I mentioned.
> >>>
> >>> Could you add a lockdep splat that would be fixed by this patch to the
> >>> changelog for reference?
> >>>
> >>
> >> I have one where I enforce what's documented (but that's of course not
> >> upstream and therefore not "real" yet)
> >
> > Then I suppose to not add locking for something that is not a problem.
> > Really, think about it. People will look at this code and follow the
> > lead without really knowing why the locking is needed.
> > device_hotplug_lock has its purpose and if the code in question doesn't
> > need synchronization for the documented scenarios then the locking
> > simply shouldn't be there. Adding the lock just because of a
> > non-existing, and IMHO dubious, lockdep splats is just wrong.
> >
> > We need to rationalize the locking here, not to add more hacks.
>
> No, sorry. The real hack is calling a function that is *documented* to
> be called under lock without it. That is an optimization for a special
> case. That is the black magic in the code.
>
> The only alternative I see to this patch is adding a comment like
>
> /*
>  * We end up calling __add_memory() without the device_hotplug_lock
>  * held. This is fine as we cannot race with other hotplug activities
>  * and userspace trying to online memory blocks.
>  */
>
> Personally, I don't think that's any better than just grabbing the lock
> as we are told to. (honestly, I don't see how optimizing away the lock
> here is of *any* help to optimize our overall memory hotplug locking)
>
> @Rafael, what's your take? lock or comment?

Well, I have ACKed your patch already. :-)

That said, adding a comment stating that the lock is acquired mostly
for consistency wouldn't hurt.