Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario

> > > > > > > Consider the following case:
> > > > > > > 
> > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > 
> > > > > > > CPUa                                                  CPUb
> > > > > > > acpi_memory_device_notify()
> > > > > > >                                        unbind it from the driver
> > > > > > >     acpi_bus_hot_remove_device()
> > > > > > 
> > > > > > Can we make acpi_bus_remove() to fail if a given acpi_device is not
> > > > > > bound with a driver?  If so, can we make the unbind operation to perform
> > > > > > unbind only?
> > > > > 
> > > > > acpi_bus_remove_device could check if the driver is present, and return -ENODEV
> > > > > if it's not present (dev->driver == NULL).
> > > > > 
> > > > > But there can still be a race between an eject and an unbind operation happening
> > > > > simultaneously. This seems like a general problem to me i.e. not specific to an
> > > > > acpi memory device. How do we ensure an eject does not race with a driver unbind
> > > > > for other acpi devices?
> > > > > 
> > > > > Is there a per-device lock in acpi-core or device-core that can prevent this from
> > > > > happening? Driver core does a device_lock(dev) on all operations, but this is
> > > > > probably not grabbed on SCI-initiated acpi ejects.
> > > > 
> > > > Since driver_unbind() calls device_lock(dev->parent) before calling
> > > > device_release_driver(), I am wondering if we can call
> > > > device_lock(dev->dev->parent) at the beginning of acpi_bus_remove()
> > > > (i.e. before calling pre_remove) and fails if dev->driver is NULL.  The
> > > > parent lock is otherwise released after device_release_driver() is done.
> > > 
> > > I would be careful.  You may introduce some subtle locking-related issues
> > > this way.
> > 
> > Right.  This requires careful inspection and testing.  As far as the
> > locking is concerned, I am not keen on using fine grained locking for
> > hot-plug.  It is much simpler and solid if we serialize such operations.
> > 
> > > Besides, there may be an alternative approach to all this.  For example,
> > > what if we don't remove struct device objects on eject?  The ACPI handles
> > > associated with them don't go away in that case after all, do they?
> > 
> > Umm...  Sorry, I am not getting your point.  The issue is that we need
> > to be able to fail a request when memory range cannot be off-lined.
> > Otherwise, we end up ejecting online memory range.
> 
> Yes, this is the major one.  The minor issue, however, is a race condition
> between unbinding a driver from a device and removing the device if I
> understand it correctly.  Which will go away automatically if the device is
> not removed in the first place.  Or so I would think. :-)

I see.  I do not think it makes any difference here whether or not the
device is removed on eject.  The issue is that once driver_unbind() is
done, acpi_bus_hot_remove_device() no longer calls the ACPI memory
driver (hence, it cannot fail in prepare_remove), and goes ahead and
calls _EJ0.  If driver_unbind() did off-line the memory, this would be
OK.  However, it cannot off-line kernel memory ranges.  So, we basically
need to either 1) serialize acpi_bus_hot_remove_device() and
driver_unbind(), or 2) make acpi_bus_hot_remove_device() fail if
driver_unbind() is run during the operation.
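
To make the two options concrete, here is a rough userspace model of
the idea.  It is only a sketch, not kernel code: struct model_device
and the model_*() functions are hypothetical names made up for
illustration, and a pthread mutex stands in for whatever lock would
actually serialize the two paths.  The eject path takes the same lock
as unbind (option 1) and refuses to proceed when no driver is bound
(option 2), so it never reaches the point where _EJ0 would be called
on memory that is still online.

```c
/* Userspace model only; every name here is hypothetical. */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct model_device {
	pthread_mutex_t lock;	/* stands in for the serializing lock */
	bool driver_bound;	/* models dev->driver != NULL */
	bool memory_online;	/* true while the range cannot be off-lined */
	bool ejected;		/* set once the modelled _EJ0 step has run */
};

static void model_device_init(struct model_device *d, bool memory_online)
{
	pthread_mutex_init(&d->lock, NULL);
	d->driver_bound = true;
	d->memory_online = memory_online;
	d->ejected = false;
}

/* Models driver_unbind(): detaches the driver, leaves memory state alone. */
static void model_unbind(struct model_device *d)
{
	pthread_mutex_lock(&d->lock);
	d->driver_bound = false;
	pthread_mutex_unlock(&d->lock);
}

/* Models the eject path, serialized against model_unbind(). */
static int model_hot_remove(struct model_device *d)
{
	int ret = 0;

	pthread_mutex_lock(&d->lock);
	if (!d->driver_bound)
		ret = -ENODEV;		/* option 2: fail, driver is gone */
	else if (d->memory_online)
		ret = -EBUSY;		/* driver vetoes: cannot off-line */
	else
		d->ejected = true;	/* safe to run the eject step */
	pthread_mutex_unlock(&d->lock);

	return ret;
}
```

In this model, an unbind that wins the race makes the later eject
return -ENODEV instead of ejecting an online range.  Which lock would
play this role in the kernel is exactly the open question above.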

Thanks,
-Toshi
