Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario

"Rafael J. Wysocki" <rjw@xxxxxxx> · Thu, 29 Nov 2012 17:57:44 +0100

On Thursday, November 29, 2012 12:30:30 PM Vasilis Liaskovitis wrote:
> On Thu, Nov 29, 2012 at 11:03:05AM +0100, Rafael J. Wysocki wrote:
> > On Wednesday, November 28, 2012 06:15:42 PM Toshi Kani wrote:
> > > On Wed, 2012-11-28 at 18:02 -0700, Toshi Kani wrote:
> > > > On Thu, 2012-11-29 at 00:49 +0100, Rafael J. Wysocki wrote:
> > > > > On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > > > > > > Consider the following case:
> > > > > > > > > > > > > 
> > > > > > > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > > > > > > 
> > > > > > > > > > > > > CPUa                                                  CPUb
> > > > > > > > > > > > > acpi_memory_device_notify()
> > > > > > > > > > > > >                                        unbind it from the driver
> > > > > > > > > > > > >     acpi_bus_hot_remove_device()
> > > > > > > > > > > > 
> [...]
> > Well, in the meantime I've had a look at acpi_bus_hot_remove_device() and
> > friends and I think there's a way to address all of these problems
> > without big redesign (for now).
> > 
> > First, why don't we introduce an ACPI device flag (in the flags field of
> > struct acpi_device) called eject_forbidden or something like this such that:
> > 
> > (1) It will be clear by default.
> > (2) It may only be set by a driver's .add() routine if necessary.
> > (3) Once set, it may only be cleared by the driver's .remove() routine if
> >     it's safe to physically remove the device after the .remove().
> > 
> > Then, after the .remove() (which must be successful) has returned, and the
> > flag is set, it will tell acpi_bus_remove() to return a specific error code
> > (such as -EBUSY or -EAGAIN).  It doesn't matter if .remove() was called
> > earlier, because if it left the flag set, there's no way to clear it afterward
> > and acpi_bus_remove() will see it set anyway.  I think the struct acpi_device
> > should be unregistered anyway if that error code is to be returned.
> > 
> > [By the way, do you know where we free the memory allocated for struct
> >  acpi_device objects?]
> > 
> > Now if acpi_bus_trim() gets that error code from acpi_bus_remove(), it should
> > store it, but continue the trimming normally and finally it should return that
> > error code to acpi_bus_hot_remove_device().
> 
> Side-note: In the pre_remove patches, acpi_bus_trim actually returns on the
> first error from acpi_bus_remove (e.g. when memory offlining in pre_remove
> fails). Trimming is not continued. 
> 
> Normally, acpi_bus_trim keeps trimming as you say, and always returns the last
> error. Is this the desired behaviour that we want to keep for bus_trim? (This is
> more a general question, not specific to the eject_forbidden suggestion)
> 
> > 
> > Now, if acpi_bus_hot_remove_device() gets that error code, it should just
> > reverse the whole trimming (i.e. trigger acpi_bus_scan() from the device
> > we attempted to eject) and notify the firmware about the failure.
> 
> sounds like this rollback needs to be implemented in any solution we choose
> to implement, correct?
> 
> > 
> > If we have that, then the memory hotplug driver would only need to set
> > flags.eject_forbidden in its .add() routine and make its .remove() routine
> > only clear that flag if it is safe to actually remove the memory.
> > 
> 
> But when .remove op is called, we are already in the irreversible/error-free
> removal (final removal step).

Why so?  What prevents us from doing a bus scan again and binding the driver
again to the device?  Is .remove() doing something to the firmware?

Rafael

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html