On Saturday, January 26, 2013 02:49:30 AM Rafael J. Wysocki wrote: > On Friday, January 25, 2013 04:07:38 PM Toshi Kani wrote: > > On Fri, 2013-01-25 at 23:11 +0100, Rafael J. Wysocki wrote: > > > On Friday, January 25, 2013 09:52:21 AM Toshi Kani wrote: > > > > On Thu, 2013-01-24 at 01:26 +0100, Rafael J. Wysocki wrote: > > : > > > > > > > > > > I wonder if anyone is seeing any major problems with this at the high level. > > > > > > First of all, thanks for the response. :-) > > > > > > > I agree that the current model is mess. As shown below, it requires > > > > that .add() at boot-time only performs acpi dev init, and .add() at > > > > hot-add needs both acpi dev init and device on-lining. > > > > > > I'm not sure what you're talking about, though. > > > > > > You seem to be confusing ACPI device nodes (i.e. things represented by struct > > > acpi_device objects) with devices, but they are different things. They are > > > just used to store static information extracted from device objects in the > > > ACPI namespace and to expose those objects (and possibly some of their > > > properties) via sysfs. Device objects in the ACPI namespace are not devices, > > > however, and they don't even need to represent devices (for example, the > > > _SB thing, which is represented by struct acpi_device, is hardly a device). > > > > > > So the role of struct acpi_device things is analogous to the role of > > > struct device_node things in the Device Trees world. In fact, no drivers > > > should ever bind to them and in my opinion it was a grievous mistake to > > > let them do that. But I'm digressing. > > > > > > So, when you're saying "acpi dev", I'm not sure if you think about a device node > > > or a device (possibly) represented by that node. If you mean device node, then > > > I'm not sure what "acpi dev init" means, because device nodes by definition > > > don't require any initialization beyond what acpi_add_single_object() does > > > (and they don't require any off-lining beyod what acpi_device_unregister() > > > does, for that matter). In turn, if you mean "device represented by the given > > > device node", then you can't even say "ACPI device" about it, because it very > > > well may be a PCI device, or a USB device, or a SATA device etc. > > > > Let me clarify my point with the ACPI memory driver as an example since > > it is the one that has caused a problem in .remove(). > > > > acpi_memory_device_add() implements .add() and does two things below. > > > > 1. Call _CRS and initialize a list of struct acpi_memory_info that is > > attached to acpi_device->driver_data. This step is what I described as > > "acpi dev init". ACPI drivers perform driver-specific initialization to > > ACPI device objects. > > > > 2. Call add_memory() to add a target memory range to the mm module. > > This step is what I described as "on-lining". This step is not > > necessary at boot-time since the mm module has already on-lined the > > memory ranges at early boot-time. At hot-add, however, it needs to call > > add_memory() with the current framework. > > I see. > > OK, so that does handle the "struct acpi_device has been registered" event, > both on boot and hot-add. The interactions with mm are tricky, I agree, but > that's not what I want to address at this point. > > > Similarly, acpi_memory_device_remove() implements .remove() and does two > > things below. > > > > 1. Call remove_memory() to offline a target memory range. This step, > > "off-lining", can fail since the mm module may or may not be able to > > delete non-movable ranges. This failure cannot be handled properly and > > causes the system to crash at this point. > > Well, if the system administrator wants to crash the system this way, it's > basically up to him. So that should be done by .detach() anyway in that case. > > > 2. Free up the list of struct acpi_memory_info. This step deletes > > driver-specific data from an ACPI device object. > > OK > > > > That's part of the whole confusion, by the way. > > > > > > If the device represented by an ACPI device node is on a natively enumerated > > > bus, like PCI, then its native bus' init code initializes the device and > > > creates a "physical" device object for it, like struct pci_dev, which is then > > > "glued" to the corresponding struct acpi_device by acpi_bind_one(). Then, it > > > is clear which is which and there's no confusion. The confusion starts when > > > there's no native enumeration and we only have the struct acpi_device thing, > > > because then everybody seems to think "oh, there's no physical device object > > > now, so this must be something different", but the *only* difference is that > > > there is no native bus' init code now and we should still be creating a > > > "physical device" object for the device and we should "glue" it to the > > > existing struct acpi_device like in the natively enumerated case. > > > > > > > It then requires .remove() to perform both off-lining and acpi dev > > > > delete. .remove() must succeed, but off-lining can fail. > > > > > > > > acpi dev online > > > > |========|=========| > > > > > > > > add @ boot > > > > --------> > > > > add @ hot-add > > > > ------------------> > > > > <------------------ > > > > remove > > > > > > That assumes that the "driver" is present during boot (i.e. when acpi_bus_scan() > > > is run for the first time), but what if it is not? > > > > With memory's example, the mm module must be present at boot. The > > system does not boot without it. > > OK > > > > > Your proposal seems to introduce the following new model. If so, I do > > > > not think it addresses all the issues. > > > > > > It is not supposed to address *all* issues (whatever "all" means). It is meant > > > to address precisely *one* problem, which is the abuse of the driver core by > > > the ACPI subsystem (please see below). > > > > > > > .attach() still needs to behave differently between boot and hot-add. > > > > > > Why does it? I don't see any reason for that. > > > > With memory's example, calling add_memory() at boot is not necessary > > (which just fails and this failure cannot cause an error), but is > > necessary at hot-add (which should succeed in this case). > > But essentially it's not a bug to call add_memory() during boot too, but the > problem seems to be how to distinguish the benign failure when the memory range > has been accounted for already earlier. I suppose that add_memory() should > return error codes allowing you to tell the difference? > > > > > The model is also asymmetric since the destructor of .attach() at hot-add > > > > is the combination of .detach() and .untie(). > > > > . > > > > attach @ boot > > > > --------> > > > > attach @ hot-add > > > > -----------------> > > > > detach untie > > > > <-------<--------- > > > > ---------> > > > > reclaim > > > > > > > > I believe device on-lining and off-lining steps should not be performed > > > > in .add() and .remove(). With this clarification, the current .add() > > > > & .remove() model works fine as follows. That is, .add() only performs > > > > acpi dev init, and .remove() only perform acpi dev delete (which is same > > > > as your .detach()). My system device hot-plug framework is designed to > > > > work with this model. > > > > > > Well, if I understand the above correctly, you're basically saying that if we > > > add a layer on top of the ACPI subsystem, we can separate "online" from "add" > > > and "offline" from "remove" in such a way that the "add" and "remove" will be > > > handled by the ACPI subsystem and "online" and "offline" will be done by the > > > extra layer. > > > > Right. In memory's example, the "online" part should be done by the mm > > module itself. > > That still would be tricky, but I believe it is specific to the memory case, > because memory ranges don't need any special enumeration to happen to be seen > by the early boot code. The result of that is that the memory subsystem > doesn't know whether or not the given range is removable upfront. However, we > find that out from the ACPI namespace scan and we should be able to tell the > memory subsystem "this range is removable" somehow. So instead of add_memory() > there should be something like add_removable_memory_range() that would succeed > if the memory range has already been found. In that case it should just cause > the memory subsystem to record the fact that the given range has the capability > of being removed, which indeed may be good to know to it. > > > > That quite precisely is what we should be doing, but the "add" operation should > > > include the creation of a "physical device" object, like for example struct > > > platform_device, and the additional layer should be a proper driver (a platform > > > driver for example) that will bind to that "physical device" object and > > > initialize the device (i.e. hardware). > > > > > > Analogously, the "remove" operation should include the removal of the "physical > > > device" object from which the driver will have to be unbound first. > > > > Agreed. With memory's example, the "remove" is also required to do > > "off-lining" (i.e. call remove_memory), which should not be the role of > > ACPI driver. > > Well, it kind of has to be initiated by the ACPI subsystem, because that's > where the event happens. And I'm not talking about the event that the BIOS > signals, but an event that may happen as a result of an eject event from the > BIOS for *another* device node upper in the hierarchy (e.g. a processor > package). > > > > That I believe is what Greg meant when he was discussing your earlier proposal > > > with you. > > > > > > Now, however, the problem is what kind of a device object we should create > > > during the "add" phase (struct platform_device may not be suitable in some > > > cases) and whether that needs to be a single object or a whole bunch of them > > > (e.g. when the given struct acpi_device represents a bus or bridge, like in the > > > PCI host bridge case). That's what the ACPI scan handlers I'm proposing are > > > for. > > > > OK, so, we are thinking of different issues... :-) > > > > > So, an ACPI scan handler's .attach() is supposed to recognize what kind of > > > hardware is there to handle and to create whatever device objects (based on > > > struct device) are there to create etc. Then, there should be drivers that > > > will bind to those objects and so on. .detach(), in turn, is supposed to > > > reverse whatever .attach() has done. There is an additional complication, > > > though, that there may be an eject request between .attach() and .detach() > > > and it needs to be responded to. > > > > > > This really is about responding to three types of events related to the ACPI > > > namespace. Those events are, essentially: > > > > > > (1) Device node (i.e. struct acpi_device) has been registered. > > > (2) Eject has been requested for a device node. > > > (3) Device node goes away (i.e. it is going to be unregistered). > > > > > > Whatever the "model", we have to respond to the above events, this way or > > > another. > > > > > > Of course, (2) need not be the same as (3) in general, because one may envision > > > a refusal to carry out the eject. Currently, though, there is no distinction > > > between (2) and (3). > > > > > > The purpose of ACPI scan handlers I'm proposing is precisely to handle these > > > three types of events without abusing the driver core. How exactly they are > > > going to be handled will depend on the implementation of those handlers. > > > > > > The idea is that .attach(), .untie(), and .detach() will be called to handle > > > (1), (2), and (3), respectively, with the additional twist that after an eject > > > refusal .reclaim() needs to be called to do the cleanup. > > > > > > Well, perhaps the names .untie() and .reclaim() are not the best ones and it's > > > better to use names like .eject_requested() and .eject_refused() explicitly > > > for those callbacks? And analogously for the flag indicating that > > > .eject_requested() has succeeded for the given device? > > > > > > So, this is not about creating any new "model", it's just about doing what > > > needs to be done in a possibly straightforward way. > > > > > > Now, perhaps I should just post some code so that it's more clear what I mean. :-) > > > > Sounds like I did confuse completely! > > > > Anyway, even we have .untie() or .eject_requested(), I think all the > > hot-delete procedure may not be done within this function since an ACPI > > driver is not responsible for managing/controlling actual device. > > No, it is not, but it may propagate the event to the code that is responsible > for that. > > The problem is that sometimes the only way we learn that there's a request to > remove certain device is through the ACPI namespace. Suppose that there is > device object CONT in the namespace and there's another device object MEMR > that is a direct child of CONT. We have struct acpi_device objects for both. > > Now, say that we have an eject request for CONT. Obviously, we are supposed to > remove both struct acpi_device objects for CONT and MEMR, but we don't know > if the removal of MEMR is safe. To address this we need to ask the code that > handles the device represented by MEMR if removing it will be safe. That's > what .eject_requested() is supposed to be for. > > So the idea is that when the BIOS signals "eject" for CONT, the ACPI subsystem > will call handler->eject_requested() for all struct acpi_device objects below > and including the CONT's one. That call doesn't actually need to remove > anything, but it is supposed to (a) check if the removal will be safe and (b) > if so, make sure that that doesn't change after it has returned. If it can't > do both (a) and (b), it should return an error code and that will cause the > ACPI subsystem to fail the eject. > > In the case of memory, it may call something like "disable memory" at this > point that will try to move everything out of the memory range and mark it > as "don't use". That, if successful, will ensure both (a) and (b). No > physical removal, though, the memory module is still there. > > Now suppose that there are two memory modules under CONT in the ACPI namespace, > MEMA and MEMB. Again, the BIOS signals "eject" for CONT and say that > .eject_requested() calls "disable memory" for MEMA which succeeds and the > same is done for MEMB, but this time "disable memory" fails, so its > .eject_requested() returns an error code. > > In that case the ACPI namespace needs to tell the handler of MEMA that the > eject is not going to happen after all, so it may tell the memory subsystem > to re-enable the relevant memory range. This is the purpose of the > .eject_canceled() (previously .reclaim()) call. [If the re-enabling fails, > the memory range will be permanently disabled, but we only care so much as it > means that we can just safely remove that memory module going forward at any > time.] > > On the other hand, if .eject_requested() for both MEMA and MEMB succeed, > the ACPI subsystem can simply call acpi_bus_trim() for the whole subtree > starting at CONT. In that case .detach() callbacks will be executed for both > MEMA and MEMB and they will actually do the teardown of everything. > > Still, someone sometimes may want to force an eject of CONT, even though > that will crash the system outright. For this reason, it has to be possible > to do the acpi_bus_trim() for the CONT's subtree directly and that should still > remove everything without failing and that's where the "untied" flag may be > useful (I will call it eject_accepted in future). Namely, .detach() may > look at it and see whether or not .eject_requested() was called successfully > for its device and decide what to do on this basis. > > Thus the .eject_requested()/.eject_canceled() phase is kind of advisory for > situations in which we are allowed to fail an eject request and generally > .detach() is still supposed to be the reverse of .attach(). > > And again, those are simply events related to the ACPI namespace, > "struct acpi_device has been registered", "eject has been requested for a > struct acpi_device", and "unregister struct acpi_device now" that the ACPI > subsystem needs to react to and possibly propagate them to the upper code > layers (device drivers etc.). That made me think about the eject problem in general (that is, how to get information on whether or not the removal of a given set of devices is going to be safe) which lead to the realization that it really is not limited to memory and that the .eject_requested()/.eject_canceled() mechanism above may not be sufficient in general. I'll describe that in a separate message, though. As far as the scan handlers are concerned, it looks like I wanted to do too much at a time, because what I need for now is only things that will do .attach()/.detach() without involving the driver core. Thanks, Rafael -- I speak only for myself. Rafael J. Wysocki, Intel Open Source Technology Center. -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html