On Mon, Jul 9, 2018 at 8:48 AM, Pingfan Liu <kernelfans@xxxxxxxxx> wrote: > On Sun, Jul 8, 2018 at 4:25 PM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote: >> >> On Sat, Jul 7, 2018 at 6:24 AM, Pingfan Liu <kernelfans@xxxxxxxxx> wrote: >> > On Fri, Jul 6, 2018 at 9:55 PM Pingfan Liu <kernelfans@xxxxxxxxx> wrote: >> >> >> >> On Fri, Jul 6, 2018 at 4:47 PM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote: >> >> > >> >> > On Fri, Jul 6, 2018 at 10:36 AM, Lukas Wunner <lukas@xxxxxxxxx> wrote: >> >> > > [cc += Kishon Vijay Abraham] >> >> > > >> >> > > On Thu, Jul 05, 2018 at 11:18:28AM +0200, Rafael J. Wysocki wrote: >> >> > >> OK, so calling devices_kset_move_last() from really_probe() clearly is >> >> > >> a mistake. >> >> > >> >> >> > >> I'm not really sure what the intention of it was as the changelog of >> >> > >> commit 52cdbdd49853d doesn't really explain that (why would it be >> >> > >> insufficient without that change?) >> >> > > >> >> > > It seems 52cdbdd49853d fixed an issue with boards which have an MMC >> >> > > whose reset pin needs to be driven high on shutdown, lest the MMC >> >> > > won't be found on the next boot. >> >> > > >> >> > > The boards' devicetrees use a kludge wherein the reset pin is modelled >> >> > > as a regulator. The regulator is enabled when the MMC probes and >> >> > > disabled on driver unbind and shutdown. As a result, the pin is driven >> >> > > low on shutdown and the MMC is not found on the next boot. >> >> > > >> >> > > To fix this, another kludge was invented wherein the GPIO expander >> >> > > driving the reset pin unconditionally drives all its pins high on >> >> > > shutdown, see pcf857x_shutdown() in drivers/gpio/gpio-pcf857x.c >> >> > > (commit adc284755055, "gpio: pcf857x: restore the initial line state >> >> > > of all pcf lines"). >> >> > > >> >> > > For this kludge to work, the GPIO expander's ->shutdown hook needs to >> >> > > be executed after the MMC expander's ->shutdown hook. >> >> > > >> >> > > Commit 52cdbdd49853d achieved that by reordering devices_kset according >> >> > > to the probe order. Apparently the MMC probes after the GPIO expander, >> >> > > possibly because it returns -EPROBE_DEFER if the vmmc regulator isn't >> >> > > available yet, see mmc_regulator_get_supply(). >> >> > > >> >> > > Note, I'm just piecing the information together from git history, >> >> > > I'm not responsible for these kludges. (I'm innocent!) >> >> > >> >> > Sure enough. :-) >> >> > >> >> > In any case, calling devices_kset_move_last() in really_probe() is >> >> > plain broken and if its only purpose was to address a single, arguably >> >> > kludgy, use case, let's just get rid of it in the first place IMO. >> >> > >> >> Yes, if it is only used for a single use case. >> >> >> > Think it again, I saw other potential issue with the current code. >> > device_link_add->device_reorder_to_tail() can break the >> > "supplier<-consumer" order. During moving children after parent's >> > supplier, it ignores the order of child's consumer. >> >> What do you mean? >> > The drivers use device_link_add() to build "supplier<-consumer" order > without knowing each other. Hence there is the following potential > odds: (consumerX, child_a, ...) (consumer_a,..) (supplierX), where > consumer_a consumes child_a. Well, what's the initial state of the list? > When device_link_add()->device_reorder_to_tail() moves all descendant of > consumerX to the tail, it breaks the "supplier<-consumer" order by > "consumer_a <- child_a". That depends on what the initial ordering of the list is and please note that circular dependencies are explicitly assumed to be not present. The assumption is that the initial ordering of the list reflects the correct suspend (or shutdown) order without the new link. Therefore initially all children are located after their parents and all known consumers are located after their suppliers. If a new link is added, the new consumer goes to the end of the list and all of its children and all of its consumers go after it. device_reorder_to_tail() is recursive, so for each of the devices that went to the end of the list, all of its children and all of its consumers go after it and so on. Now, that operation doesn't change the order of any of the parent<-child or supplier<-consumer pairs that get moved and since all of the devices that depend on any device that get moved go to the end of list after it, the only devices that don't go to the end of list are guaranteed to not depend on any of them (they may be parents or suppliers of the devices that go to the end of the list, but not their children or suppliers). > And we need recrusion to resolve the item in > (consumer_a,..), each time when moving a consumer behind its supplier, > we may break "parent<-child". I don't see this as per the above. Say, device_reorder_to_tail() moves a parent after its child. This means that device_reorder_to_tail() was not called for the child after it had been called for the parent, but that is not true, because it is called for all of the children of each device that gets moved *after* moving that device. >> > Beside this, essentially both devices_kset_move_after/_before() and >> > device_pm_move_after/_before() expose the shutdown order to the >> > indirect caller, and we can not expect that the caller can not handle >> > it correctly. It should be a job of drivers core. >> >> Arguably so, but that's how those functions were designed and the >> callers should be aware of the limitation. >> >> If they aren't, there is a bug in the caller. >> > If we consider device_move()-> device_pm_move_after/_before() more > carefully like the above description, then we can hide the detail from > caller. And keep the info of the pm order inside the core. Yes, we can. My point is that we have not been doing that so far and the current callers of those routines are expected to know that. We can do that to make the life of *future* callers easier (and maybe to simplify the current ones), but currently the caller is expected to do the right thing. >> > It is hard to extract high dimension info and pack them into one dimension >> > linked-list. >> >> Well, yes and no. >> > For "hard", I means that we need two interleaved recursion to make the > order correct. Otherwise, I think it is a bug or limitation. So the limitation is that circular dependencies may not exist, because if they did, there would be no suitable suspend/shutdown ordering between devices. >> We know it for a fact that there is a linear ordering that will work. >> It is inefficient to figure it out every time during system suspend >> and resume, for one and that's why we have dpm_list. >> > Yeah, I agree that iterating over device tree may hurt performance. I > guess the iterating will not cost the majority of the suspend time, > comparing to the device_suspend(), which causes hardware's sync. But > data is more persuasive. Besides the performance, do you have other > concern till now? I simply think that there should be one way to iterate over devices for both system-wide PM and shutdown. The reason why it is not like that today is because of the development history, but if it doesn't work and we want to fix it, let's just consolidate all of that. Now, system-wide suspend resume sometimes iterates the list in the reverse order which would be hard without having a list, wouldn't it? >> Now, if we have it for suspend and resume, it can also be used for shutdown. >> > Yes, I do think so. OK Thanks, Rafael