On Mon, May 23, 2016 at 07:06:17PM +0200, Steinar H. Gunderson wrote: > On Mon, May 23, 2016 at 05:24:55PM +0100, Mark Brown wrote: > >>> Cc-ing Mark in case he has any insights (I hope I have the right email > >>> address). > > But nobody who works on probe deferral or made any of the suggestions I > > mentioned in the message you linked to, nor anyone who works on the > > driver you've identified a bug in... :( > As for the former, I have no idea who they would be, sorry. For the latter, There's the people I mentioned in the e-mail you linked to for exmaple... you could also look at git annotate for the probe deferral code and see who worked on it, or do whatever it is lead to you finding me. > isn't linux-samsung-soc@ the right place? MAINTAINERS said it was. That's just the list, not people. You really want to see who's working on the driver and talk to them, you can't guarantee that everyone reads the lists sufficiently well that they will notice some random post that doesn't even mention the driver in the subject line (it's probably a good idea to submit that patch BTW). Note also that if MAINTAINERS has multiple hits you want to pay attention to them. > Then, there's the issue of why the messages come for each deferred probe > attempt. It seems from your message this is about something in the > declaration of the device tree; I don't understand the nuances here, but I > suppose it's pretty easy? No. This is nothing to do with the device tree. The dwc3-exynos driver is continually creating new, partially specified devices. Since each of these new devices has the same problem they all get the same warning reported for them. The regulator API has no idea that this is anything to do with deferred probe, it's printing a warning message because something asked for a supply that not mapped at all which is a potential concern if we explode later due to this being an error in the code. All the regulator API can realistically do is remove all diagnostics in case someone triggers them with deferred probe but that's probably not enormously helpful to people if they run into an error directly triggered at the regulator level who might like some diagnostics to help them figure out what went wrong. > Fourth, there's the question of why there are thousands of probe attempts; > it shouldn't be even if the regulator takes a long time to come up. I guess > this is what your original message was about, and fixing that would also > reduce the amount of logging a ton (plus presumably speed up boot by wasting > less CPU on repeated probing). But as I understand you, it's not strictly > necessary to actually fix this issue? No, this is the way probe deferral is supposed to work. It is a bit wasteful of CPU but realistically it's not that much and it's really simple to implement robustly unlike anything that involves actually looking at dependencies. Some breakage in your system is triggering vastly more retries than are normal, even a hundred rounds would be *extremely* high for the initial boot. We're doing repeated probes because the way deferred probe works is that every time a new device binds we start a new retry of all deferred devices to see if that new device unblocked anything (if multiple devices progress in one run it should only schedule one new run). The reason you're seeing the spectacular volume is that every time we retry we add more devices (notice that you're not complaining about the log message generated by the underlying Designware controller failing to get the supply it requests which will print once per probe but rather by the PHY devices it spams in). The fact that the Exynos driver is instantiating subdevices before it's even acquired its own resources is probably not helping here of course, it's more than likely that at least some of those resources need to be passed on to the child devices and of course if the children did actually instantiate that'd trigger continual probe deferral runs. Looking at the code I've got a horrible feeling that might be the root cause here, it's a single regulator that's being requested so the diagnostics should be being printed in the caller but that just silently passes back failures to get supplies (which is why it wasn't immediately obvious that that was failing). > > The patch you linked to was for a completely different error message > > which is at least related to probe deferral > Yes, it's a different error message, but points to the same issue as #4 > above, no? Not from the point of the view of the regulator API and really as far as I can tell this is just a driver specific issue. The regulator API has no idea this is anything to do with deferred probe. > > though fundamentally unless > > we just stop printing diagnostics (which is getting more and more > > tempting to be honest) I'm not sure anything is going to fully resolve > > leaks like this, the best chance you've got is something that explicitly > > looks at the dependencies like Raphael was proposing. > What do you mean by “leaks” here? Resources that are not cleaned up are said to be leaks, as you identifed the dwc3-exynos code isn't cleaning up the child devices it creates.
Attachment:
signature.asc
Description: PGP signature