On Tue, Sep 21, 2021 at 10:07 PM Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
>
> Sorry I've been busy with LPC and some other stuff and could respond earlier.
>
> On Tue, Sep 21, 2021 at 12:50 PM Andrew Lunn <andrew@xxxxxxx> wrote:
> >
> > > It works at a device level, so it doesn't know about resources. The
> > > only information it has is of the "this device may depend on that
> > > other device" type and it uses that information to figure out a usable
> > > probe ordering for drivers.
> >
> > And that simplification is the problem. A phandle does not point to a
> > device, it points to a resource of a device. It should really be doing
> > what the driver would do, follow the phandle to the resource and see
> > if it exists yet. If it does not exist then yes it can defer the
> > probe. If the resource does exist, allow the driver to probe.
> >
> > > Also if the probe has already started, it may still return
> > > -EPROBE_DEFER at any time in theory
> >
> > Sure it can, and does. And any driver which is not broken will
> > unregister its resources on the error path. And that causes users of
> > the resources to release them. It all nicely unravels, and then tries
> > again later. This all works, it is what these drivers do.
>
> One of the points of fw_devlink=on is to avoid the pointless deferred
> probes that'd happen in this situation. So saying "let this happen"
> when fw_devlink=on kinda beats the point of it. See further below.

Well, you need to define "pointless deferred probes" in the first
place. fw_devlink adds deferred probes by itself, so why are those
not pointless whereas the others are?

> >
> > > However, making children wait for their parents to complete probing is
> > > generally artificial, especially in the cases when the children are
> > > registered by the parent's driver. So waiting should be an exception
> > > in these cases, not a rule.
>
> Rafael,
>
> There are cases where the children try to probe too quickly (before
> the parent has had time to set up all the resources it's setting up)
> and the child defers the probe. Even Andrew had an example of that
> with some ethernet driver where the deferred probe is attempted
> multiple times wasting time and then it eventually succeeds.

You seem to be arguing that it may be possible to replace multiple
probe attempts that are each deferred with a single probe deferral,
which is then beneficial from the performance perspective.

Yes, there are cases like that, but when this is used as a general
rule, it introduces a problem if it defers a probe when there is no
need for a probe deferral at all (like in the specific problem case
at hand). Also, if the probing of the child is deferred just once,
adding an extra dependency on the parent to it doesn't really help.

> Considering there's no guarantee that a device_add() will result in
> the device being bound immediately, why shouldn't we make the child
> device wait until the parent has completely probed and we know all the
> resources from the parent are guaranteed to be available? Why can't we
> treat drivers that assume a device will get bound as soon as it's
> added as the exception (because we don't guarantee that anyway)?

Because this adds artificial constraints that otherwise aren't there
in some cases, and asking drivers to mark themselves as "please don't
add these artificial constraints for me" is not particularly
straightforward.

Moreover, it does that retroactively, causing things that are entirely
correct and previously worked just fine to now have to paint
themselves red to continue working as before.

The fact that immediate probe completion cannot be guaranteed in
general doesn't mean that it cannot be assumed in certain situations.
For example, a parent driver registering a child may know what the
child driver is, and so it may know that the child will either probe
successfully right away or fail to probe, and your extra constraint
breaks that assumption.

You can't really know how many such cases there are, and trying to
cover them with a new flag is a retroactive whack-a-mole game.

> Also, this assumption that the child will be bound successfully upon
> addition forces the parent/child drivers to play initcall chicken --
> the child's driver has to be registered before the parent's driver.

That's true, but why is this a general problem?

For example, they both may be registered by the same function in the
right order. What's wrong with that?

> We should be getting away from those by fixing the parent driver that's
> making these assumptions (I'll be glad to help with that). We need to
> be moving towards reducing pointless deferred probes and initcall
> ordering requirements instead of saying "this bad assumption used to
> work, so allow me to continue doing that".

It is not always a bad assumption. It may be code designed this way.
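
To make the parent/child case above a bit more concrete, here is a toy
model of it. It is only a sketch and not kernel code: Device, Core,
wait_for_parent, parent_probe and child_probe are all hypothetical
names made up for illustration, and the real driver core is far more
involved. The only things borrowed from the kernel are the numeric
values of EPROBE_DEFER and ENODEV. The wait_for_parent switch stands
in for the proposed "child waits until the parent has completely
probed" rule.

```python
EPROBE_DEFER = -517  # the kernel returns -EPROBE_DEFER, with EPROBE_DEFER == 517
ENODEV = -19         # the kernel returns -ENODEV, with ENODEV == 19


class Device:
    """Hypothetical stand-in for a device; not a kernel structure."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.bound = False


class Core:
    """Toy driver core. wait_for_parent models the extra parent dependency."""

    def __init__(self, wait_for_parent):
        self.wait_for_parent = wait_for_parent
        self.drivers = {}   # device name -> probe function
        self.deferred = []  # devices whose probe was deferred
        self.log = []       # (device name, result) for every probe attempt

    def device_add(self, dev):
        """Register a device and try to bind it right away."""
        return self._try_probe(dev)

    def _try_probe(self, dev):
        parent = dev.parent
        if self.wait_for_parent and parent is not None and not parent.bound:
            # The core defers on its own, without calling the driver at all.
            ret = EPROBE_DEFER
        else:
            ret = self.drivers[dev.name](self, dev)
        self.log.append((dev.name, ret))
        if ret == 0:
            dev.bound = True
        elif ret == EPROBE_DEFER:
            self.deferred.append(dev)
        return ret

    def retry_deferred(self):
        """Re-probe deferred devices until a full pass makes no progress."""
        progress = True
        while progress and self.deferred:
            pending, self.deferred = self.deferred, []
            progress = False
            for dev in pending:
                if self._try_probe(dev) == 0:
                    progress = True


def child_probe(core, dev):
    # Everything the child needs was set up by the parent before adding it.
    return 0


def parent_probe(core, dev):
    # The parent knows the child's driver and relies on it binding at once.
    child = Device("child", parent=dev)
    core.device_add(child)
    return 0 if child.bound else ENODEV


def run(wait_for_parent):
    core = Core(wait_for_parent)
    core.drivers = {"parent": parent_probe, "child": child_probe}
    parent = Device("parent")
    core.device_add(parent)
    core.retry_deferred()
    return parent.bound, core.log


if __name__ == "__main__":
    for flag in (False, True):
        print("wait_for_parent =", flag, "->", run(flag))
```

Without the extra constraint, both devices bind on the first attempt.
With it, the child is deferred before its driver is even called, the
parent sees an unbound child and fails, and the deferred child can
never be retried successfully because its parent never binds. The
model leaves out the cleanup a real parent driver would do on its
error path.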