On 2022-03-24 19:09, Vladimir Oltean wrote:
On Thu, Mar 24, 2022 at 06:06:51PM +0000, Marc Zyngier wrote:
I was just raising this as what I thought would be a simple and
non-controversial counter example to your remark "If you change something,
you *must* guarantee forward *and* backward compatibility."
If you change something *in the binding*, which was implicit in the
context, and makes no sense out of context.
Practically speaking, what has happened is that the board DT appeared in
kernel N, the ls-extirq driver in kernel N+1, and the DT was updated to
enable PHY interrupts in kernel N+2. In practice, that DT update stopped
kernel N from running correctly on DTs taken from kernel N+2 onwards.
This is the observable behavior; we can find as many justifications for
it as we wish.
Well, you can also argue that the DT was broken at N and N+1 for not
describing the HW correctly and completely. No binding has changed
here. Your DT was incomplete, and someone fixed it for you.
We can argue these things forever and a half. I've laid down the ground
rules for the stuff I maintain. If you're not happy with this, you can
fix it by either removing the NXP hardware from the tree, or taking
over from me as the irqchip maintainer. I'd be perfectly happy with
any (and even more, with both) of these outcomes.
Ok, my intention wasn't to inflame you even though the way in which I
presented the problem might have suggested otherwise.
With my developer hat I still don't agree with you even with the
additional clarification you've made that you were referring only to
bindings and not to any and all DT changes. The reason being that the DT
blob is a whole, and it doesn't matter if there's a regression because
of a binding change or something else, you still need to be prepared to
update it, sometimes in lockstep with the kernel, like it or not.
But as a user, I just wanted to get your opinion on what we can do
to deal better with this situation: an optional interrupt provided by
a device with a missing driver, which of_irq_get() doesn't seem to understand.
FWIW, of_irq_get() absolutely understands how to handle a missing IRQ
provider driver; it returns -EPROBE_DEFER. If a caller considers the IRQ
optional, then it's up to that caller to decide how long to keep waiting
for the provider to appear until giving up and carrying on without it.
If your phy driver is making the dumb decision to wait for ever for
something which isn't critical, then you're free to fix it, or perhaps
even propose for of_irq_get() to opt in to the
driver_deferred_probe_check_state() mechanism if you believe it's a
sufficiently general case.
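To illustrate the kind of caller-side policy I mean, here is a rough
userspace sketch, not actual kernel code. fake_of_irq_get(),
struct fake_provider, struct phy_state, phy_try_probe() and the
max_deferrals limit are all made up for illustration; only the
-EPROBE_DEFER convention and the PHY_POLL fallback value mirror what the
kernel really does. A real driver would more likely bound the wait by
time or by some driver core state than by a simple retry counter.

```c
/* Userspace model only: values copied from the kernel for illustration. */
#define EPROBE_DEFER 517   /* kernel-internal errno for "try again later" */
#define PHY_POLL     (-1)  /* phylib's "no interrupt, poll instead" value */

/* Hypothetical stand-in for the IRQ provider (e.g. an irqchip driver). */
struct fake_provider {
	int probed;  /* non-zero once the provider driver has bound */
	int irq;     /* IRQ number handed out once probed */
};

/* Hypothetical stand-in for of_irq_get(): defers until the provider exists. */
static int fake_of_irq_get(const struct fake_provider *p)
{
	return p->probed ? p->irq : -EPROBE_DEFER;
}

/* Hypothetical per-device state kept across probe attempts. */
struct phy_state {
	int deferrals;  /* how many times we have already deferred */
	int irq;        /* final choice: a real IRQ or PHY_POLL */
};

/*
 * The interrupt is optional, so the caller decides when to stop waiting:
 * defer up to max_deferrals times, then carry on in polling mode.
 */
static int phy_try_probe(struct phy_state *st,
			 const struct fake_provider *p,
			 int max_deferrals)
{
	int irq = fake_of_irq_get(p);

	if (irq == -EPROBE_DEFER) {
		if (st->deferrals < max_deferrals) {
			st->deferrals++;
			return -EPROBE_DEFER;  /* ask to be probed again later */
		}
		irq = PHY_POLL;  /* provider never showed up; give up on it */
	}

	st->irq = irq;
	return 0;
}
```

With max_deferrals set to 2 and a provider that never probes, the first
two attempts defer and the third succeeds in polling mode; if the
provider is already there, the first attempt succeeds with the real IRQ.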
If a new DT with an additional new property (either on an existing
machine, or on a completely new machine which has the property from the
start) exposes a bug in a driver, that's unfortunate, but it is entirely
irrelevant to the ABI implications of changing the interpretation of an
existing property.
Robin.