Re: [PATCH] Input: snvs_pwrkey - Add clk handling

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Helli Dmitry,

On Tue, Oct 12, 2021 at 08:06:34PM -0700, Dmitry Torokhov wrote:
> On Tue, Oct 12, 2021 at 09:39:59AM +0200, Uwe Kleine-König wrote:
> > On Mon, Oct 11, 2021 at 06:48:51PM -0700, Dmitry Torokhov wrote:
> > > On Tue, Oct 05, 2021 at 10:00:05PM +0200, Uwe Kleine-König wrote:
> > > > On Wed, Sep 22, 2021 at 11:43:00AM +0200, Uwe Kleine-König wrote:
> > > > > On i.MX7S and i.MX8M* (but not i.MX6*) the pwrkey device has an
> > > > > associated clock. Accessing the registers requires that this clock is
> > > > > enabled. Binding the driver on at least i.MX7S and i.MX8MP while not
> > > > > having the clock enabled results in a complete hang of the machine.
> > > > > (This usually only happens if snvs_pwrkey is built as a module and the
> > > > > rtc-snvs driver isn't already bound because at bootup the required clk
> > > > > is on and only gets disabled when the clk framework disables unused clks
> > > > > late during boot.)
> > > > > 
> > > > > This completes the fix in commit 135be16d3505 ("ARM: dts: imx7s: add
> > > > > snvs clock to pwrkey").
> > > > > 
> > > > > Signed-off-by: Uwe Kleine-König <u.kleine-koenig@xxxxxxxxxxxxxx>
> > > > 
> > > > This patch fixes a hard machine hang that occurs on an i.MX8MP based
> > > > machine in ~10% of the boot ups. In my eyes it's suitable to be applied
> > > > before v5.14 even.
> > > > 
> > > > Any feedback on it?
> > > 
> > > Sorry for the delay. As you may know I strongly dislike dev_err_probe()
> > > as it conflates the 2 issue - error printing and noting the deferral
> > > event that should be implemented by the resource providers (and I
> > > believe Rob had WIP patches to push this reporting down too providers).
> > 
> > I didn't know your dislike (and I probably will forget it again soon,
> > given that there seems to be disagreement among maintainers :-), and
> > from your words I don't understand it. The improved idea is that
> > devm_clk_get_optional() already registers the deferral event for the
> > clk? My first intuition is that this won't work, so I'd like to see the
> > WIP series. (Added Rob to Cc.) Someone has a link?
> 
> I think this is here:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git/log/?h=dev_err-removal
> 
> I do not think it adds calls to device_set_deferred_probe_reason() but
> that should be trivial to add, given we have device pointer and ID of
> the resource, which should be enough to track it.

So the general idea is to push the error message into the resource
providers. I think this lowers the quality of error messages in some
cases. E.g. when clk_get() fails, the clk core can only emit something
like

	dev_err(dev, "Failed to get clock '%s' (%pe)\n", id, clk);

but the requesting driver has more information and might emit a more
helpful message. Or the requesting driver might even be able to handle
an error (even if it's not -ENODEV) and emitting an error is wrong. For
example because id is NULL but the driver knows a good name or a
detailed purpose of the clk can be mentioned to help debugging the
issue.

And IMHO even with this approach it's sensible to factor out the part:

	if (IS_ERR(x) && PTR_ERR(x) != -EPROBE_DEFER)
		dev_err(dev, ...);

	return PTR_ERR(x);

Ah, it could use dev_err_probe() which has exactly the required
semantic, so no need to reinvent the wheel.

I don't know your bubble of work, in mine there are often customers who
say something like "Look, my kernel error log contains these errors,
why is there a problem with device X?" and the explanation is just that
a driver emitted an error message in the -EPROBE_DEFER case, sometimes
even several of them, e.g. once whenever an USB thumb drive is inserted
into the machine. (ok, if this happens there is likely indeed a problem
to fix, but then a single error message would still be nicer.) So I
really like drivers that don't print an error on probe deferral. Also
having drivers write something helpful (today!) to
/sys/kernel/debug/devices_deferred is very appreciated, as it makes
debugging quite a bit easier. dev_err_probe() helps for both.

All in all pushing the error message reporting into the providers seems
to be orthogonal to the question how to best report an error on the
requester's side while the providers are still "unfixed". And not using
an approach that has upsides today and not only when a certain series
(that wasn't even posted yet) landed, seems wrong to me.

> > Also I don't share that sentiment, given that today
> > devm_clk_get_optional() and all the other resource providers don't do
> > the necessary stuff for deferral handling, I strongly prefer to use the
> > mechanism that is available today (even if it might be possible to
> > improve it) instead of open coding it. And if it's only because once the
> > improved variant is available it's easier to identify the code locations
> > that need adaption if they all use a common function instead of
> > identifying something like
> > 
> > 	clk = devm_clk_get_optional(&pdev->dev, NULL);
> > 	if (IS_ERR(clk)) {
> > 		error = PTR_ERR(clk);
> > 		if (error != -EPROBE_DEFER)
> > 			dev_err(pdev->dev, "Failed to get clk: %pe\n", clk)
> > 		else
> > 			device_set_deferred_probe_reason(dev, oh_I_need_a_struct_va_format_how_do_I_get_this?);
> 
> You do not, you happily ignore it and wait for providers to do it for
> you instead of forcing the change through all drivers.

I don't get this. What change is forced here? Today (where some
providers don't emit an error message) the driver has to emit an error
message itself. (Either using dev_err_probe() with the above mentioned
advantages, by open coding it or by using a simpler and inferior
procedure.) And once devm_clk_get_optional() is adapted to emit the
error message itself, you have to adapt all callers anyhow, and which of
the approaches a given driver selected doesn't make any relevant
difference, you just remove the error message emitting there.
But until devm_clk_get_optional() is "fixed" (in a year? Or two? Ever?)
you force an inferior behaviour for users with your refusal to allow
dev_err_probe().

> > 		return error;
> > 	}
> > 
> > instead of
> > 
> > 	clk = devm_clk_get_optional(&pdev->dev, NULL);
> > 	if (IS_ERR(clk))
> > 		return dev_err_probe(&pdev->dev, PTR_ERR(clk), "Failed to get clock\n");
> > 	
> > Even if the driver does not call device_set_deferred_probe_reason(), the
> > additional check for error != -EPROBE_DEFER is ugly, isn't it?
> 
> I'd simply do
> 
> 	clk = devm_clk_get_optional(...);
> 	error = PTR_ERR_OR_ZERO(clk);
> 	if (error) {
> 		dev_err(&pdev->dev, "...", error);
> 		return error;
> 	}

Done something similar now (not using PTR_ERR_OR_ZERO() which in my
experience is even more controversial than dev_err_probe()).

> > > Could you p lease resubmit with "normal" dev_err()/dev_warn()/etc and I
> > > will be happy to apply.
> > 
> > Is the above the variant you prefer? Maybe without the call to
> > device_set_deferred_probe_reason()? Or maybe even without the check for
> > -EPROBE_DEFER (which however might result in wrong error messages which
> > is IMHO worse than the ugliness of the additional check)?
> 
> Why are they wrong? They are not. They would literally say that some
> resource was not obtained because it is not ready.

With your approach you emit messages at the error level (i.e. where
people assume there is a real problem that needs addressing), but this
information is (in most cases) irrelevant and already wrong when someone
reads it. So the only effect is delaying the boot process a bit and
irritating people. Yes, technically it's right to emit an error on
-EPROBE_DEFER, but its usefulness is negative.

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | https://www.pengutronix.de/ |

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux Media Devel]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Linux Wireless Networking]     [Linux Omap]

  Powered by Linux