Re: [PATCH] iio: dht11: set debug log level for parsing error messages

Harald Geyer <harald@xxxxxxxxx> · Tue, 26 Mar 2024 15:03:55 +0100

Hi George!

On Monday, 25.03.2024 at 23:18 +0300, George Stark wrote:
> On 3/25/24 21:48, Harald Geyer wrote:
> > On Monday, 25.03.2024 at 19:54 +0300, George Stark wrote:
> > > Protocol parsing errors could happen due to several reasons like a
> > > noisy environment, heavy load on system etc. If the sensor is polled
> > > frequently and/or for a long period, the kernel log will become
> > > polluted with error messages if their log level is err (i.e. on by
> > > default).
> > 
> > Yes, these errors are often recoverable. (As are many other HW errors
> > that typically are logged, e.g. USB bus resets due to EMI.)
> > 
> > [...]
> > 
> > The idea is that these messages help users understand issues with
> > their HW (like too long cables, broken cables etc). But it is true
> > that they will slowly accumulate in many real world scenarios without
> > anything being truly wrong.
> 
> I agree with you that it's very convenient to just take a look at
> dmesg and see device connection problems at once. But unlike e.g. USB,
> the user has to actually start reading the sensor to perform
> communication, and read errors will be propagated to userspace, where
> they can be noticed and handled.

Not really. The log lines contain additional information useful for
understanding the problem with the setup.

> Anyway I believe we should use uniform approach for read errors -
> currently in the driver there're already dbg messages:
> 
> "lost synchronisation at edge %d\n"
> "invalid checksum\n"

These errors are usually caused by EMI and there isn't much to do aside
from trying again until we find a time window with less interference.
They are not logged, because in some cases they might be very frequent
and can be handled by the user space client programmatically anyway.
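
To illustrate what "handled programmatically" could look like on the
client side, here is a minimal sketch of a bounded retry loop. The names
read_fn, fake_sensor, fake_read and read_with_retry are hypothetical,
and treating both -EIO and -ETIMEDOUT as transient is an assumption of
this sketch, not something the driver mandates. A real client would also
wait between attempts.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback: returns 0 on success or a negative errno. */
typedef int (*read_fn)(void *ctx, int *value);

/*
 * Try up to max_tries reads. Only -EIO and -ETIMEDOUT are treated as
 * transient (an assumption of this sketch); any other error aborts
 * immediately. A real client would sleep between attempts.
 */
int read_with_retry(read_fn do_read, void *ctx, int max_tries, int *value)
{
	int ret = -EINVAL;	/* returned if max_tries <= 0 */

	for (int i = 0; i < max_tries; i++) {
		ret = do_read(ctx, value);
		if (ret == 0)
			return 0;
		if (ret != -EIO && ret != -ETIMEDOUT)
			break;
	}
	return ret;
}

/* Stand-in for a sensor that fails a few times before succeeding. */
struct fake_sensor {
	int failures_left;
	int calls;
};

int fake_read(void *ctx, int *value)
{
	struct fake_sensor *s = ctx;

	s->calls++;
	if (s->failures_left > 0) {
		s->failures_left--;
		return -EIO;
	}
	*value = 42;
	return 0;
}
```

With a fake_sensor that fails twice, read_with_retry(fake_read, &s, 5, &v)
succeeds on the third call. With max_tries smaller than the number of
failures, the last error code is returned to the caller.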

> I changed log level from err to dbg for the messages:
> 
> "Only %d signal edges detected\n"

This mostly indicates a problem with the setup. Long cable, dead
sensor, high (interrupt) load etc.

It's true that this can happen during normal operation, usually when
the system takes too long to enter the irq handler.

But the primary causes are:
1) Your wiring is broken. In this case, the message is immediately
helpful and points you in the right direction. (Only if you understand
the protocol though.)
2) Your sensor is dead or "crashed", which also warrants an error msg
IMO.

The "crashed" case is a bit special. Some chips seem to randomly stop
working after a couple of hours and the only remedy is to power cycle
them. This could be done automatically. - I have the sensor power
supply pin on a GPIO and reset it from userspace in my setup. I tried
to work on a version of the driver some years ago, that would
optionally register with a regulator and manage sensor resets from
within the kernel driver. If this was actually implemented, we could
reduce the logging to cases, where the reset didn't solve the problem.
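
As a rough sketch of such a reset policy in userspace: count
consecutive timeouts and power cycle the sensor once a threshold is
reached. Everything here is hypothetical (sensor_watchdog,
watchdog_feed, count_cycle, the threshold). It is not the code from my
setup and not the abandoned driver work. The power_cycle callback stands
in for whatever toggles the supply GPIO.

```c
#include <errno.h>

/* Hypothetical policy state for one sensor. */
struct sensor_watchdog {
	int consecutive_timeouts;	/* timeouts seen in a row */
	int threshold;			/* power cycle after this many */
	void (*power_cycle)(void *ctx);	/* e.g. toggles the supply GPIO */
	void *ctx;
};

/*
 * Feed the result of every read attempt into the watchdog.
 * Returns 1 if a power cycle was triggered, 0 otherwise.
 * Any result other than -ETIMEDOUT means the sensor answered
 * (possibly with corrupted data), so the counter is reset.
 */
int watchdog_feed(struct sensor_watchdog *wd, int read_ret)
{
	if (read_ret != -ETIMEDOUT) {
		wd->consecutive_timeouts = 0;
		return 0;
	}
	if (++wd->consecutive_timeouts < wd->threshold)
		return 0;

	wd->consecutive_timeouts = 0;
	wd->power_cycle(wd->ctx);
	return 1;
}

/* Stand-in power cycle routine that only counts invocations. */
void count_cycle(void *ctx)
{
	++*(int *)ctx;
}
```

Only the case where reads keep timing out after a power cycle would
then be worth reporting as an error.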

I stopped working on this, because it would have required changes to
the regulator framework, to be actually useful, and the regulator
maintainers didn't seem too keen about them. However, if you want to
pick this up in an effort to reduce unnecessary error conditions and
messages, I certainly would be happy.

> "Don't know how to decode data: %d %d %d %d\n"

This would indicate a sensor, that uses the same protocol but an
unsupported data format. This is a permanent error and therefore should
be logged IMO.

I guess, if you have a bad readout due to EMI but the checksum
accidentally matches, then you might get this message too. But this
should be a very rare case.
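
To make the "checksum accidentally matches" case concrete: the frame
carries four data bytes plus one checksum byte holding the low 8 bits
of their sum, so two bit flips that cancel each other in the sum go
undetected. The sketch below uses a hypothetical helper name and made-up
frame values. Whether the decoder then rejects such a frame depends on
its format checks, which are only partly visible in the quoted patch.

```c
#include <stdint.h>

/*
 * Frame layout: hum_int, hum_dec, temp_int, temp_dec, checksum.
 * The checksum is the low 8 bits of the sum of the four data bytes.
 * Returns 1 if the checksum matches, 0 otherwise.
 */
int dht_checksum_ok(const uint8_t frame[5])
{
	uint8_t sum = (uint8_t)(frame[0] + frame[1] + frame[2] + frame[3]);

	return sum == frame[4];
}

/* Made-up example frames: 45 % humidity, 23 degrees C. */
const uint8_t frame_good[5]   = { 45, 0, 23, 0, 68 };
/* One bit flipped in hum_dec: sum is off by one, checksum fails. */
const uint8_t frame_caught[5] = { 45, 1, 23, 0, 68 };
/* Bits flipped in hum_dec (+1) and temp_int (-1): sum is unchanged. */
const uint8_t frame_missed[5] = { 45, 1, 22, 0, 68 };
```

Here frame_missed passes the checksum although two of its bytes are
wrong, which is the rare case where a decode error can be caused by EMI
rather than by an unsupported sensor.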

> They all are from a single callback and say the same thing -
> communication problem.

Not really. See above.

> If we make all those messages errors, it'd be great to have a
> mechanism to disable them, e.g. through a module parameter or otherwise
> without rebuilding the kernel.

No. What you try to change is cosmetic at best. It certainly doesn't
justify adding any complexity.

Since Jonathan deferred to my judgment:

As you can see, I did consider the trade-off between useful diagnostics
and spamming the log carefully. So naturally I'm inclined to reject
your proposal unless it solves an actual problem.

Also people still mail me directly with bogus bug reports about the
driver when really they have some issue with their setup. I fear, if we
reduce diagnostics, it will increase that noise.

So I reject your proposed changes, if they are for the sake of
unification. I'm willing to discuss, what the most sensible trade-off
is, but it would need to actually add to the considerations I already
did.

Best regards,
Harald

>  Those errors can be bypassed by increasing read rate.
> 
> > 
> > I don't consider the dmesg buffer being rotated after a month or two
> > a bug. But I suppose this is a corner case. I'll happily accept
> > whatever Jonathan thinks is reasonable.
> > 
> > Best regards,
> > Harald
> > 
> > 
> > > Signed-off-by: George Stark <gnstark@xxxxxxxxxxxxxxxxx>
> > > ---
> > > I use a DHT22 sensor with a Raspberry Pi Zero W as a simple home
> > > weather station. Even if the sensor is polled once per tens of
> > > seconds, after a month or two dmesg may become full of useless
> > > parsing error messages. Anyway, those errors are caught in the user
> > > software through return values.
> > > 
> > >   drivers/iio/humidity/dht11.c | 4 ++--
> > >   1 file changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/iio/humidity/dht11.c b/drivers/iio/humidity/dht11.c
> > > index c97e25448772..e2cbc442177b 100644
> > > --- a/drivers/iio/humidity/dht11.c
> > > +++ b/drivers/iio/humidity/dht11.c
> > > @@ -156,7 +156,7 @@ static int dht11_decode(struct dht11 *dht11, int offset)
> > >                  dht11->temperature = temp_int * 1000;
> > >                  dht11->humidity = hum_int * 1000;
> > >          } else {
> > > -               dev_err(dht11->dev,
> > > +               dev_dbg(dht11->dev,
> > >                          "Don't know how to decode data: %d %d %d %d\n",
> > >                          hum_int, hum_dec, temp_int, temp_dec);
> > >                  return -EIO;
> > > @@ -239,7 +239,7 @@ static int dht11_read_raw(struct iio_dev *iio_dev,
> > >   #endif
> > > 
> > >                  if (ret == 0 && dht11->num_edges < DHT11_EDGES_PER_READ - 1) {
> > > -                       dev_err(dht11->dev, "Only %d signal edges detected\n",
> > > +                       dev_dbg(dht11->dev, "Only %d signal edges detected\n",
> > >                                  dht11->num_edges);
> > >                          ret = -ETIMEDOUT;
> > >                  }
> > > --
> > > 2.25.1
> > > 
> > 
>