On Wed, Dec 05, 2018 at 10:50:29AM -0700, Daniel Kurtz wrote:

> I agree, there is no guarantee here once things have gone wrong, and
> the concerns above are reasonable.  However, in the real world, I2C
> transactions do sometimes fail for various reasons.  The I2C (and

It's *vanishingly* rare and, like I say, we don't have any constructive
recovery plans - I've never seen a practical system that had any better
idea than hoping the user notices a problem and reboots.  Probably the
best we can really do is try to reset the device, or at least resync the
register map.  That would probably do something useful in the vanishingly
rare case where it was a singular glitch, at the cost of obvious and
painful user-visible issues (which would probably be happening anyway).

The nearest I've seen to that is one of the CODEC drivers which has
watchdog code to monitor the device in case it spontaneously reboots,
and which resyncs the register cache if that happens.

I'm gathering that there are Chromebooks that do have I/O problems here?

> other bus) APIs have ways of reporting up their errors so callers can
> take appropriate action.  This codec driver can run on all kinds of
> hardware that can experience transient I2C errors, thus it sounds like
> a reasonable idea to have the driver do some error checking on the
> APIs it calls and take whatever action it can.  Just ignoring the
> errors and proceeding like nothing is wrong is one option, but we can
> probably do a little better by at least checking for errors, aborting
> the current operation, and passing errors up to higher layers when an
> I2C transaction failure is detected.  If nothing else, this would
> enable higher policy layers to take appropriate corrective action -
> for example, if there is an I2C error when configuring a codec, it
> seems advisable to report this up so that a machine driver would know
> to abort and not turn on downstream amplifiers [I am assuming here
> that something like this would happen; I don't know if the sound
> stack really works this way].

Yeah, this is partly why I'm not super thrilled here - none of the
layers currently have any especially bright ideas about how to handle
things, and giving up partway through an operation and returning an
error without any attempt to unwind or recover feels like it's just
giving a false sense of security.

> Once the default "check, abort and report" error checking is in place,
> we could perhaps think about actions that the driver could take to
> recover from various failures, such as resetting the device or
> unwinding previous transactions before aborting, or retrying... but
> those are all policy, and this patch is more mechanism that enables
> policy.

But is this going to be a useful way of handling such policy, and is any
work on it from whoever has these unreliable systems likely to be
forthcoming?  We're going to end up with partially done reconfigurations
in the register cache, which is a potential issue for recovery strategies
based around resyncing it; it's not so bad for things like starting a
stream, but bias level reconfigurations could be fun.
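To be concrete about the register cache resync I keep mentioning, here's
a minimal sketch of the usual pattern, assuming a regmap-based driver.
The names (foo_codec_recover, FOO_RESET) and the reset value are made up
for illustration rather than taken from any particular driver; the point
is just that after putting the device back into a known state you mark
the whole cache dirty and let regcache_sync() replay it:

	#include <linux/regmap.h>

	/* Hypothetical soft reset register, purely for illustration. */
	#define FOO_RESET	0x00

	static int foo_codec_recover(struct regmap *regmap)
	{
		int ret;

		/* Put the device back into a known state with a soft reset. */
		ret = regmap_write(regmap, FOO_RESET, 0x01);
		if (ret < 0)
			return ret;

		/*
		 * Mark every cached register as needing to be written out,
		 * then replay the cache so the hardware ends up matching
		 * what the driver thinks it configured before the glitch.
		 */
		regcache_mark_dirty(regmap);
		return regcache_sync(regmap);
	}

Something along these lines is what the watchdog in that CODEC driver
ends up doing when it notices the device has rebooted, and it's also why
a partially updated cache worries me - whatever you resync from needs to
be coherent.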