Yu-Hsuan Hsu <yuhsuan@xxxxxxxxxxxx> 於 2020年7月4日 週六 下午7:34寫道: > > Guenter Roeck <groeck@xxxxxxxxxx> 於 2020年7月4日 週六 上午3:28寫道: > > > > On Fri, Jul 3, 2020 at 12:11 PM Yu-Hsuan Hsu <yuhsuan@xxxxxxxxxxxx> wrote: > > > > > > Guenter Roeck <groeck@xxxxxxxxxx> 於 2020年7月3日 週五 下午11:58寫道: > > > > > > > > On Fri, Jul 3, 2020 at 3:56 AM Enric Balletbo i Serra > > > > <enric.balletbo@xxxxxxxxxxxxx> wrote: > > > > > > > > > > Hi Yu-Hsuan, > > > > > > > > > > On 3/7/20 11:40, Yu-Hsuan Hsu wrote: > > > > > > Enric Balletbo i Serra <enric.balletbo@xxxxxxxxxxxxx> 於 2020年7月3日 週五 下午5:19寫道: > > > > > >> > > > > > >> Hi Yu-Hsuan, > > > > > >> > > > > > >> On 3/7/20 10:48, Yu-Hsuan Hsu wrote: > > > > > >>> Enric Balletbo i Serra <enric.balletbo@xxxxxxxxxxxxx> 於 2020年7月3日 週五 下午4:38寫道: > > > > > >>>> > > > > > >>>> Hi Yu-Hsuan, > > > > > >>>> > > > > > >>>> Thank you for your patch > > > > > >>>> > > > > > >>>> On 3/7/20 9:19, Yu-Hsuan Hsu wrote: > > > > > >>>>> Log results of failed EC commands to identify a problem more easily. > > > > > >>>>> > > > > > >>>>> Replace cros_ec_cmd_xfer_status with cros_ec_cmd_xfer because the result > > > > > >>>>> has already been checked in this function. The wrapper is not needed. > > > > > >>>>> > > > > > >>>> > > > > > >>>> Nack, we did an effort to remove all public users of cros_ec_cmd_xfer() in > > > > > >>>> favour of cros_ec_cmd_xfer_status() and you are reintroducing again. You can do > > > > > >>>> the same but using cros_ec_cmd_xfer_status(). In fact, your patch will not build > > > > > >>>> on top of the upcoming changes. > > > > > >>> Thanks! But I have a question about implementing it. Does it look like > > > > > >>> the one below? > > > > > >>> ret = cros_ec_cmd_xfer_status(ec_dev, msg); > > > > > >>> if (ret < 0) { > > > > > >> > > > > > >> In this case will already print an error. > > > > > >> > > > > > >> What are you trying to achieve? > > > > > >> > > > > > >> If the only reason is of this patch is print a message you should either, or > > > > > >> enable dynamic printk and enable dev_dbg or event better use the kernel trace > > > > > >> functionality. There is no need to be more verbose. > > > > > >> > > > > > >> Example: > > > > > >> $ echo 1 > /sys/kernel/debug/tracing/events/cros_ec/enable > > > > > >> $ cat /sys/kernel/debug/tracing/trace > > > > > >> > > > > > >> 369.416372: cros_ec_request_start: version: 0, command: EC_CMD_USB_PD_POWER_INFO > > > > > >> 369.420528: cros_ec_request_done: version: 0, command: > > > > > >> EC_CMD_USB_PD_POWER_INFO, ec result: EC_RES_SUCCESS, retval: 16 > > > > > >> > > > > > >> Cheers, > > > > > >> Enric > > > > > >> > > > > > > Thank Enric, > > > > > > > > > > > > The situation is that some users encountered errors on ChromeBook. > > > > > > > > > > And, aren't you able to reproduce the issue? > > > > > > > > > > > > > > > > From their feedback reports, we only get the message like > > > > > > 'cros-ec-codec GOOG0013:00: ASoC: Failed to set DAI format: -71'. > > > > > > We know that -71 is -EPROTO but it is not clear enough for us to find > > > > > > out the root cause. That's why we want the detail of the result. > > > > > > > > > > > > > > > If I am not mistaken this ends calling i2s_rx_set_daifmt() into the EC firmware, > > > > > if the result is -EPROTO that means is not returning EC_RES_SUCCESS, so there > > > > > are few options: > > > > > > > > > > if (i2s_rx_enabled) > > > > > return EC_RES_BUSY; > > > > > > > > > > if (daifmt >= EC_CODEC_I2S_RX_DAIFMT_COUNT) > > > > > return EC_RES_INVALID_PARAM; > > > > > > > > > > if (audio_codec_i2s_rx_set_daifmt(daifmt) != EC_SUCCESS) > > > > > return EC_RES_ERROR; > > > > > > > > > > > Because the situation happens on users' side, it is not possible for > > > > > > them to enable kernel trace (ChromeOS does not allow users to touch > > > > > > kernel). > > > > > > > > > > > > > > > > Are you sure that when you know the error code you'll find the root cause > > > > > (without adding more prints)? There is only three possibilities? You can't start > > > > > adding prints just to debug a user issue because you don't allow to be more > > > > > verbose. I understand that might help you but is not the way to go. > > > > > > Hi Enric and Guenter, > > > > > > Thanks for your inspiring comments. > > > I'm not sure whether we will find the root cause if I know the error > > > code. But I think it's not a point. > > > We wanted to add this error log because we found that the current one > > > is not enough. Since it is a real error, it would be better if we can > > > make it more detailed, right? > > > In addition, we thought it would be helpful in the future as well. > > > That's why we chose to upstream instead of merging into our source > > > tree only. > > > > > > > Kernel log messages are almost never useful for actual users. In most > > cases, they just clog up the kernel log, making it useless for when it > > actually counts (such as when the system crashes). On top of that, > > there already is an error message, only the error code doesn't mean > > much because it is "universalized". I don't see how adding another > > error message would improve that, even more so if that error message > > is only added in one place. On the other side, converting EC error > > codes to Linux kernel error codes would help every caller of > > cros_ec_cmd_xfer_status without adding more logging noise. > > Thanks for the explanation. If I understand correctly, the mean point > is this log is noisy. > In this case, the log will only appear one time during the card > probing(if it fails). But I do agree that it may create some potential > noises. > Converting EC error codes to Linux kernel error codes also work for > me. Is it possible for you to send the corresponding patch? If not, I > can try to work on it. > In addition, is it possible that one error code can come from linux or from EC? > > > > > > > > > > > > > You should really reproduce the issue yourself an use actual debug > > > > > tools/prints./traces > > > We are trying but still unable to reproduce this issue. > > > However, as I maintained above, it is not a main concern of this change. > > > > > > > > > > > > > Another possibility would be to change cros_ec_cmd_xfer_status() to > > > > return a more granular error code, such as -EINVAL for > > > > EC_RES_INVALID_PARAM, -EBUSY for EC_RES_BUSY, -EINPROGRESS for > > > > EC_RES_IN_PROGRESS, -ETIMEDOUT for EC_RES_TIMEOUT, -EOVERFLOW for > > > > EC_RES_OVERFLOW, -ENODATA for EC_RES_UNAVAILABLE, and so on. > > > Since there are many kinds of results from EC, why not just make users > > > able to check on their own? > > > For example, users can wait and try again if the result is EC_RES_BUSY. > > > > > > > That is exactly what -EBUSY is for: It lets the user space application > > decide what to do about it. A kernel log message can not and will > > never achieve that. > > > > > > > > > > However, it appears that the various low level functions already > > > > replace various EC error codes with a blank EC_RES_ERROR. No amount of > > > > logging will tell us what exactly went wrong in those functions. Lucky > > > > for us, audio_codec_i2s_rx_set_daifmt() only ever returns EC_SUCCESS, > > > > so we know that the problem is either that i2s_rx_enabled is true or > > > > that daifmt is too large. None of those really warrants more verbose > > > > logging. > > > > > > > > From the context, my personal bet is that i2s_rx_enabled is true: I > > > > don't immediately see how disabling it is enforced before trying to > > > > set the DAI format, and I don't see how "daifmt >= > > > > EC_CODEC_I2S_RX_DAIFMT_COUNT" can ever be true. > > > I totally agree. According to the source, it seems that both path are > > > > You are effectively saying that there is code to ensure that > > i2s_rx_enabled is false. Granted, the code is too complex to easily > > understand, and I may not have seen that flow. Per the same logic, > > though, it might well be possible that _because_ the code is not easy > > to understand there may well be a data path where i2s_rx_enabled is > > set. One would need to ensure that the sequence of <disable rx> - <set > > dai format> - <re-enable rx> is guaranteed, and that it is always > > executed under a kernel lock. Maybe you can point me to that code, for > > my education. > > > > On the other side, it is much easier to verify that "daifmt >= > > EC_CODEC_I2S_RX_DAIFMT_COUNT" is never true. > > > > Thanks, > > Guenter > The complete errors are: > ERR kernel: [ 8.568171] cros-ec-codec GOOG0013:00: ASoC: error at > snd_soc_dai_set_fmt on GOOG0013:00: -71 > WARNING kernel: [ 8.583312] cros-ec-codec GOOG0013:00: ASoC: Failed > to set DAI format: -71 > ERR kernel: [ 8.629388] acp3x-alc5682-max98357 AMDI5682:00: > devm_snd_soc_register_card(acp3xalc5682m98357) failed: -71 > WARNING kernel: [ 8.645306] acp3x-alc5682-max98357: probe of > AMDI5682:00 failed with error -71 > > This issue happened when probing the card while the i2s_rx is only > enabled when the device is used. Therefore, the i2s_rx should not be > enabled before the card is probed. However, I may miss something. I > will check the entire path later. I finally found a possible root cause. EC does not restart when we reboots kernel. It means if kernel crashes while i2s_rx is enabled, we will trigger this issue. In addition, the issue will stay until the next time rebooting EC. I will work on the fix. Thank you for very helpful information. > For "daifmt >= EC_CODEC_I2S_RX_DAIFMT_COUNT", it is obviously not to > be true by looking into i2s_rx_set_fmt function. > > Thanks, > Yu-Hsuan > > > > > > impossible. I'm not really understand the whole path but is it > > > possible for EC to return other results? I will do more tests and look > > > carefully into the source. Really thanks for your suggestions. > > > > > > Cheers, > > > Yu-Hsuan > > > > > > > > Guenter > > > > > > > > > > > > Guenter > > > > > > > > > Cheers, > > > > > Enric > > > > > > > > > > > The other way we thought is changing dev_dbg to dev_err in > > > > > > cros_ec_cmd_xfer_status. But we are not sure whether it is also an > > > > > > error for other usages. > > > > > > > > > > > >>> if (ret == -EPROTO) > > > > > >>> dev_err(..., msg->result) > > > > > >>> goto error; > > > > > >>> } > > > > > >>> I'm not sure whether it makes sense to check ret == -EPROTO here. > > > > > >>> > > > > > >>>> > > > > > >>>>> Signed-off-by: Yu-Hsuan Hsu <yuhsuan@xxxxxxxxxxxx> > > > > > >>>>> --- > > > > > >>>>> sound/soc/codecs/cros_ec_codec.c | 9 ++++++++- > > > > > >>>>> 1 file changed, 8 insertions(+), 1 deletion(-) > > > > > >>>>> > > > > > >>>>> diff --git a/sound/soc/codecs/cros_ec_codec.c b/sound/soc/codecs/cros_ec_codec.c > > > > > >>>>> index 8d45c628e988e..a4ab62f59efa6 100644 > > > > > >>>>> --- a/sound/soc/codecs/cros_ec_codec.c > > > > > >>>>> +++ b/sound/soc/codecs/cros_ec_codec.c > > > > > >>>>> @@ -90,10 +90,17 @@ static int send_ec_host_command(struct cros_ec_device *ec_dev, uint32_t cmd, > > > > > >>>>> if (outsize) > > > > > >>>>> memcpy(msg->data, out, outsize); > > > > > >>>>> > > > > > >>>>> - ret = cros_ec_cmd_xfer_status(ec_dev, msg); > > > > > >>>>> + ret = cros_ec_cmd_xfer(ec_dev, msg); > > > > > >>>>> if (ret < 0) > > > > > >>>>> goto error; > > > > > >>>>> > > > > > >>>>> + if (msg->result != EC_RES_SUCCESS) { > > > > > >>>>> + dev_err(ec_dev->dev, "Command %d failed: %d\n", cmd, > > > > > >>>>> + msg->result); > > > > > >>>>> + ret = -EPROTO; > > > > > >>>>> + goto error; > > > > > >>>>> + } > > > > > >>>>> + > > > > > >>>>> if (insize) > > > > > >>>>> memcpy(in, msg->data, insize); > > > > > >>>>> > > > > > >>>>>