Re: [PATCH] platform/chrome: cros_ec_proto: Send command again when timeout occurs

Greg KH <greg@xxxxxxxxx> · Tue, 18 May 2021 11:29:26 +0200

On Tue, May 18, 2021 at 11:09:25AM +0200, Patryk Duda wrote:
> Sometimes kernel is trying to probe Fingerprint MCU (FPMCU) when it
> hasn't initialized SPI yet. This can happen because FPMCU is restarted
> during system boot and kernel can send message in short window
> eg. between sysjump to RW and SPI initialization.
> 
> Cc: <stable@xxxxxxxxxxxxxxx> # 4.4+
> Signed-off-by: Patryk Duda <pdk@xxxxxxxxxxxx>
> ---
> Fingerprint MCU is rebooted during system startup by AP firmware (coreboot).
> During cold boot kernel can query FPMCU in a window just after jump to RW
> section of firmware but before SPI is initialized. The window was
> shortened to <1ms, but it can't be eliminated completly.
> 
> Communication with FPMCU (and all devices based on EC) is bi-directional.
> When kernel sends message, EC will send EC_SPI* status codes. When EC is
> not able to process command one of bytes will be eg. EC_SPI_NOT_READY.
> This mechanism won't work when SPI is not initailized on EC side. In fact,
> buffer is filled with 0xFF bytes, so from kernel perspective device is not
> responding. To avoid this problem, we can query device once again. We are
> already waiting EC_MSG_DEADLINE_MS for response, so we can send command
> immediately.
> 
> Best regards,
> Patryk
>  drivers/platform/chrome/cros_ec_proto.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/platform/chrome/cros_ec_proto.c b/drivers/platform/chrome/cros_ec_proto.c
> index aa7f7aa77297..3384631d21e2 100644
> --- a/drivers/platform/chrome/cros_ec_proto.c
> +++ b/drivers/platform/chrome/cros_ec_proto.c
> @@ -279,6 +279,18 @@ static int cros_ec_host_command_proto_query(struct cros_ec_device *ec_dev,
>  	msg->insize = sizeof(struct ec_response_get_protocol_info);
>  
>  	ret = send_command(ec_dev, msg);
> +	/*
> +	 * Send command once again when timeout occurred.
> +	 * Fingerprint MCU (FPMCU) is restarted during system boot which
> +	 * introduces small window in which FPMCU won't respond for any
> +	 * messages sent by kernel. There is no need to wait before next
> +	 * attempt because we waited at least EC_MSG_DEADLINE_MS.
> +	 */
> +	if (ret == -ETIMEDOUT) {
> +		dev_warn(ec_dev->dev,
> +			 "Timeout to get response from EC. Retrying.\n");

If a user sees this, what can they do?  No need to spam the kernel logs,
just retry.

> +		ret = send_command(ec_dev, msg);

But wait, why just retry once?  Why not 10 times?  100?  1000?  How
about a simple loop here instead, with a "sane" number of retries as a
max.

thanks,

greg k-h