On Tue, May 18, 2021 at 11:09:25AM +0200, Patryk Duda wrote: > Sometimes kernel is trying to probe Fingerprint MCU (FPMCU) when it > hasn't initialized SPI yet. This can happen because FPMCU is restarted > during system boot and kernel can send message in short window > eg. between sysjump to RW and SPI initialization. > > Cc: <stable@xxxxxxxxxxxxxxx> # 4.4+ > Signed-off-by: Patryk Duda <pdk@xxxxxxxxxxxx> > --- > Fingerprint MCU is rebooted during system startup by AP firmware (coreboot). > During cold boot kernel can query FPMCU in a window just after jump to RW > section of firmware but before SPI is initialized. The window was > shortened to <1ms, but it can't be eliminated completly. > > Communication with FPMCU (and all devices based on EC) is bi-directional. > When kernel sends message, EC will send EC_SPI* status codes. When EC is > not able to process command one of bytes will be eg. EC_SPI_NOT_READY. > This mechanism won't work when SPI is not initailized on EC side. In fact, > buffer is filled with 0xFF bytes, so from kernel perspective device is not > responding. To avoid this problem, we can query device once again. We are > already waiting EC_MSG_DEADLINE_MS for response, so we can send command > immediately. > > Best regards, > Patryk > drivers/platform/chrome/cros_ec_proto.c | 12 ++++++++++++ > 1 file changed, 12 insertions(+) > > diff --git a/drivers/platform/chrome/cros_ec_proto.c b/drivers/platform/chrome/cros_ec_proto.c > index aa7f7aa77297..3384631d21e2 100644 > --- a/drivers/platform/chrome/cros_ec_proto.c > +++ b/drivers/platform/chrome/cros_ec_proto.c > @@ -279,6 +279,18 @@ static int cros_ec_host_command_proto_query(struct cros_ec_device *ec_dev, > msg->insize = sizeof(struct ec_response_get_protocol_info); > > ret = send_command(ec_dev, msg); > + /* > + * Send command once again when timeout occurred. > + * Fingerprint MCU (FPMCU) is restarted during system boot which > + * introduces small window in which FPMCU won't respond for any > + * messages sent by kernel. There is no need to wait before next > + * attempt because we waited at least EC_MSG_DEADLINE_MS. > + */ > + if (ret == -ETIMEDOUT) { > + dev_warn(ec_dev->dev, > + "Timeout to get response from EC. Retrying.\n"); If a user sees this, what can they do? No need to spam the kernel logs, just retry. > + ret = send_command(ec_dev, msg); But wait, why just retry once? Why not 10 times? 100? 1000? How about a simple loop here instead, with a "sane" number of retries as a max. thanks, greg k-h