On Tue, May 18, 2021 at 01:22:30PM +0200, Patryk Duda wrote: > wt., 18 maj 2021 o 11:29 Greg KH <greg@xxxxxxxxx> napisał(a): > > > > On Tue, May 18, 2021 at 11:09:25AM +0200, Patryk Duda wrote: > > > Sometimes kernel is trying to probe Fingerprint MCU (FPMCU) when it > > > hasn't initialized SPI yet. This can happen because FPMCU is restarted > > > during system boot and kernel can send message in short window > > > eg. between sysjump to RW and SPI initialization. > > > > > > Cc: <stable@xxxxxxxxxxxxxxx> # 4.4+ > > > Signed-off-by: Patryk Duda <pdk@xxxxxxxxxxxx> > > > --- > > > Fingerprint MCU is rebooted during system startup by AP firmware (coreboot). > > > During cold boot kernel can query FPMCU in a window just after jump to RW > > > section of firmware but before SPI is initialized. The window was > > > shortened to <1ms, but it can't be eliminated completly. > > > > > > Communication with FPMCU (and all devices based on EC) is bi-directional. > > > When kernel sends message, EC will send EC_SPI* status codes. When EC is > > > not able to process command one of bytes will be eg. EC_SPI_NOT_READY. > > > This mechanism won't work when SPI is not initailized on EC side. In fact, > > > buffer is filled with 0xFF bytes, so from kernel perspective device is not > > > responding. To avoid this problem, we can query device once again. We are > > > already waiting EC_MSG_DEADLINE_MS for response, so we can send command > > > immediately. > > > > > > Best regards, > > > Patryk > > > drivers/platform/chrome/cros_ec_proto.c | 12 ++++++++++++ > > > 1 file changed, 12 insertions(+) > > > > > > diff --git a/drivers/platform/chrome/cros_ec_proto.c b/drivers/platform/chrome/cros_ec_proto.c > > > index aa7f7aa77297..3384631d21e2 100644 > > > --- a/drivers/platform/chrome/cros_ec_proto.c > > > +++ b/drivers/platform/chrome/cros_ec_proto.c > > > @@ -279,6 +279,18 @@ static int cros_ec_host_command_proto_query(struct cros_ec_device *ec_dev, > > > msg->insize = sizeof(struct ec_response_get_protocol_info); > > > > > > ret = send_command(ec_dev, msg); > > > + /* > > > + * Send command once again when timeout occurred. > > > + * Fingerprint MCU (FPMCU) is restarted during system boot which > > > + * introduces small window in which FPMCU won't respond for any > > > + * messages sent by kernel. There is no need to wait before next > > > + * attempt because we waited at least EC_MSG_DEADLINE_MS. > > > + */ > > > + if (ret == -ETIMEDOUT) { > > > + dev_warn(ec_dev->dev, > > > + "Timeout to get response from EC. Retrying.\n"); > > > > If a user sees this, what can they do? No need to spam the kernel logs, > > just retry. > User can do nothing about it. I will remove this in next version of patch. > > > > > + ret = send_command(ec_dev, msg); > > > > But wait, why just retry once? Why not 10 times? 100? 1000? How > > about a simple loop here instead, with a "sane" number of retries as a > > max. > EC based devices are designed to respond always or return appropriate > status code > when they can't process command. But this assumes that SPI is always > ready to work. > It's true for Embedded Controller, but not for Fingerprint MCU. So we > can retry once, > in case of sending message, when FPMCU is in narrow window (~1ms) when SPI is > not initialized. > > Every send_command() call can take about 200ms when device is not responding, > so next retry will happen after 200ms, at least. If 200ms is not > enough for FPMCU > to initialize SPI, it's definitely something wrong with FPMCU Ok, then just loop for 2 for this, should make it more obvious. thanks, greg k-h