Re: [PATCH v3] platform/chrome: Use proper protocol transfer function

Shawn N <shawnn@xxxxxxxxxxxx> · Mon, 25 Sep 2017 16:15:57 -0700

On Wed, Sep 20, 2017 at 1:22 PM, Shawn N <shawnn@xxxxxxxxxx> wrote:
> On Tue, Sep 19, 2017 at 11:13 PM, Brian Norris <briannorris@xxxxxxxxxxxx> wrote:
>> Hi,
>>
>> On Tue, Sep 19, 2017 at 11:05:38PM -0700, Shawn N wrote:
>>> This is failing because our EC_CMD_GET_PROTOCOL_INFO host command is
>>> getting messed up, or the reply buffer is getting corrupted somehow.
>>>
>>>                ec_dev->proto_version =
>>>                         min(EC_HOST_REQUEST_VERSION,
>>>                                         fls(proto_info->protocol_versions) - 1);
>>>
>
> Checking this closer, the first host command we send after we boot the
> kernel (EC_CMD_GET_PROTOCOL_INFO) is failing due to protocol error
> (see 'SPI rx bad data' / 'SPI not ready' on the EC console). Since
> this doesn't seem to happen on the Chromium OS nyan_big release
> kernel, I suggest to hook up a logic analyzer and see if the SPI
> master is doing something bad.
>
> The error handling in cros_ec_cmd_xfer_spi() is completely wrong and
> we return -EAGAIN / EC_RES_IN_PROGRESS, which the caller interprets
> "the host command was received by the EC and is currently being
> handled, poll status until completion". So the caller polls status
> with EC_CMD_GET_COMMS_STATUS, sees no host command is in progress
> (which is interpreted to mean "the host command I sent previously has
> now successfully completed"), and returns success. The problem here is
> that the initial host command was never received at all, and no reply
> was ever received, so our reply data is all zero.
>
> Two things need to be fixed here:
>
> 1) Find out why the first host command after boot is failing. Probe
> SPI pins and see what's going on.
> 2) Fix error handling so we properly return an error (or properly
> retry the entire command) when a protocol error occurs (I made some
> attempt in https://chromium-review.googlesource.com/385080/, probably
> I should revisit that).

The below patch will fix error handling and will make things mostly
work on nyan_big, because we'll fall back to V2 protocol after the
initial failure. But we should still investigate why we're getting
errors on the first host command. We aren't seeing these errors when
we send commands from firmware, so I suspect something is wrong in
kernel SPI HW initialization that causes the first command to fail.

From: Shawn Nematbakhsh <shawnn@xxxxxxxxxxxx>
Date: Mon, 25 Sep 2017 14:32:38 -0700
Subject: [PATCH] mfd: cros ec: spi: Fix "in progress" error signaling

For host commands that take a long time to process, cros ec can return
early by signaling a EC_RES_IN_PROGRESS result. The host must then poll
status with EC_CMD_GET_COMMS_STATUS until completion of the command.

None of the above applies when data link errors are encountered. When
errors such as EC_SPI_PAST_END are encountered during command
transmission, it usually means the command was not received by the EC.
Treating such errors as if they were 'EC_RES_IN_PROGRESS' results is
almost always the wrong decision, and can result in host commands
silently being lost.

Signed-off-by: Shawn Nematbakhsh <shawnn@xxxxxxxxxxxx>
---
 drivers/mfd/cros_ec_spi.c | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/drivers/mfd/cros_ec_spi.c b/drivers/mfd/cros_ec_spi.c
index c9714072e224..d33e3847e11e 100644
--- a/drivers/mfd/cros_ec_spi.c
+++ b/drivers/mfd/cros_ec_spi.c
@@ -377,6 +377,7 @@ static int cros_ec_pkt_xfer_spi(struct
cros_ec_device *ec_dev,
        u8 *ptr;
        u8 *rx_buf;
        u8 sum;
+       u8 rx_byte;
        int ret = 0, final_ret;

        len = cros_ec_prepare_tx(ec_dev, ec_msg);
@@ -421,25 +422,22 @@ static int cros_ec_pkt_xfer_spi(struct
cros_ec_device *ec_dev,
        if (!ret) {
                /* Verify that EC can process command */
                for (i = 0; i < len; i++) {
-                       switch (rx_buf[i]) {
-                       case EC_SPI_PAST_END:
-                       case EC_SPI_RX_BAD_DATA:
-                       case EC_SPI_NOT_READY:
-                               ret = -EAGAIN;
-                               ec_msg->result = EC_RES_IN_PROGRESS;
-                       default:
+                       rx_byte = rx_buf[i];
+                       if (rx_byte == EC_SPI_PAST_END  ||
+                           rx_byte == EC_SPI_RX_BAD_DATA ||
+                           rx_byte == EC_SPI_NOT_READY) {
+                               ret = -EREMOTEIO;
                                break;
                        }
-                       if (ret)
-                               break;
                }
-               if (!ret)
-                       ret = cros_ec_spi_receive_packet(ec_dev,
-                                       ec_msg->insize + sizeof(*response));
-       } else {
-               dev_err(ec_dev->dev, "spi transfer failed: %d\n", ret);
        }

+       if (!ret)
+               ret = cros_ec_spi_receive_packet(ec_dev,
+                               ec_msg->insize + sizeof(*response));
+       else
+               dev_err(ec_dev->dev, "spi transfer failed: %d\n", ret);
+
        final_ret = terminate_request(ec_dev);

        spi_bus_unlock(ec_spi->spi->master);
-- 
2.12.2

>
>>> If proto_info->protocol_versions == 0 then ec_dev->proto_version will
>>> be assigned 0xffff. The logic here seems strange to me, if the EC is
>>
>> Whoops...
>>
>>> successfully replying to our v3 command then obviously it supports v3
>>> (maybe it will be useful someday if EC_HOST_REQUEST_VERSION is rev'd).
>>> Anyway, we need to figure out what is happening with our
>>> EC_HOST_REQUEST_VERSION host command.
>>>
>>> On Tue, Sep 19, 2017 at 10:14 AM, Brian Norris <briannorris@xxxxxxxxxxxx> wrote:
>>> > Hi Jon,
>>> >
>>> > On Tue, Sep 19, 2017 at 05:39:56PM +0100, Jon Hunter wrote:
>>> >> On 19/09/17 15:09, Shawn N wrote:
>> ...
>>> > Furthermore, the only assignments to this 'proto_version' field look
>>> > like they're only writing one of 0, 2, 3, or
>>> >
>>> >    min(EC_HOST_REQUEST_VERSION, fls(proto_info->protocol_versions) - 1)
>>> >
>>> > . I don't see where 0xffff comes from.
>>
>> ...I'm an idiot. While the rvalue (the expression above) is an int (e.g,
>> -1), it's getting cast into a uint16_t (ec_dev->proto_version). So
>> that's where the 0xffff can come from.
>
> I saw that before and overlooked it too, so we're both idiots.
>
>>
>> Sorry if I misled you Shawn :(
>>
>> Brian
From 9a9073b069001f0d265b735e263f0f9823965e95 Mon Sep 17 00:00:00 2001
From: Shawn Nematbakhsh <shawnn@xxxxxxxxxxxx>
Date: Mon, 25 Sep 2017 14:32:38 -0700
Subject: [PATCH] mfd: cros ec: spi: Fix "in progress" error signaling

For host commands that take a long time to process, cros ec can return
early by signaling a EC_RES_IN_PROGRESS result. The host must then poll
status with EC_CMD_GET_COMMS_STATUS until completion of the command.

None of the above applies when data link errors are encountered. When
errors such as EC_SPI_PAST_END are encountered during command
transmission, it usually means the command was not received by the EC.
Treating such errors as if they were 'EC_RES_IN_PROGRESS' results is
almost always the wrong decision, and can result in host commands
silently being lost.

Signed-off-by: Shawn Nematbakhsh <shawnn@xxxxxxxxxxxx>
---
 drivers/mfd/cros_ec_spi.c | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/drivers/mfd/cros_ec_spi.c b/drivers/mfd/cros_ec_spi.c
index c9714072e224..d33e3847e11e 100644
--- a/drivers/mfd/cros_ec_spi.c
+++ b/drivers/mfd/cros_ec_spi.c
@@ -377,6 +377,7 @@ static int cros_ec_pkt_xfer_spi(struct cros_ec_device *ec_dev,
 	u8 *ptr;
 	u8 *rx_buf;
 	u8 sum;
+	u8 rx_byte;
 	int ret = 0, final_ret;
 
 	len = cros_ec_prepare_tx(ec_dev, ec_msg);
@@ -421,25 +422,22 @@ static int cros_ec_pkt_xfer_spi(struct cros_ec_device *ec_dev,
 	if (!ret) {
 		/* Verify that EC can process command */
 		for (i = 0; i < len; i++) {
-			switch (rx_buf[i]) {
-			case EC_SPI_PAST_END:
-			case EC_SPI_RX_BAD_DATA:
-			case EC_SPI_NOT_READY:
-				ret = -EAGAIN;
-				ec_msg->result = EC_RES_IN_PROGRESS;
-			default:
+			rx_byte = rx_buf[i];
+			if (rx_byte == EC_SPI_PAST_END  ||
+			    rx_byte == EC_SPI_RX_BAD_DATA ||
+			    rx_byte == EC_SPI_NOT_READY) {
+				ret = -EREMOTEIO;
 				break;
 			}
-			if (ret)
-				break;
 		}
-		if (!ret)
-			ret = cros_ec_spi_receive_packet(ec_dev,
-					ec_msg->insize + sizeof(*response));
-	} else {
-		dev_err(ec_dev->dev, "spi transfer failed: %d\n", ret);
 	}
 
+	if (!ret)
+		ret = cros_ec_spi_receive_packet(ec_dev,
+				ec_msg->insize + sizeof(*response));
+	else
+		dev_err(ec_dev->dev, "spi transfer failed: %d\n", ret);
+
 	final_ret = terminate_request(ec_dev);
 
 	spi_bus_unlock(ec_spi->spi->master);
-- 
2.12.2