On Wed, 2015-04-29 at 11:53 +0100, Sudeep Holla wrote:
> On 28/04/15 14:54, Jon Medhurst (Tixy) wrote:
> > On Mon, 2015-04-27 at 12:40 +0100, Sudeep Holla wrote:
[...]
> >> +	int ret;
> >> +	u8 token, chan;
> >> +	struct scpi_xfer *msg;
> >> +	struct scpi_chan *scpi_chan;
> >> +
> >> +	chan = atomic_inc_return(&scpi_info->next_chan) % scpi_info->num_chans;
> >> +	scpi_chan = scpi_info->channels + chan;
> >> +
> >> +	msg = get_scpi_xfer(scpi_chan);
> >> +	if (!msg)
> >> +		return -ENOMEM;
> >> +
> >> +	token = atomic_inc_return(&scpi_chan->token) & CMD_TOKEN_ID_MASK;
> >
> > So, this 8 bit token is what's used to 'uniquely' identify a pending
> > command. But as it's just an incrementing value, then if one command
> > gets delayed for long enough that 256 more are issued then we will have
> > a non-unique value and scpi_process_cmd can go wrong.
> >
>
> IMO by the time 256 message are queued up and serviced we would timeout
> on the initial command. Moreover the core mailbox has sent the mailbox
> length to 20(MBOX_TX_QUEUE_LEN) which needs to removed to even get the
> remote chance of hit the corner case.

The corner case can be hit even if the queue length is only 2, because
other processes/CPUs can use the other message, the one we don't own
here, and send and then receive a message with it 256 times. The corner
case doesn't require 256 simultaneous outstanding requests. That is the
reason I suggested that, rather than using an incrementing value for the
'unique' token, each message should instead contain the value of the
token to use with it.

>
> > Note, this delay doesn't just have to be at the SCPI end. We could get
> > preempted here (?) before actually sending the command to the SCP and
> > other kernel threads or processes could send those other 256 commands
> > before we get to run again.
> >
>
> Agreed, but we would still timeout after 3 jiffies max.
But we haven't started any timeout yet; the 3 jiffies won't start until
we get scheduled again and call wait_for_completion_timeout below.

>
> > Wouldn't it be better instead to have scpi_alloc_xfer_list add a unique
> > number to each struct scpi_xfer.
> >
>
> One of reason using it part of command is that SCP gives it back in the
> response to compare.

Can't we fill the token in the command from the value stored in the
struct scpi_xfer we are using to send that command?

> >> +
> >> +	msg->slot = BIT(SCPI_SLOT);
> >> +	msg->cmd = PACK_SCPI_CMD(cmd, token, len);
> >> +	msg->tx_buf = tx_buf;
> >> +	msg->tx_len = len;
> >> +	msg->rx_buf = rx_buf;
> >> +	init_completion(&msg->done);
> >> +
> >> +	ret = mbox_send_message(scpi_chan->chan, msg);
> >> +	if (ret < 0 || !rx_buf)
> >> +		goto out;
> >> +
> >> +	if (!wait_for_completion_timeout(&msg->done, MAX_RX_TIMEOUT))
> >> +		ret = -ETIMEDOUT;
> >> +	else
> >> +		/* first status word */
> >> +		ret = le32_to_cpu(msg->status);
> >> +out:
> >> +	if (ret < 0 && rx_buf) /* remove entry from the list if timed-out */
> >
> > So, even with my suggestion that the unique message identifies are
> > fixed values stored in struct scpi_xfer, we can still have the situation
> > where we timeout a request, that scpi_xfer then getting used for another
> > request, and finally the SCP completes the request that we timed out,
> > which has the same 'unique' value as the later one.
> >
>
> As explained above I can't imagine hitting this condition. I will think
> more on that again.

I can imagine :-) If we time out and discard messages, and then reuse
their unique ids, there is always the possibility of this confusion
occurring. No amount of coding in the kernel can get around that. The
only way out of this quandary is to make assumptions about how the SCP
firmware behaves.

>
> > One way to handle that it to not have any timeout on requests and assume
> > the firmware isn't buggy.
> >
>
> That's something I can't do ;) based on my experience so far. It's good
> to assume firmware *can be buggy* and handle all possible errors.

I'm inclined to agree.

> Think about the development firmware using this driver. This has been
> very useful when I was testing the development versions. Even under
> stress conditions I still see timeouts(very rarely though), so my
> personal preference is to have them.

But the SCPI protocol unfortunately doesn't seem to allow us to handle
timeouts robustly. Well, we could keep a list of the tokens used in
timed-out messages, and not reuse them. But if, as you say, timeouts do
occur, then with only 256 tokens available we are likely to run out.

When I brought this up 9 months ago, it was pointed out that the 8-bit
token was a consequence of the protocol designers cramming it into the
32-bit value poked into the MHU register. The new, finished protocol spec
no longer uses the MHU register for this data, but the limitation was
kept by specifying the same command data format, just stored in shared
memory instead. Pity the opportunity wasn't taken to expand the token to
a size that allows more robust use.

-- 
Tixy
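A minimal userspace model of the two token schemes discussed in this thread, for illustration only. It assumes the token is the 8-bit field selected by CMD_TOKEN_ID_MASK (taken here as 0xff); token_counter, token_pool, pool_alloc() and the other names are hypothetical and do not exist in the SCPI driver, and the atomics are dropped because the model is single-threaded. It shows that an incrementing 8-bit token repeats after 256 commands even when none of them overlap, and that refusing to reuse the tokens of timed-out requests runs out after 256 timeouts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the token is the 8-bit field selected by CMD_TOKEN_ID_MASK. */
#define TOKEN_MASK 0xffu
#define NUM_TOKENS 256u

/* Hypothetical single-threaded stand-in for scpi_chan->token. */
struct token_counter {
	uint32_t next;
};

/* Models "atomic_inc_return(&scpi_chan->token) & CMD_TOKEN_ID_MASK". */
static uint8_t counter_next_token(struct token_counter *c)
{
	return (uint8_t)(++c->next & TOKEN_MASK);
}

/*
 * Counts how many more commands can be issued before one is handed the
 * same token as 'first'. None of them need to be outstanding at the same
 * time; each may already have completed before the next is sent.
 */
static unsigned int issues_until_reuse(struct token_counter *c, uint8_t first)
{
	unsigned int n = 1;

	while (counter_next_token(c) != first)
		n++;
	return n;
}

/* Hypothetical allocator that never reuses a token from a timed-out request. */
struct token_pool {
	uint8_t next;                 /* next candidate; wraps 255 -> 0 */
	bool quarantined[NUM_TOKENS]; /* a late response may still carry these */
	unsigned int num_quarantined;
};

/* Returns a free token (0..255), or -1 once every token is quarantined. */
static int pool_alloc(struct token_pool *p)
{
	for (unsigned int i = 0; i < NUM_TOKENS; i++) {
		uint8_t t = p->next++;

		if (!p->quarantined[t])
			return t;
	}
	return -1;
}

/* Call on timeout: the SCP might still answer later using this token. */
static void pool_quarantine(struct token_pool *p, uint8_t t)
{
	if (!p->quarantined[t]) {
		p->quarantined[t] = true;
		p->num_quarantined++;
	}
}
```

In this model the first scheme fails silently (a stale response matches a newer request), while the quarantine scheme trades that for an allocation failure once 256 requests have timed out.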