Re: [PATCH v5 09/13] mailbox: Add Gunyah message queue mailbox

Elliot Berman <quic_eberman@xxxxxxxxxxx> · Thu, 13 Oct 2022 15:32:48 -0700

On 10/12/2022 2:47 PM, Dmitry Baryshkov wrote:
On 11/10/2022 03:08, Elliot Berman wrote:
+
+static irqreturn_t gh_msgq_tx_irq_handler(int irq, void *data)
+{
+    struct gunyah_msgq *msgq = data;
+
+    mbox_chan_txdone(gunyah_msgq_chan(msgq), 0);
+
+    return IRQ_HANDLED;
+}
+
+static void gh_msgq_txdone_tasklet(unsigned long data)
+{
+    struct gunyah_msgq *msgq = (struct gunyah_msgq *)data;
+
+    mbox_chan_txdone(gunyah_msgq_chan(msgq), msgq->last_status);

I don't quite get this. Why do you need both an IRQ and a tasklet?

I've tweaked the code comments as well to explain this a bit better.

Gunyah tells us in the hypercall itself whether the message queue is
full. Once the message queue is full, Gunyah will let us know via the
tx_irq when the reader starts draining the queue and we can start adding
more messages.

One point to note: the last message to be sent into the message queue
that makes the queue full can be detected. The hypercall reports that
the message was sent (GH_ERROR_OK) and the "ready" return value is
false. In its current form, the msgq mailbox driver should never make a
send hypercall and get GH_ERROR_MSGQUEUE_FULL because the driver
properly tracks when the message queue is full.

When a mailbox driver reports txdone, the implication is that more
messages can be sent (not just that the message was transmitted). In
typical operation, the msgq mailbox driver can immediately report that
the message was sent and no tx_irq happens because the hypercall returns
GH_ERROR_OK and ready=true. The mailbox framework doesn't allow txdone
directly from the send_data callback. To work around that, Jassi
recommended we use a tasklet [1]. In the "atypical" case where the
message queue becomes full, we get GH_ERROR_OK and ready=false. In that
case, we don't report txdone right away with the tasklet and instead
wait for the tx_irq to know when more messages can be sent.

[1]: The tasklet works because send_data is called from the mailbox
framework with interrupts disabled. Once interrupts are re-enabled, the
txdone is allowed to happen, which is also when the tasklet runs.
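
To make the two paths concrete, here is a rough user-space model of the
flow described above. It is only a sketch, not the driver code: every
name (model_msgq, model_send_data, ...) is made up for illustration, the
tasklet is stood in for by a pending flag, and it assumes the tx_irq
fires once when the reader first drains a full queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; none of these names come from the patch. */
enum model_status { MODEL_OK = 0, MODEL_FULL = 1 };

struct model_msgq {
	size_t depth;              /* capacity of the simulated queue */
	size_t used;               /* messages currently queued */
	bool txdone_pending;       /* stands in for a scheduled tasklet */
	unsigned int txdone_count; /* times txdone has been reported */
};

/* Simulated send hypercall: returns a status and sets "ready". */
static enum model_status model_hypercall_send(struct model_msgq *q, bool *ready)
{
	if (q->used == q->depth) {
		*ready = false;
		return MODEL_FULL;
	}
	q->used++;
	*ready = q->used < q->depth; /* false once this message fills the queue */
	return MODEL_OK;
}

/* Models send_data: it never reports txdone directly. */
static int model_send_data(struct model_msgq *q)
{
	bool ready;

	/* The sender tracks fullness, so it should never send into a full queue. */
	assert(q->used < q->depth);

	if (model_hypercall_send(q, &ready) != MODEL_OK)
		return -1;
	if (ready)
		q->txdone_pending = true; /* typical case: defer txdone */
	/* else: queue just became full, wait for the tx irq instead */
	return 0;
}

/* Models the deferred work running once interrupts are re-enabled. */
static void model_run_deferred(struct model_msgq *q)
{
	if (q->txdone_pending) {
		q->txdone_pending = false;
		q->txdone_count++;
	}
}

/* Models the reader draining one message; the tx irq fires if it was full. */
static void model_reader_drain(struct model_msgq *q)
{
	bool was_full = q->used == q->depth;

	if (q->used > 0)
		q->used--;
	if (was_full)
		q->txdone_count++; /* tx irq handler reports txdone */
}
```

With a depth of 2, the first send reports txdone through the deferred
path, the second send fills the queue and reports nothing, and txdone
for it only arrives after model_reader_drain() runs.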

+
+    /**
+     * EAGAIN: message didn't send.
+     * ret = 1: message sent, but now the message queue is full and we can't send any more msgs.
+     * Either way, don't report that this message is done.
+     */
+    if (ret == -EAGAIN || ret == 1)
+        return ret;

'1' doesn't seem to be a valid return code for _send_data.

Also it would be logical to return any error here, not just -EAGAIN.

If I return an error to the mailbox framework, then the message is
stuck: clients don't know that there was some underlying transport
failure. It would be retried if the client sends another message, but
there is no guarantee either that retrying later would work (what would
have changed?) or that the client would send another message to trigger
a retry. If the message is malformed or the message queue is not
correctly set up, the client would never know. The client should be told
that the message wasn't sent.
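
One way to picture that argument, as a user-space sketch only: a hard
transport error is recorded and handed to the client through the
deferred txdone, while -EAGAIN is returned as-is because nothing was
sent. The sketch_* names and the struct are hypothetical and are not
taken from the patch; the ret == 1 case is left out here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical channel state; not the driver's real structure. */
struct sketch_chan {
	int last_status;        /* status to hand to the client on txdone */
	bool txdone_scheduled;  /* stands in for a scheduled tasklet */
	bool client_notified;   /* set when the client's completion runs */
	int client_seen_status; /* status the client was told about */
};

/* Models send_data given the result of the underlying transport call. */
static int sketch_send_data(struct sketch_chan *ch, int transport_ret)
{
	if (transport_ret == -EAGAIN)
		return transport_ret; /* not sent; nothing to complete yet */

	/* Success or hard failure: complete the message with that status. */
	ch->last_status = transport_ret;
	ch->txdone_scheduled = true;
	return 0;
}

/* Models the deferred txdone delivering the recorded status. */
static void sketch_run_txdone(struct sketch_chan *ch)
{
	if (!ch->txdone_scheduled)
		return;
	ch->txdone_scheduled = false;
	ch->client_seen_status = ch->last_status;
	ch->client_notified = true;
}
```

In this model a malformed message (say -EINVAL from the transport) still
completes, so the client learns about the failure instead of the message
sitting in the queue waiting for a retry that may never come.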

+int gunyah_msgq_init(struct device *parent, struct gunyah_msgq *msgq, struct mbox_client *cl,
+             struct gunyah_resource *tx_ghrsc, struct gunyah_resource *rx_ghrsc)

Are the message queues allocated/created dynamically or statically? If
the latter is true, please use devm_request(_threaded)_irq and devm_kzalloc.

With the exception of the resource manager, message queues are created
dynamically.

P.S. Thanks for all the other suggestions in this and the other patches, 
I've applied them.

Thanks,
Elliot