On 10/12/2022 2:47 PM, Dmitry Baryshkov wrote:
On 11/10/2022 03:08, Elliot Berman wrote:
+
+static irqreturn_t gh_msgq_tx_irq_handler(int irq, void *data)
+{
+ struct gunyah_msgq *msgq = data;
+
+ mbox_chan_txdone(gunyah_msgq_chan(msgq), 0);
+
+ return IRQ_HANDLED;
+}
+
+static void gh_msgq_txdone_tasklet(unsigned long data)
+{
+ struct gunyah_msgq *msgq = (struct gunyah_msgq *)data;
+
+ mbox_chan_txdone(gunyah_msgq_chan(msgq), msgq->last_status);
I don't quite get this. Why do you need both an IRQ and a tasklet?
I've tweaked the code comments as well to explain this a bit better.
Gunyah tells us in the hypercall itself whether the message queue is
full. Once the message queue is full, Gunyah lets us know via the
tx_irq when the reader starts draining the queue and we can start
adding more messages.
One point to note: the message that makes the queue full can be
detected. The hypercall reports that the message was sent (GH_ERROR_OK)
and the "ready" return value is false. In its current form, the msgq
mailbox driver should never make a send hypercall and get
GH_ERROR_MSGQUEUE_FULL, because the driver properly tracks when the
message queue is full.
When the mailbox driver reports txdone, the implication is that more
messages can be sent (not just that the message was transmitted). In
typical operation, the msgq mailbox driver can immediately report that
the message was sent and no tx_irq happens, because the hypercall
returns GH_ERROR_OK and ready=true. The mailbox framework doesn't allow
txdone directly from the send_data callback. To work around that, Jassi
recommended we use a tasklet [1]. In the "atypical" case where the
message queue becomes full, we get GH_ERROR_OK and ready=false. In that
case, we don't report txdone right away with the tasklet and instead
wait for the tx_irq to know when more messages can be sent.
[1]: A tasklet works because send_data is called from the mailbox
framework with interrupts disabled. Once interrupts are re-enabled,
txdone is allowed to happen, which is also when the tasklet runs.
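The ordering in [1] can be modelled the same way. This is a toy sketch,
not kernel code: in_send_data stands in for the interrupts-disabled
section around send_data, tasklet_pending stands in for
tasklet_schedule(), and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of deferring txdone out of send_data. */
struct defer_model {
	bool in_send_data;    /* txdone must not be reported while set */
	bool tasklet_pending; /* deferred txdone waiting to run */
	int txdone_count;
};

void model_txdone(struct defer_model *m)
{
	/* txdone is not allowed from inside send_data. */
	assert(!m->in_send_data);
	m->txdone_count++;
}

/* Driver side: instead of reporting txdone directly, schedule the tasklet. */
void model_send_data(struct defer_model *m)
{
	m->tasklet_pending = true;
}

/* Framework side: send_data runs with interrupts disabled. */
void model_framework_submit(struct defer_model *m)
{
	m->in_send_data = true;  /* interrupts disabled */
	model_send_data(m);
	m->in_send_data = false; /* interrupts re-enabled */
}

/* Pending tasklets can only run once interrupts are back on. */
void model_run_tasklets(struct defer_model *m)
{
	if (m->in_send_data || !m->tasklet_pending)
		return;
	m->tasklet_pending = false;
	model_txdone(m);
}
```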
+
+ /**
+ * EAGAIN: message didn't send.
+ * ret = 1: message sent, but now the message queue is full and we can't send any more msgs.
+ * Either way, don't report that this message is done.
+ */
+ if (ret == -EAGAIN || ret == 1)
+ return ret;
'1' doesn't seem to be a valid return code for _send_data.
Also it would be logical to return any error here, not just -EAGAIN.
If I return an error to the mailbox framework, then the message is
stuck: clients don't know that there was some underlying transport
failure. It would be retried if the client sends another message, but
there is no guarantee either that retrying later would work (what would
have changed?) or that the client would send another message to trigger
a retry. If the message is malformed or the message queue is not
correctly set up, the client would never know. The client should be told
that the message wasn't sent.
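A sketch of the error handling argued for above, assuming the driver
completes a failed message by reporting txdone with an error status (as
the tasklet does with last_status) instead of returning the error from
send_data. The enum values, the send_outcome struct, and the choice of
-EIO are hypothetical placeholders, not Gunyah's or the driver's
definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical hypercall results; real ones live in the Gunyah headers. */
enum gh_error { GH_ERROR_OK, GH_ERROR_MSGQUEUE_FULL, GH_ERROR_OTHER };

struct send_outcome {
	int ret;              /* returned to the mailbox framework from send_data */
	bool schedule_txdone; /* report txdone (with last_status) via the tasklet */
	int last_status;      /* status the client sees in its tx_done callback */
};

struct send_outcome classify_send(enum gh_error err, bool ready)
{
	struct send_outcome o = { 0, false, 0 };

	switch (err) {
	case GH_ERROR_OK:
		/* Sent; if the queue is now full, txdone waits for tx_irq. */
		o.schedule_txdone = ready;
		break;
	case GH_ERROR_MSGQUEUE_FULL:
		/* Not sent: keep the message queued so the framework retries it. */
		o.ret = -EAGAIN;
		break;
	default:
		/* Transport failure: complete the message with an error status. */
		o.schedule_txdone = true;
		o.last_status = -EIO;
		break;
	}
	/* A message is either held for retry or completed, never both. */
	assert(o.ret == 0 || !o.schedule_txdone);
	return o;
}
```

Returning 0 from send_data in the failure case lets the framework treat
the message as active, so the later txdone with a negative status
reaches the client's tx_done callback instead of leaving the message
queued.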
+int gunyah_msgq_init(struct device *parent, struct gunyah_msgq *msgq, struct mbox_client *cl,
+ struct gunyah_resource *tx_ghrsc, struct gunyah_resource *rx_ghrsc)
Are the message queues allocated/created dynamically or statically? If
the latter is true, please use devm_request(_threaded)_irq and devm_kzalloc.
With the exception of the resource manager, message queues are created
dynamically.
P.S. Thanks for all the other suggestions in this and the other patches,
I've applied them.
Thanks,
Elliot