Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@xxxxxxxxxxxxxxx> On 2/26/2025 5:18 PM, Lizhi Hou wrote: > There is a timeout failure been found during stress tests. If the firmware > generates a mailbox response right after driver clears the mailbox channel > interrupt register, the hardware will not generate an interrupt for the > response. This causes the unexpected mailbox command timeout. > > To handle this failure, driver checks the interrupt register before > exiting mailbox_rx_worker(). If there is a new response, driver goes back > to process it. > > Signed-off-by: Lizhi Hou <lizhi.hou@xxxxxxx> > --- > drivers/accel/amdxdna/amdxdna_mailbox.c | 17 +++++++++++++---- > 1 file changed, 13 insertions(+), 4 deletions(-) > > diff --git a/drivers/accel/amdxdna/amdxdna_mailbox.c b/drivers/accel/amdxdna/amdxdna_mailbox.c > index de7bf0fb4594..8651b1d3c3ab 100644 > --- a/drivers/accel/amdxdna/amdxdna_mailbox.c > +++ b/drivers/accel/amdxdna/amdxdna_mailbox.c > @@ -348,8 +348,6 @@ static irqreturn_t mailbox_irq_handler(int irq, void *p) > trace_mbox_irq_handle(MAILBOX_NAME, irq); > /* Schedule a rx_work to call the callback functions */ > queue_work(mb_chann->work_q, &mb_chann->rx_work); > - /* Clear IOHUB register */ > - mailbox_reg_write(mb_chann, mb_chann->iohub_int_addr, 0); > > return IRQ_HANDLED; > } > @@ -366,6 +364,9 @@ static void mailbox_rx_worker(struct work_struct *rx_work) > return; > } > > +again: > + mailbox_reg_write(mb_chann, mb_chann->iohub_int_addr, 0); > + > while (1) { > /* > * If return is 0, keep consuming next message, until there is > @@ -379,10 +380,18 @@ static void mailbox_rx_worker(struct work_struct *rx_work) > if (unlikely(ret)) { > MB_ERR(mb_chann, "Unexpected ret %d, disable irq", ret); > WRITE_ONCE(mb_chann->bad_state, true); > - disable_irq(mb_chann->msix_irq); > - break; > + return; > } > } > + > + /* > + * The hardware will not generate interrupt if firmware creates a new > + * response right after driver clears interrupt register. Check > + * the interrupt register to make sure there is not any new response > + * before exiting. > + */ > + if (mailbox_reg_read(mb_chann, mb_chann->iohub_int_addr)) > + goto again; > } > > int xdna_mailbox_send_msg(struct mailbox_channel *mb_chann,