Hello Marc, On 10/9/19 3:18 PM, Marc Kleine-Budde wrote: > Hello Jeroen, > > I'm currently looking at the rx-offload error handling in detail. > > TLDR: I've added the patch to linux-can. > > Thanks, > Marc > > For the record, the details: > > On 9/24/19 8:45 PM, Jeroen Hofstee wrote: >> While can_rx_offload_offload_one will call mailbox_read to discard >> the mailbox in case of an error, > Yes. > > can_rx_offload_offload_one() will read into the discard mailbox in case > of an error. > > Currently there are two kinds of errors: > 1) the rx-offoad skb queue (between the IRQ handler and the NAPI) > is full > 2) alloc_can_skb() fails to allocate a skb, due to OOM > >> can_rx_offload_irq_offload_timestamp bails out in the error case. > Yes, in case of an error both can_rx_offload_irq_offload_timestamp() and > can_rx_offload_irq_offload_fifo() will stop reading mailboxes, add the > already filled skbs to the queue and schedule NAPI if needed. > > Currently both can_rx_offload_irq_offload_timestamp() and > can_rx_offload_irq_offload_fifo() will return the number of queued > mailboxes. > > This means in case of queue overflow or OOM, only one mailbox is > discaded, before can_rx_offload_irq_offload_*() returning the number of > successfully queued mailboxes so far. > > Looking at the flexcan driver: > > https://elixir.bootlin.com/linux/latest/source/drivers/net/can/flexcan.c#L867 > >> while ((reg_iflag = flexcan_read_reg_iflag_rx(priv))) { >> handled = IRQ_HANDLED; >> ret = can_rx_offload_irq_offload_timestamp(&priv->offload, >> reg_iflag); >> if (!ret) >> break; >> } > [...] >> if (reg_iflag1 & FLEXCAN_IFLAG_RX_FIFO_AVAILABLE) { >> handled = IRQ_HANDLED; >> can_rx_offload_irq_offload_fifo(&priv->offload); >> } > This means for the timestamp mode, at least one mailbox is discarded or > if the error occurred after reading one or more mailboxes the while loop > will try again. If the error persists a second mailbox is discarded. > > For the fifo mode, only one mailbox is discarded. > > Then the flexcan's IRQ is exited. If there are messages in mailboxes are > pending another IRQ is triggered.... I doubt that this is a good idea. > > On the other hand the ti_hecc driver: > >> /* offload RX mailboxes and let NAPI deliver them */ >> while ((rx_pending = hecc_read(priv, HECC_CANRMP))) { >> can_rx_offload_irq_offload_timestamp(&priv->offload, >> rx_pending); >> hecc_write(priv, HECC_CANRMP, rx_pending); >> } > The error is ignored and the _all_ mailboxes are discarded (given the > error persists). > >> Since it is typically called from a while loop, all message will >> eventually be discarded. So lets continue on error instead to discard >> them directly. > After reading my own code and writing it up, your patch totally makes sense. > > If there is a shortage of resources, queue overrun or OOM, it makes no > sense to return from the IRQ handler, if a mailbox is still active as it > will trigger the IRQ again. Entering the IRQ handler again probably > doesn't give the system time to recover from the resource problem. Indeed, I have nothing to comment on that, besides thanks for being willing to reconsider your own code. With kind regards, Jeroen