Re: [PATCH net] net: macb: Fix lost RX packet wakeup race in NAPI receive

Robert Hancock <robert.hancock@xxxxxxxxxx> · Thu, 3 Mar 2022 16:01:31 +0000

On Wed, 2022-03-02 at 22:40 -0800, Jakub Kicinski wrote:
> On Mon, 28 Feb 2022 12:33:28 -0600 Robert Hancock wrote:
> > There is an oddity in the way the RSR register flags propagate to the
> > ISR register (and the actual interrupt output) on this hardware: it
> > appears that RSR register bits only result in ISR being asserted if the
> > interrupt was actually enabled at the time, so enabling interrupts with
> > RSR bits already set doesn't trigger an interrupt to be raised. There
> > was already a partial fix for this race in the macb_poll function where
> > it checked for RSR bits being set and re-triggered NAPI receive.
> > However, there was a still a race window between checking RSR and
> > actually enabling interrupts, where a lost wakeup could happen. It's
> > necessary to check again after enabling interrupts to see if RSR was set
> > just prior to the interrupt being enabled, and re-trigger receive in that
> > case.
> > 
> > This issue was noticed in a point-to-point UDP request-response protocol
> > which periodically saw timeouts or abnormally high response times due to
> > received packets not being processed in a timely fashion. In many
> > applications, more packets arriving, including TCP retransmissions, would
> > cause the original packet to be processed, thus masking the issue.
> > 
> > Also change from using napi_reschedule to napi_schedule, as the only
> > difference is the presence of a return value which wasn't used here
> > anyway.
> 
> Let's leave that out from this particular patch - fixes should be
> minimal, this sounds like cleanup.

Can do.

> 
> > Fixes: 02f7a34f34e3 ("net: macb: Re-enable RX interrupt only when RX is
> > done")
> > Cc: stable@xxxxxxxxxxxxxxx
> > Co-developed-by: Scott McNutt <scott.mcnutt@xxxxxxxxxxxx>
> > Signed-off-by: Scott McNutt <scott.mcnutt@xxxxxxxxxxxx>
> > Signed-off-by: Robert Hancock <robert.hancock@xxxxxxxxxx>
> > ---
> >  drivers/net/ethernet/cadence/macb_main.c | 26 ++++++++++++++++++++++--
> >  1 file changed, 24 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/cadence/macb_main.c
> > b/drivers/net/ethernet/cadence/macb_main.c
> > index 98498a76ae16..338660fe1d93 100644
> > --- a/drivers/net/ethernet/cadence/macb_main.c
> > +++ b/drivers/net/ethernet/cadence/macb_main.c
> > @@ -1573,14 +1573,36 @@ static int macb_poll(struct napi_struct *napi, int
> > budget)
> >  	if (work_done < budget) {
> >  		napi_complete_done(napi, work_done);
> >  
> > -		/* Packets received while interrupts were disabled */
> > +		/* RSR bits only seem to propagate to raise interrupts when
> > +		 * interrupts are enabled at the time, so if bits are already
> > +		 * set due to packets received while interrupts were disabled,
> > +		 * they will not cause another interrupt to be generated when
> > +		 * interrupts are re-enabled.
> > +		 * Check for this case here.
> > +		 */
> >  		status = macb_readl(bp, RSR);
> 
> Which case is more likely - status == 0 or != 0?
> 
> Because MMIO reads are usually expensive so if status is likely 
> to be zero your other suggestion could be lower overhead.
> It'd be good to mention this expectation in the commit message 
> or comment here.

There was some measurement done on this that motivated a previous patch in this
area:

commit 504ad98df3a6b027ce997ca8f620e949cafb151f
Author: Soren Brinkmann <soren.brinkmann@xxxxxxxxxx>
Date:   Sun May 4 15:43:01 2014 -0700

    net: macb: Remove 'unlikely' optimization

    Coverage data suggests that the unlikely case of receiving data while
    the receive handler is running may not be that unlikely.
    Coverage data after running iperf for a while:
        91320:  891:        work_done = bp->macbgem_ops.mog_rx(bp, budget);
        91320:  892:        if (work_done < budget) {
         2362:  893:                napi_complete(napi);
            -:  894:
            -:  895:                /* Packets received while interrupts were
disabled */
         4724:  896:                status = macb_readl(bp, RSR);
         2362:  897:                if (unlikely(status)) {
          762:  898:                        if (bp->caps &
MACB_CAPS_ISR_CLEAR_ON_WRITE)
          762:  899:                                macb_writel(bp, ISR,
MACB_BIT(RCOMP));
            -:  900:                        napi_reschedule(napi);
            -:  901:                } else {
         1600:  902:                        macb_writel(bp, IER,
MACB_RX_INT_FLAGS);
            -:  903:                }
            -:  904:        }

    Signed-off-by: Soren Brinkmann <soren.brinkmann@xxxxxxxxxx>
    Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>

So it looks like the non-zero status case was being hit roughly 1/3 of the
time, at least under that particular workload. It may depend heavily on
workload etc. but doesn't seem to be clear-cut to optimize one way or the
other.

For the new "double check" branch, from adding debug in, it appears that one is
hit on the order of a few dozen times a day under constant load, so the
"unlikely" seems appropriate there.

> 
> >  		if (status) {
> >  			if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
> >  				queue_writel(queue, ISR, MACB_BIT(RCOMP));
> > -			napi_reschedule(napi);
> > +			napi_schedule(napi);
> >  		} else {
> >  			queue_writel(queue, IER, bp->rx_intr_mask);
> > +
> > +			/* Packets could have been received in the window
> > +			 * between the check above and re-enabling interrupts.
> > +			 * Therefore, a double-check is required to avoid
> > +			 * losing a wakeup. This can potentially race with
> > +			 * the interrupt handler doing the same actions if an
> > +			 * interrupt is raised just after enabling them, but
> > +			 * this should be harmless.
> > +			 */
> > +			status = macb_readl(bp, RSR);
> > +			if (unlikely(status)) {
> > +				queue_writel(queue, IDR, bp->rx_intr_mask);
> > +				if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
> > +					queue_writel(queue, ISR,
> > MACB_BIT(RCOMP));
> > +				napi_schedule(napi);
> > +			}
> >  		}
> >  	}
> >