usbnet OOM handling - possible CPU starvation?

Bjørn Mork <bjorn@xxxxxxx> · Tue, 18 Mar 2014 10:59:20 +0100

Hello Oliver & other clueful gurus!

I started looking at how to improve the OOM handling in usbnet, but have
to admit that this is way over my head. I don't think I dare touching
it...

I've seen a few reports clearly showing that we do have a problem. One
example is here: http://forums.whirlpool.net.au/forum-replies.cfm?t=2207038&p=57#r1122

Although all the reports I've seen are from users of cdc_ncm, I don't
think this problem is specific to any particualar minidriver. But it is
of course easier to trigger it using a minidriver with (too?)  large
buffers, like the cdc_ncm minidriver family.

To recall: The NCM and MBIM protocols aggregate packets before
transission over the USB link, requiring USB buffers with multiple
ethernet/IP packets.  Common code used by both cdc_mbim and cdc_ncm
currently cap the buffers at 32768 bytes (hard coded limit), which is
still too high for some embedded hosts.  This results in failures to
allocate buffers in rx_submit (flags == GFP_ATOMIC when it is called
from the URB callback):

	skb = __netdev_alloc_skb_ip_align(dev->net, size, flags);
	if (!skb) {
		netif_dbg(dev, rx_err, dev->net, "no rx skb\n");
		usbnet_defer_kevent (dev, EVENT_RX_MEMORY);
		usb_free_urb (urb);
		return -ENOMEM;
	}

The deferred event ends up retrying the same allocation with
flags == GFP_KERNEL :

	/* tasklet could resubmit itself forever if memory is tight */
	if (test_bit (EVENT_RX_MEMORY, &dev->flags)) {
		struct urb	*urb = NULL;
		int resched = 1;

		if (netif_running (dev->net))
			urb = usb_alloc_urb (0, GFP_KERNEL);
		else
			clear_bit (EVENT_RX_MEMORY, &dev->flags);
		if (urb != NULL) {
			clear_bit (EVENT_RX_MEMORY, &dev->flags);
			status = usb_autopm_get_interface(dev->intf);
			if (status < 0) {
				usb_free_urb(urb);
				goto fail_lowmem;
			}
			if (rx_submit (dev, urb, GFP_KERNEL) == -ENOLINK)
				resched = 0;
			usb_autopm_put_interface(dev->intf);
fail_lowmem:
			if (resched)
				tasklet_schedule (&dev->bh);
		}
	}

But what if this still fails with -ENOMEM? The scheduled tasklet then
runs this:

	// or are we maybe short a few urbs?
	} else if (netif_running (dev->net) &&
		   netif_device_present (dev->net) &&
		   netif_carrier_ok(dev->net) &&
		   !timer_pending (&dev->delay) &&
		   !test_bit (EVENT_RX_HALT, &dev->flags)) {
		int	temp = dev->rxq.qlen;

		if (temp < RX_QLEN(dev)) {
			if (rx_alloc_submit(dev, GFP_ATOMIC) == -ENOLINK)
				return;
			if (temp != dev->rxq.qlen)
				netif_dbg(dev, link, dev->net,
					  "rxqlen %d --> %d\n",
					  temp, dev->rxq.qlen);
			if (dev->rxq.qlen < RX_QLEN(dev))
				tasklet_schedule (&dev->bh);
		}
		if (dev->txq.qlen < TX_QLEN (dev))
			netif_wake_queue (dev->net);
	}

which is going to reschedule itself infinitely, without any delays at
all, as long as dev->rxq.qlen < RX_QLEN(dev).  So this depends on
rx_alloc_submit being able to allocate RX_QLEN(dev) buffers, or it is
going to lock up one CPU forever in a tight loop.

But rx_alloc_submit is just going to repeat the rx_submit() call over
and over again, with flags == GFP_ATOMIC:

static int rx_alloc_submit(struct usbnet *dev, gfp_t flags)
{
	struct urb	*urb;
	int		i;
	int		ret = 0;

	/* don't refill the queue all at once */
	for (i = 0; i < 10 && dev->rxq.qlen < RX_QLEN(dev); i++) {
		urb = usb_alloc_urb(0, flags);
		if (urb != NULL) {
			ret = rx_submit(dev, urb, flags);
			if (ret)
				goto err;
		} else {
			ret = -ENOMEM;
			goto err;
		}
	}
err:
	return ret;
}

And quite frankly: That is pretty much guaranteed to fail if we ended up
going through that route in the first place, isn't it?  So rx_submit
tries to defer the allocation again, and we're back where we started.
Except that we are now already rescheduling the tasklet infinitely.

So we are going to spin here forever, unless something magically frees
up some memory for us.  Which of course is much less likely to happen
when we spin like this.  There are users with severely limited OpenWRT
based embedded systems hitting this, with a single CPU core and not much
memory at all...

One visibly symptom of this state, in addition to the CPU spinning, is
that the log will be full of "kevent 2 may have been dropped" messages.
rx_submit tries to defer the allocation every time it fails, and with a
high enough failure rate then schedule_work is called much faster than
the work queue is processed.

But how should this be fixed?  Running a delay timer for each time the
tasklet reschedules might help giving up the CPU.  Maybe we should also
break out at some point, either failing the device or trying to reduce
RX_QLEN(dev)?  If the deferred GFP_KERNEL allocation ever fails, then I
think it's time for the driver to stop and reconsider its strategies.

Another idea I've played is that we could try to dynamically reduce the
rx buffer size.  This might be possible with NCM/MBIM.  The current hard
coded ceiling is arbitrary and could be set at runtime instead.  We
could add a usbnet callback requesting the minidriver to reduce its rx
buffer size.

Help and/or ideas are appreciated.

I will try to add code to cdc_ncm allowing userspace to change the
buffer ceilings at runtime, which should help with the immediate
problem. But it will require user interaction, which of course isn't as
good as an automatic solution.

[1] 

Bjørn
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html