Re: [Bug #14141] order 2 page allocation failures in iwlagn

reinette chatre <reinette.chatre@xxxxxxxxx> · Wed, 14 Oct 2009 14:55:17 -0700

On Wed, 2009-10-14 at 14:33 -0700, Frans Pop wrote:
> On Wednesday 14 October 2009, reinette chatre wrote:
> > We do queue the GFP_KERNEL allocations when there are only a few buffers
> > remaining in the queue (8 right now) ...
> 
> Are you sure of this? I have zero messages in my logs about allocation 
> failures with GFP_KERNEL, but I do have plenty with "Only 0 free buffers 
> remaining" with GFP_ATOMIC.

That does make sense to me. We do not expect allocations with GFP_KERNEL
to fail. Given my understanding of how this works, I suspect the
following scenario:

* start with the system low on available memory
* now introduce incoming traffic (causing the RX code to run)
* upon receipt of a frame we attempt an allocation (to reclaim the buffer)
  with GFP_ATOMIC (state: num RX buffers free > watermark)
  * this fails since memory is not available
  * the number of free RX buffers drops
  * we do _not_ queue replenishment of buffers with GFP_KERNEL
* repeat the above until we hit the watermark (currently 8)
* upon receipt of a frame we attempt an allocation (to reclaim the buffer)
  with GFP_ATOMIC (state: num RX buffers free <= watermark)
  * this fails (now the user sees the big warning)
  * we queue replenishment of buffers with GFP_KERNEL

Essentially, I suspect that we do attempt to replenish the buffers with
GFP_KERNEL after several failures with GFP_ATOMIC, but by that point we
have already run out of RX buffers completely.

One way to test this theory is to queue the GFP_KERNEL allocation
earlier, while a significant number of RX buffers are still available;
a watermark of 8 may turn out to be too small.

> Does that indicate a bug or could they fall under the ratelimit somehow?

In your kernel log I do see that the driver's error messages related to
GFP_ATOMIC are rate limited: there are many more "order-2 allocation
failure" messages than "Failed to allocate" messages. All of these
allocation failures come from the "replenish_now" code, though, which
uses GFP_ATOMIC. So even though we do not see a "Failed to allocate"
error for each of them (those are rate limited), it seems that all of the
allocation failures come from that GFP_ATOMIC code.

Reinette
