On Tue, Nov 19, 2013 at 01:36:36PM -0800, Eric Dumazet wrote:
> On Tue, 2013-11-19 at 22:49 +0200, Michael S. Tsirkin wrote:
> > On Tue, Nov 19, 2013 at 06:03:48AM -0800, Eric Dumazet wrote:
> > > On Tue, 2013-11-19 at 16:05 +0800, Jason Wang wrote:
> > > > We need to drop the refcnt of the page when we fail to allocate an skb
> > > > for the frag list, otherwise it will be leaked. The bug was introduced
> > > > by commit 2613af0ed18a11d5c566a81f9a6510b73180660a ("virtio_net:
> > > > migrate mergeable rx buffers to page frag allocators").
> > > >
> > > > Cc: Michael Dalton <mwdalton@xxxxxxxxxx>
> > > > Cc: Eric Dumazet <edumazet@xxxxxxxxxx>
> > > > Cc: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
> > > > Cc: Michael S. Tsirkin <mst@xxxxxxxxxx>
> > > > Signed-off-by: Jason Wang <jasowang@xxxxxxxxxx>
> > > > ---
> > > > The patch was needed for 3.12 stable.
> > >
> > > Good catch, but if we return from receive_mergeable() in the 'middle'
> > > of the frags we would need for the current skb, who will
> > > call the virtqueue_get_buf() to flush the remaining frags ?
> > >
> > > Don't we also need to call virtqueue_get_buf() like
> > >
> > > while (--num_buf) {
> > >         buf = virtqueue_get_buf(rq->vq, &len);
> > >         if (!buf)
> > >                 break;
> > >         put_page(virt_to_head_page(buf));
> > > }
> > >
> > > ?
> >
> > Let me explain what worries me in your suggestion:
> >
> > struct sk_buff *nskb = alloc_skb(0, GFP_ATOMIC);
> > if (unlikely(!nskb)) {
> >         head_skb->dev->stats.rx_dropped++;
> >         return -ENOMEM;
> > }
> >
> > Is this the failure case we are talking about?
>
> I thought Jason's patch was about this, no ?
>
> > I think this is a symptom of a larger problem
> > introduced by 2613af0ed18a11d5c566a81f9a6510b73180660a,
> > namely that we now need to allocate memory in the
> > middle of processing a packet.
> >
> > I think discarding a completely valid and well-formed
> > packet from the receive queue because we are unable
> > to allocate new memory with GFP_ATOMIC
> > for future packets is not a good idea.
>
> How is it different with NIC processing in the RX path ?

Which NIC? Virtio? Prior to 2613af0ed18a11d5c566a81f9a6510b73180660a it
didn't drop packets received from the host, as far as I can tell.
Virtio is more like a pipe than a real NIC in this respect.

> > It certainly violates the principle of least surprise:
> > when one sees the host pass a packet to the guest, one expects
> > the packet to get into the networking stack, not get
> > dropped by the driver internally.
> > The guest stack can do with the packet what it sees fit.
> >
> > We actually wake up a thread if we can't fill up the queue,
> > and that thread will fill it up in GFP_KERNEL context.
> >
> > So I think we should find a way to pre-allocate if necessary and avoid
> > error paths where allocating new memory is required to avoid drops.
>
> Really, under ATOMIC context, there is no way you can avoid dropping
> packets if you cannot allocate memory. If you cannot allocate an sk_buff
> (256 bytes !!), you won't be able to allocate the 1500+ bytes to hold the
> payload of the next packets anyway.

That's why we do:

        if (!try_fill_recv(rq, GFP_ATOMIC))
                schedule_delayed_work(&vi->refill, 0);

The queues are large enough for a single failure not to be an
immediate problem.

> Same problem on a real NIC.
>
> Under memory pressure we _do_ packet drops.
> Nobody really complained.
>
> Sure, you can add yet another cache of pre-allocated skbs and pay the
> price of managing yet another cache layer, but you still need to drop
> packets under stress.
We don't even need a cache: just enough to avoid dropping packets when
an allocation fails in the middle of a packet, so that we don't dequeue
a buffer and then drop it. Once we use this reserved skb, we stop
processing the queue until the refill path gives it back.

> Pre-allocating the skb on a real NIC has a performance cost, because we
> clear the sk_buff way ahead of time. By the time the skb is finally
> received, the cpu has to bring those cache lines back into its cache.

Alternatively, we could pre-allocate the memory but avoid clearing it,
maybe?

--
MST
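[Editor's note: for reference, a rough sketch of the failure path being
discussed, combining Jason's put_page() fix with the drain loop Eric
suggests above. This is only an illustration assembled from the snippets
quoted in this thread, not the actual patch; the surrounding
receive_mergeable() code, the page/buf bookkeeping, and the statistics
updates in the real driver may differ.]

        /*
         * Sketch only: nskb allocation failure inside receive_mergeable(),
         * based on the snippets quoted above.  Assumes 'buf' is the buffer
         * just dequeued for the current frag and 'num_buf' counts the
         * buffers still belonging to this packet.
         */
        struct sk_buff *nskb = alloc_skb(0, GFP_ATOMIC);
        if (unlikely(!nskb)) {
                head_skb->dev->stats.rx_dropped++;
                /* Jason's fix: release the page backing the buffer we
                 * already dequeued, so its refcount is not leaked. */
                put_page(virt_to_head_page(buf));
                /* Eric's suggestion: also flush this packet's remaining
                 * buffers from the virtqueue and drop their pages, so
                 * they are not leaked or mistaken for a new packet. */
                while (--num_buf) {
                        buf = virtqueue_get_buf(rq->vq, &len);
                        if (!buf)
                                break;
                        put_page(virt_to_head_page(buf));
                }
                return -ENOMEM;
        }

[Whether the drop can be avoided entirely, for example by keeping one
reserved skb as suggested above, is the open question in this thread.]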