Re: [PATCH] nwfilter: increase pcap buffer size to be compatible with TPACKET_V3

Christian Ehrhardt <christian.ehrhardt@xxxxxxxxxxxxx> · Thu, 26 Apr 2018 18:20:01 +0200

On Thu, Apr 26, 2018 at 5:55 PM, Laine Stump <laine@xxxxxxxxx> wrote:

  
    
  
  
    On 04/26/2018 04:38 AM, Daniel P.
      Berrangé wrote:

    
    
      On Thu, Apr 26, 2018 at 08:09:47AM +0200, Christian Ehrhardt wrote:

      
        On Wed, Apr 25, 2018 at 11:25 PM, Laine Stump <laine@xxxxxxxxx> wrote:


        
          When an nwfilter rule sets the parameter CTRL_IP_LEARNING to "dhcp",
this turns on the "dhcpsnoop" thread, which uses libpcap to monitor
traffic on the domain's tap device and extract the IP address from the
DHCP response.

If libpcap on the host is built with TPACKET_V3 defined, the dhcpsnoop
code's initialization of the libpcap socket fails with the following
error:

  virNWFilterSnoopDHCPOpen:1134 : internal error: pcap_setfilter: can't
remove kernel filter: Bad file descriptor

It turns out that this was because libpcap with TPACKET_V3 defined
requires a larger buffer size than libvirt was setting (we were
setting it to 128k). Changing the buffer size to 256k eliminates the
error, and the dhcpsnoop thread once again works properly.

Thanks to Christian Ehrhardt <paelzer@xxxxxxxxx> for discovering that
buffer size was the problem.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1547237
Signed-off-by: Laine Stump <laine@xxxxxxxxx>
---
 src/nwfilter/nwfilter_dhcpsnoop.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/nwfilter/nwfilter_dhcpsnoop.c b/src/nwfilter/nwfilter_
dhcpsnoop.c
index 6069e70460..62eb617515 100644
--- a/src/nwfilter/nwfilter_dhcpsnoop.c
+++ b/src/nwfilter/nwfilter_dhcpsnoop.c
@@ -259,7 +259,7 @@ struct _virNWFilterDHCPDecodeJob {
  * libpcap 1.5 requires a 128kb buffer
  * 128 kb is bigger than (DHCP_PKT_BURST * PCAP_PBUFSIZE / 2)
  */


        
        I just started building with the change for a few tests on this - no
results yet.

But we are all puzzled/unsure enough on the size that I'd already ask to
modify the comment above to explain the new size.

Maybe we should explain:
- why 128 isn't enough
- why you chose "only" 256
- why the default size might be too big
- your size considerations for many guest scenarios
      
    
    

    Okay, so I looked into this in more detail. I don't know that I'm
    totally at the bottom of it, but this is what I found in libpcap:

    

    * The function create_ring() calls setsockopt(fd, SOL_PACKET,
    PACKET_RX_RING, &req ...) to setup a ring buffer for receiving
    packets.

    

    * one of the attributes of req is tp_frame_size, and another is
    tp_frame_nr. tp_frame_nr is set to the user-supplied (or default)
    buffer size (for us it was 128k) / tp_frame_nr. 

    

    * if TPACKETV3 isn't used, tp_frame_size is set to (roughly, there
    is some aligning done, and a bit additional added for one ethernet
    header) the user-supplied snaplen (in our case 576). So tp_frame_nr
    is going to be something > 200.

    

    * if TPACKET3 *is* used, tp_frame_size is set to a predefined
    constant MAXIMUM_SNAPLEN, which is 262144. This means that
    tp_frame_nr will be 131072/262144, which is 0.

    

    So the result will be that setsockopt() is called requesting a ring
    buffer that can read 0 frames. I didn't trace through the call to
    see exactly what happens, but it obviously can't be good. (Right
    after that, tp_frame_nr is also used in a call to malloc a buffer to
    hold pointers to frame headers, and that of course would also fail
    if we ever got that far. I'm assuming we don't.

    

    Why are the frames in the ring buffer so large for TPACKETV3? The
    code says this:

    

       /* The "frames" for this are actually buffers that

        * contain multiple variable-sized frames.

        *

        * We pick a "frame" size of 128K to leave enough

        * room for at least one reasonably-sized packet

        * in the "frame".

        */

    

    This all explains why changing the buffer size to 256k works for us
    - that gets tp_frame_nr to 1 (just barely!) so there is no
    multiplying by 0. And because multiple packets are received into
    this single "frame", we still operate correctly.

    

    However, this all seems very arbitrary - there's an API for setting
    the size of a buffer, but it's not defined anywhere in the API what
    are the units of measure (that's not an exactly accurate
    description, but I'm too tired to think of the right way to say it).
    pcap-int.h itself even admits this in the comment above its
    definition of MAXIMUM_SNAPLEN:

    

       /*

        * Maximum snapshot length.

        *

        * Somewhat arbitrary, but chosen to be:

        *

        * [blah blah blah details details...]

        */

        #define MAXIMUM_SNAPLEN       262144

    

    

    Also, note that the comment where MAXIMUM_SNAPLEN is used in
    pcap-linux.c says that it is 128k, but the actual definition sets it
    at 256k. :-/

    

    This gives me 0 confidence that any value we choose would continue
    to be viable in the future.

    

    
      
        That will help the next one stumbling over this code.

      
      FWIW, having just checked libpcap git history, the current 2 MB default
size has been there since 2008 !   I'm guessing we don't use that as we
don't want to consume lots of RAM when many guests are running.


    
    

    Yeah, I'm guessing this is why Stefan chose to set the buffer size.

    

    In the case of static IP (the learnIPAddress codepath) we open the
    pcap socket, get a single packet that has an IP address, and then
    close the pcap socket, so we don't normally have more than a few
    around at any time, and the size of the buffer is a non-issue so
    it's left at the default.

    

    For the case of dhcp snooping, though, the handle remains open for
    the entire lifetime of the domain (I guess in case the address is
    released and re-requested and a different address is handed out the
    next time). So if we're using the default 2MB buffer, there would be
    2MB for each client. On one hand, for a system with 100 clients,
    that's 200MB of memory. On the other hand, any system that's
    miniscule in relation to the total amount of memory (it might reduce
    the capacity of a very large host by one guest at most).

    

    (NB: I checked this out by putting VIR_WARNs in the source at
    strategic places, and found that we are actually keeping open *TWO*
    pcap sockets for each domain that has CTRL_IP_LEARNING=dhcp. We
    should probably look into that...)

    

    So if everyone is okay with that extra memory usage, I'd suggest
    Christian could send his original patch (deleting the
    pcap_set_buffer_size()) to the list (but including the comments from
    my patch about the exact error message, Resolves:
    blah.bugzilla.blah, etc).

I can do so if "everyone is okay with that extra memory usage" applies.
So waiting for opinions on that for now.
 
(Or if you don't have time, I can send it
    and credit you in the commit log).

    

    > I
      recently see more and more Resolves: instead of "Fixes:" did we
      change the recommended format for some tools and I missed it?

      

      Who is "we"? :-) The overwhelming majority of these tags in
      libvirt commit logs have always been Resolves:

      

    laine@vhost2 ~/devel/libvirt (master*)>git log | grep
    "Resolves: h" | wc

        508    1017   33340

    laine@vhost2 ~/devel/libvirt (master*)>git log | grep "Fixes: h"
    | wc

          9      18     654
    

    

    I do notice that the few Fixes: are for bugs on launchpad (with one
    exception), and the Resolves: are all bugzilla.redhat.com and a few
    opensuse.org/suse.com

    
  




-- 
Christian EhrhardtSoftware Engineer, Ubuntu Server
Canonical Ltd

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list