[tracked] libpcap problem: recvfrom does not work as expected

Hi *, 

On Mon, Jul 10, 2000 at 06:59:46PM -0700, Guy Harris wrote:
 
[Linux kernel hiding the original packet size]
> I put in an "fprintf()" to log the value of "packet_len" and "caplen",
> and ran "tcpdump" without any "-s" flag, and with a "-w" flag so that
> the "fprintf()" messages wouldn't get mixed with packet printout
> messages, and it said:
> 
> 	tcpdump: listening on eth0
> 	packet_len 60, caplen 60
> 	packet_len 82, caplen 68
> 	packet_len 86, caplen 68

[...] 

> Hugh was running what I presume was a beta version of RH 6.2 (it had a
> 6.1.something version number), with a 2.2.15 kernel; what kernel are you
> using?  Perhaps something broke between 2.2.14 and 2.2.15 (I'll check
> those two versions when I go home).

I think it's time to lift the cover of this problem :)

I read a big chunk of kernel code and tried to understand the data flow of 
a packet through the kernel code. Correct me if I am wrong...

Path of a packet through the kernel
-----------------------------------

  1. Whenever a packet is sent or received at a network interface the 
     skbuff structure is cloned (in net/core/dev.c) and presented to 
     all protocol hooks.
  2. As the packet socket code installs such a hook the packet is 
     delivered to packet_rcv (net/packet/af_packet.c) which uses 
     sock_queue_rcv_skb to insert it into the receive queue of the 
     packet socket. 
  3. sock_queue_rcv_skb checks if there is enough space in the receive
     buffer of the socket, feeds the packet through any kernel filter 
     connected to that socket, makes the socket the owner of the packet
     and adds it at the end of the receive queue.
  4. At some time recvmsg is called on the socket and gets the packet 
     from the receive queue.

What's going wrong here?
------------------------

  The problem which hit libpcap is that we got exactly as many 
  bytes as our buffer can hold. MSG_TRUNC is never set on return, and
  as a result tcpdump generates a number of error messages
  (truncated ip). I checked the packet_recvmsg function, the libc6 source
  and libpcap but was unable to find the problem. 
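
  For reference, this is roughly how userspace would normally notice a
  truncated datagram. It is only a sketch: the helper names
  recv_check_trunc and demo_truncation are made up for illustration, and
  the demo uses a UNIX datagram socketpair as a stand-in for the packet
  socket so that it runs without root. It is not the code libpcap uses.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Receive one datagram into buf and report whether the kernel flagged it
 * as truncated (MSG_TRUNC in msg_flags). Returns the number of bytes
 * copied into buf, or -1 on error.
 */
static ssize_t recv_check_trunc(int fd, void *buf, size_t buflen, int *truncated)
{
    struct iovec iov;
    struct msghdr msg;
    ssize_t n;

    iov.iov_base = buf;
    iov.iov_len = buflen;
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    n = recvmsg(fd, &msg, 0);
    if (n < 0)
        return -1;
    *truncated = (msg.msg_flags & MSG_TRUNC) != 0;
    return n;
}

/*
 * Hypothetical demo: send a 100-byte datagram over a UNIX socketpair and
 * read it back with a 60-byte buffer.
 * Returns 1 if MSG_TRUNC was reported, 0 if not, -1 on error.
 */
static int demo_truncation(void)
{
    int sv[2], truncated = 0;
    char out[100], in[60];
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;
    memset(out, 'x', sizeof(out));
    if (send(sv[0], out, sizeof(out), 0) != (ssize_t)sizeof(out)) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    n = recv_check_trunc(sv[1], in, sizeof(in), &truncated);
    close(sv[0]);
    close(sv[1]);
    if (n != (ssize_t)sizeof(in))
        return -1;
    return truncated;
}
```

  In this demo the datagram is still at full length in the receive queue
  when it is read, so the kernel can see that the buffer is too small.
  That is the behaviour libpcap was relying on.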

  Only after following the packet through the kernel I noticed that 
  the packet is trimmed when it is enqueued. How can the kernel know the
  size of our receive buffer? Simple: It is in fact related to the kernel
  filter. The BPF code generated by tcpdump returns the snapshot size
  if the packet should be accepted - even tcpdump without a filter generates
  the following BPF code: 

    # tcpdump -d
    (000) ret      #68

  Nice. So the kernel code trims the packet and recvfrom has no idea what 
  the original packet size was. This is also the reason why it sometimes
  worked: if you have no kernel filter configured, the problem just disappears.
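
  The trimming can be modelled in a few lines. This is a toy model of
  the enqueue step, not the kernel source; the function name queued_len
  is made up. It assumes that a filter result of 0 means "drop" and that
  any other result is used as an upper bound on the packet length.

```c
/*
 * Model of what the filter does to a packet before it is queued:
 * a filter result of 0 drops the packet, otherwise the packet is
 * trimmed to min(packet_len, filter_result).
 * Returns the length that recvfrom can see afterwards.
 */
static unsigned int queued_len(unsigned int packet_len, unsigned int filter_result)
{
    if (filter_result == 0)
        return 0;   /* rejected by the filter */
    return packet_len < filter_result ? packet_len : filter_result;
}
```

  With a filter result of 68 this gives 60 for a 60 byte packet and 68
  for packets of 82 and 86 bytes, which matches the caplen values in
  Guy's log above.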

How to fix
----------

  I think the kernel is at fault here because there is no way to get 
  the original packet size after the packet went through the filter. 
  But it is not easy to fix at the kernel level. If we set MSG_TRUNC 
  whenever the packet was truncated by the filter and always return 
  skb->realsize (the original packet size) there is a case we can't
  handle. Imagine the original packet had 1400 bytes, the filter 
  cuts that down to 300 bytes and we have a buffer of 500 bytes. 
  What's the right return code for recvfrom in that case?

  We want to know the original packet size (1400) and should set MSG_TRUNC. 
  But in that case the userspace does not know that only 300 bytes of the
  buffer are valid. Of course we can copy 500 bytes anyway since the 
  packet is still there but is that correct behaviour?
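
  To make the ambiguity concrete, here is the arithmetic for that case
  as a small sketch. The struct and the function proposed_recv are
  hypothetical; they only model the "return the original size and set
  MSG_TRUNC" idea described above, not any existing kernel behaviour.

```c
struct recv_result {
    unsigned int copied;    /* bytes that are actually valid in the buffer */
    unsigned int reported;  /* hypothetical return value: original size */
    int msg_trunc;          /* hypothetical MSG_TRUNC flag */
};

/*
 * Model of the proposed semantics: the filter trims the packet to
 * filter_len, recvfrom copies what fits into buf_len, but reports the
 * original length and sets MSG_TRUNC if anything was lost.
 */
static struct recv_result proposed_recv(unsigned int orig_len,
                                        unsigned int filter_len,
                                        unsigned int buf_len)
{
    struct recv_result r;
    unsigned int queued = orig_len < filter_len ? orig_len : filter_len;

    r.copied = queued < buf_len ? queued : buf_len;
    r.reported = orig_len;
    r.msg_trunc = r.copied < orig_len;
    return r;
}
```

  For proposed_recv(1400, 300, 500) the caller sees a return value of
  1400 with MSG_TRUNC set and would naturally assume all 500 bytes of
  its buffer are filled, while only 300 of them are valid.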

  I plan to change the pcap generated BPF code when installing the filter. 
  The easy solution is to just change any RET #xx codes to return a 
  negative value so that the kernel does not touch the packet. It's kind
  of a hack though and I do not like hacks ;)
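
  A minimal sketch of that rewrite, to show what I have in mind. The
  struct and the INSN_* macros are local copies of the classic BPF
  instruction layout and opcode bits so that the example stands alone;
  real code would use the definitions from the pcap or kernel headers.
  The function name patch_ret_insns is made up. It assumes the kernel
  treats the filter result as an unsigned length, so -1 becomes
  0xffffffff and is never smaller than the packet.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the classic BPF instruction layout (assumption: real
 * code would take this from the pcap or kernel headers). */
struct insn {
    uint16_t code;
    uint8_t  jt;
    uint8_t  jf;
    uint32_t k;
};

#define INSN_CLASS(code) ((code) & 0x07)
#define INSN_RET         0x06   /* instruction class: return */
#define INSN_RVAL(code)  ((code) & 0x18)
#define INSN_K           0x00   /* return value is the constant k */

/*
 * Rewrite every accepting "ret #k" (k != 0) to return 0xffffffff, i.e.
 * -1 as a 32-bit value. "ret #0" (reject) is left alone.
 * Returns the number of instructions that were patched.
 */
static size_t patch_ret_insns(struct insn *prog, size_t len)
{
    size_t i, patched = 0;

    for (i = 0; i < len; i++) {
        if (INSN_CLASS(prog[i].code) == INSN_RET &&
            INSN_RVAL(prog[i].code) == INSN_K &&
            prog[i].k != 0) {
            prog[i].k = 0xffffffffu;
            patched++;
        }
    }
    return patched;
}
```

  After such a rewrite the snapshot length would have to be applied in
  userspace, since the kernel no longer cuts the packet down for us.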

I would appreciate any comments. 

Thanks

    Torsten

-- 
Torsten Landschoff           Bluehorn@IRC               <torsten@debian.org>
           Debian Developer and Quality Assurance Committee Member


