Hi *,

On Mon, Jul 10, 2000 at 06:59:46PM -0700, Guy Harris wrote:

[Linux kernel hiding the original packet size]

> I put in an "fprintf()" to log the value of "packet_len" and "caplen",
> and ran "tcpdump" without any "-s" flag, and with a "-w" flag so that
> the "fprintf()" messages wouldn't get mixed with packet printout
> messages, and it said:
>
>     tcpdump: listening on eth0
>     packet_len 60, caplen 60
>     packet_len 82, caplen 68
>     packet_len 86, caplen 68
[...]
> Hugh was running what I presume was a beta version of RH 6.2 (it had a
> 6.1.something version number), with a 2.2.15 kernel; what kernel are you
> using?  Perhaps something broke between 2.2.14 and 2.2.15 (I'll check
> those two versions when I go home).

I think it's time to lift the cover of this problem :) I read a big chunk
of kernel code and tried to understand the data flow of a packet through
the kernel. Correct me if I am wrong...

Path of a packet through the kernel
-----------------------------------

1. Whenever a packet is sent or received at a network interface, the
   skbuff structure is cloned (in net/core/dev.c) and presented to all
   protocol hooks.

2. As the packet socket code installs such a hook, the packet is
   delivered to packet_rcv (net/packet/af_packet.c), which uses
   sock_queue_rcv_skb to insert it into the receive queue of the packet
   socket.

3. sock_queue_rcv_skb checks if there is enough space in the receive
   buffer of the socket, feeds the packet through any kernel filter
   connected to that socket, makes the socket the owner of the packet
   and adds it at the end of the receive queue.

4. At some time recvmsg is called on the socket and gets the packet from
   the receive queue.

What's going wrong here?
------------------------

The problem which hit libpcap is that we got exactly as many bytes as our
buffer can hold. MSG_TRUNC is never set on return, and this way tcpdump
generates a number of error messages (truncated ip).
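
To make the symptom concrete, here is a small user-space model of what step 4 is expected to do with a packet taken from the receive queue. It is only a sketch of the datagram semantics described above, not the kernel's packet_recvmsg; the name model_recvmsg and its parameters are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Model of step 4 (illustrative, not kernel code): copy at most buflen
 * bytes of a queued packet and report truncation only when the queued
 * packet is longer than the user buffer.
 *
 * queued_len  length of the packet as it sits on the receive queue
 * buflen      size of the buffer handed to recvmsg/recvfrom
 * msg_trunc   set to 1 if MSG_TRUNC would be reported, else 0
 *
 * Returns the number of bytes placed in the user buffer.
 */
static size_t model_recvmsg(size_t queued_len, size_t buflen, int *msg_trunc)
{
    assert(msg_trunc != NULL);

    *msg_trunc = queued_len > buflen;
    return queued_len < buflen ? queued_len : buflen;
}
```

With a 68-byte buffer, an 82-byte packet on the queue yields 68 bytes with MSG_TRUNC set, which is the case libpcap relies on. A packet that is already only 68 bytes long on the queue also yields 68 bytes, but without MSG_TRUNC, and the caller cannot tell the two situations apart from the byte count alone.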

I checked the packet_recvmsg function, the libc6 source and libpcap but
was unable to find the problem. Only after following the packet through
the kernel did I notice that the packet is trimmed when it is enqueued.

How can the kernel know the size of our receive buffer? Simple: it is in
fact related to the kernel filter. The BPF code generated by tcpdump
returns the snapshot size if the packet should be accepted. Even tcpdump
without a filter generates the following BPF code:

    # tcpdump -d
    (000) ret      #68

Nice. So the kernel code trims the packet, and recvfrom has no idea what
the original packet size was.

This is also the reason why it sometimes worked. If you have no kernel
filter configured, the problem just disappears.

How to fix
----------

I think the kernel is at fault here because there is no way to get the
original packet size after the packet went through the filter. But it is
not easy to fix at the kernel level. If we set MSG_TRUNC whenever the
packet was truncated by the filter and always return skb->realsize (the
original packet size), there is a case we can't handle.

Imagine the original packet had 1400 bytes, the filter cuts that down to
300 bytes and we have a buffer of 500 bytes. What's the right return
value for recvfrom in that case? We want to know the original packet size
(1400) and should set MSG_TRUNC. But in that case userspace does not know
that only 300 bytes of the buffer are valid. Of course we can copy 500
bytes anyway since the packet is still there, but is that correct
behaviour?

I plan to change the pcap generated BPF code when installing the filter.
The easy solution is to just change any RET #xx codes to return a
negative value so that the kernel does not touch the packet. It's kind of
a hack though and I do not like hacks ;)

I would appreciate any comments.

Thanks

	Torsten

-- 
Torsten Landschoff           Bluehorn@IRC           <torsten@debian.org>
           Debian Developer and Quality Assurance Committee Member
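
The RET rewrite proposed under "How to fix" could look roughly like the following. This is a sketch, not libpcap code: struct filter_insn merely mirrors the layout of a classic BPF instruction, the opcode constants carry the usual classic BPF values under local names, and no_trim_returns is a hypothetical helper. It also assumes that the kernel treats the filter's return value as an unsigned length, so that "negative" means all bits set and no real packet is ever longer than that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the layout of a classic BPF instruction (illustrative copy). */
struct filter_insn {
    uint16_t code;  /* opcode */
    uint8_t  jt;    /* jump offset if true */
    uint8_t  jf;    /* jump offset if false */
    uint32_t k;     /* generic operand */
};

/* Classic BPF opcode fields, same values as in the BPF headers. */
#define INSN_CLASS(code) ((code) & 0x07)
#define INSN_RVAL(code)  ((code) & 0x18)
#define CLASS_RET        0x06
#define RVAL_K           0x00   /* return the constant k */

/*
 * Hypothetical helper: rewrite every accepting "ret #k" so that it
 * returns the largest possible length instead of the snapshot size.
 * "ret #0" (reject the packet) is left alone, and so is "ret a",
 * which returns the accumulator rather than a constant.
 * Returns the number of instructions changed.
 */
static size_t no_trim_returns(struct filter_insn *prog, size_t len)
{
    size_t i, changed = 0;

    assert(prog != NULL || len == 0);

    for (i = 0; i < len; i++) {
        struct filter_insn *p = &prog[i];

        if (INSN_CLASS(p->code) == CLASS_RET &&
            INSN_RVAL(p->code) == RVAL_K &&
            p->k != 0) {
            p->k = UINT32_MAX;  /* (u32)-1: never shorter than the packet */
            changed++;
        }
    }
    return changed;
}
```

Applied to the one-instruction program shown by "tcpdump -d", this turns "ret #68" into a return of the maximum value. The queued packet should then keep its full length, so the cut to the snapshot size happens only when recvfrom copies into the snapshot-sized buffer, and the original length is visible to user space again.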