Hi, it looks like reading from a socket opened with PF_PACKET, SOCK_DGRAM loses data if the system is non-trivially loaded. I have modified the net-accounting daemon net-acct (http://exorsus.net/projects/net-acct/) to use that kind of socket to be able to access the packet type (I am only interested in outgoing packets). The relevant part of my code is included at the end of this article. Until I goofed badly, I think that I can be pretty sure that every packet given to me by recvfrom either gets dropped (and logged or at least counted), or handed down to the handle_ip function, which looks like that everything handed down there eventually ends up in the log file. While testing in a lab setup, this works fine, and the data accounted for is accurate. However, when I put the program on a loaded system, it seems to lose data. Here is a description of my tests: ------------------- | Test machine Q | ------------------- | | ------------------ | Linux router L | ------------------ | | ------------------ | Cisco router C | ------------------ | | ------------------- | Testr machine S | ------------------- (1) For testing, I send exactly 1 MB of data over a tcp connection from Q to S. (2) L runs the modified netacctd, logging packets on both interfaces, and a tcpdump "src host A and tcp port 10000". The tcpdump loggs exactly 728 packets. (3) C is in the setup for reference and has "ip accounting output-packets" set on both interfaces. (4) net-acct running on L logs significantly less data volume between Q and S as the tcpdump and the IP accounting on C does. The amount of data loss seems to be random, starting from just a few packets of the 728, but sometimes amounting to 400 Packets being lost. Even data accounted on both interfaces of L differs. I suspect that data is not polled quickly enough so that a buffer overflows, causing packets not to be seen by my userspace code. Since I am pretty confident that my code either correctly dumps the packet into the accounting log or drops it (generating debugging output in the process), I suspect that the data is being lost on the socket. The system does not seem to be overloaded, it is moving about 5000 packets a second on 4 interfaces; CPU statistics show 10 % user, 20 % system and 70 % idle. Under these circumstances, I think that it should be possible to poll the socket often enough to prevent data from being lost. When I remove the "base load" from the system, leaving only my test traffic (1 MB from Q to S) on the network, all packets are accounted for, making the chance for a programming error in the netacct smaller. This must be some weird kind of buffering problem, I think. Is there a possibility to ask the kernel if data has been lost on the DGRAM socket so that I am able to raise a flag to the operators that they'd better look after the system? Or is my diagnosis wrong and I am losing data somewhere else? As an alternative, I am currently considering using a plugin to ulogd, which will then in turn receive its data from a netfilter chain. Does anybody have experience with this? Thanks for your comments, I really appreciate it. Greetings Marc This is the relevant part of my code (prepare for userspace code ahead ;) ), debugging code and some comments removed: void init_capture() { struct ifreq ifr; struct promisc_device *p; struct protoent *pr; if ((capture_sd = socket (PF_PACKET, SOCK_DGRAM, htons (ETH_P_ALL))) < 0) { syslog(LOG_ERR, "can't get socket: %m\n"); daemon_stop(0); } } void packet_loop() { struct sockaddr_ll saddr_ll; int sizeaddr_ll; unsigned char buff[1600]; unsigned char *buf; int length; static struct iphdr *tmp_iphdr; int dynamicstyle; int do_user; __u32 dynamicaddr, otheraddr; char *user; struct promisc_device *p; int found = 0; struct mon_host_struct *ptr; /* For getting the ifname from ifindex --hillu */ struct ifreq ifr; dynamicstyle = (dev2line != NULL) ? 1 : ((cfg->dynamicip != NULL) ? 2 : 0); buf = &buff[20]; while (running) { sizeaddr_ll = sizeof(struct sockaddr_ll); length = recvfrom (capture_sd, buf, 127, 0, (struct sockaddr *) &saddr_ll, &sizeaddr_ll); if (length == -1) { continue; } /*We capture ETH_P_ALL, only process IP packages. --hillu */ if (ntohs(saddr_ll.sll_protocol) != ETH_P_IP) continue; /* Get devicename from interface index --hillu */ ifr.ifr_ifindex = saddr_ll.sll_ifindex; do_user = 0; /* determine various reasons to ignore the packet, continue if ignore */ /* code removed. debug information is written for each ignored packet */ /* so I know that nothing is ignored here in the test cases if((tmp_iphdr->saddr & cfg->ignoremask) == (tmp_iphdr->daddr & cfg->ignoremask)) { packets->local++; continue; } else { /* snip code for ignoring entire networks. Again, packets ignored */ /* here produce debug output, so no chance of packets */ /* slipping here */ packets->ip++; user = NULL; /* snip code handling dynamic IPs, no possibility to exit here */ handle_ip(buf, ifr.ifr_name, user, saddr_ll.sll_pkttype); } } } The buffer handling (buf and buff, and the 20 index) is broken, but seems to be correct nevertheless. Didn't dare to change it yet. -- ----------------------------------------------------------------------------- Marc Haber | "I don't trust Computers. They | Mailadresse im Header Karlsruhe, Germany | lose things." Winona Ryder | Fon: *49 721 966 32 15 Nordisch by Nature | How to make an American Quilt | Fax: *49 721 966 31 29 - : send the line "unsubscribe linux-net" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html