I wrote:
Hi all,
yesterday my netfilter client suddenly froze. I wasn't even able to kill it.
The following is a strace:
recvmsg(4, 0x7fffffffa770, 0) = ? ERESTARTSYS (To be restarted)
strace displayed just "recvmsg(4," until the signal was caught; the buffer
address and flags were added only at that time. I think strace didn't
write more info because no message had been received.
The relevant line of code is
    while (caught_signal == 0 &&
           (rv = recv(fd, buf, sizeof(buf), 0)) >= 0)
This was wrong. recv is translated to recvfrom in the C library, so the
recvmsg seen by strace is apparently called from the nfnetlink library. I
downloaded some sources (independently of the installed libraries, which
are still from lenny). They show such calls in nfnl_talk
    while (1) {
        status = recvmsg(nfnlh->fd, &msg, 0);
        if (status < 0) {
            if (errno == EINTR)
                continue;
and in nfnl_listen
    while (! quit) {
        remain = recvmsg(nfnlh->fd, &msg, 0);
        if (remain < 0) {
            if (errno == EINTR)
                continue;
(This may explain why I had to use SIGKILL.)
(It seems both these functions are now deprecated. I must have an older
library.)
I'd guess the program froze in the nfnl_listen function, since nfnl_talk is
only used for queue configuration. However, I haven't been able to find
where nfnl_listen is being referenced from. Can anybody shed some light on
how that works?
At any rate, the most plausible diagnosis I can think of is that messages
were not being sent to user space any more. After I killed the daemon, I
just launched it again. It worked. I haven't rebooted the box since, so the
problem seems likely to be in user space. A reasonable question is whether
it is common practice to destroy the queues and the handle and rebuild
everything from scratch every now and then. Would that help?
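One way the daemon could at least notice that messages have stopped
arriving is to wait with a timeout before reading. The following is only a
sketch under my own assumptions: recv_with_timeout is a hypothetical helper,
the timeout value is arbitrary, and whether a timeout should then trigger
rebuilding the queues and the handle is exactly the open question.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: wait up to timeout_ms for data on fd, then read it.
 * Returns the number of bytes received (>= 0), -1 on error,
 * -2 if nothing arrived within the timeout,
 * -3 if poll() was interrupted by a signal. */
static ssize_t recv_with_timeout(int fd, void *buf, size_t len,
                                 int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ready = poll(&pfd, 1, timeout_ms);

    if (ready == 0)
        return -2;                      /* timed out: possible stall */
    if (ready < 0)
        return errno == EINTR ? -3 : -1;
    return recv(fd, buf, len, 0);       /* data (or an error) is pending */
}
```

A caller could count consecutive -2 results and decide what to do after
some threshold. This does not help if the process is blocked in a recvmsg
inside the library itself.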