OK, more of an update.

We've managed to create a scenario that exhibits the problem much earlier. We can now cause the lockup to occur within a few hours (rather than the 12 to 24 hours in our other scenario). Our setup is to have a lot of traffic constantly being processed by the netfilter code. After about 2 hours, any external attempt to read the table entries (such as with getsockopt and IPT_SO_GET_ENTRIES) triggers the lockup.

What is strange is that this does not appear until after a couple of hours of heavy traffic. We cannot trigger the problem in the first hour, can trigger it only rarely in the second hour, and can always trigger it after the second hour. Our original setup did not have as much traffic, but based on a quick back-of-the-napkin computation, the lockup seems to occur after a certain amount of traffic. I can try to get firmer numbers.

This kind of behavior hints less at a race condition between two writers and more at something that depends on the amount of traffic. Indeed, my test program only uses IPT_SO_GET_ENTRIES, which does not trigger the second path to do_add_counters. So I no longer think the path through setsockopt is a cause of the problem.

Instead, it seems the only way there could be multiple writers (assuming that is the problem) is if there are multiple contexts through which ipt_do_table() is called. So far, my perusal of the code indicates it is called only through the hooks in each of the iptables modules, and it isn't clear to me how these are called.

But it does seem that even with the patch Eric provided (which fixes the seqcount update), there is still a potential problem. If we do have multiple contexts executing ipt_do_table(), it is possible for more than just the seqcount to be corrupted. Any updates to the internal structures could cause problems. It isn't clear to me whether anything is modified here other than the counters, so I'm not sure if there are any other issues.
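To make the multiple-writer concern concrete, here is a minimal user-space sketch of how two writer contexts could leave a sequence counter stuck at an odd value. It is only an illustration under assumptions: it assumes the write-side increment is a plain, non-atomic load followed by a store, and that one writer can be preempted between the two. The names (sim_seqcount, sim_inc and so on) are hypothetical and are not kernel API; the interleaving is replayed by hand in a single thread rather than with real preemption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a sequence counter; NOT the kernel's seqcount_t. */
struct sim_seqcount {
    unsigned int sequence;
};

/* The increment is split into a load and a store so the simulation can
 * place a preemption point between the two halves. */
static unsigned int sim_load(const struct sim_seqcount *s)
{
    return s->sequence;
}

static void sim_store(struct sim_seqcount *s, unsigned int v)
{
    s->sequence = v;
}

/* Uninterrupted increment, used for both "write begin" and "write end". */
static void sim_inc(struct sim_seqcount *s)
{
    sim_store(s, sim_load(s) + 1);
}

/* A reader may only proceed while no write is in progress (even value). */
static bool sim_reader_can_proceed(const struct sim_seqcount *s)
{
    return (s->sequence & 1u) == 0;
}

/* Writers A and B run strictly one after the other. */
static unsigned int sim_serialized_writers(void)
{
    struct sim_seqcount s = { 0 };

    sim_inc(&s);    /* A begin: 0 -> 1 */
    sim_inc(&s);    /* A end:   1 -> 2 */
    sim_inc(&s);    /* B begin: 2 -> 3 */
    sim_inc(&s);    /* B end:   3 -> 4 */
    return s.sequence;
}

/* Writer B is preempted between the load and the store of its "begin". */
static unsigned int sim_preempted_writers(void)
{
    struct sim_seqcount s = { 0 };
    unsigned int b_stale;

    sim_inc(&s);                 /* A begin: 0 -> 1 */
    b_stale = sim_load(&s);      /* B begin: loads 1, then is preempted */
    assert(b_stale == 1);
    sim_inc(&s);                 /* A end:   1 -> 2 */
    sim_store(&s, b_stale + 1);  /* B resumes: stores 2, one increment is lost */
    sim_inc(&s);                 /* B end:   2 -> 3, counter is now odd */
    return s.sequence;
}
```

In the serialized case the counter ends even, so a reader can proceed. In the preempted case one increment is lost and the counter ends odd with no writer left to finish the "write", so a reader that waits for an even value would wait forever. Whether this is the interleaving actually hit here is not established; it only shows why unserialized writers are dangerous for this kind of counter.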
But regardless, if the counters could become corrupted, then it is possible to break any rules that use them.

Anyway, based on the earlier discussion, is there any reason not to use a lock (presuming any solution properly takes into account possible recursion)? I understand that mainline is protected, but perhaps in the RT version we can use a seqlock (and prevent a recursive lock)?

Thanks,
Pete LaDow