> From: Dexuan Cui > Sent: Wednesday, May 12, 2021 3:17 PM > ... > It looks like there is a race condition between iptables-restore calls > getsockopt() to get the number of table entries and iptables call > setsockopt() to replace the entries? Looks like some other program is > concurrently calling getsockopt()/setsockopt() -- but it looks like this is > not the case according to the messages I print via trace_printk() around > do_replace() in do_ipt_set_ctl(): when the -EAGAIN error happens, there is > no other program calling do_replace(); the table entry number was changed > to 5 by another program 'iptables' about 1.3 milliseconds ago, and then > this program 'iptables-restore' calls setsockopt() and the kernel sees > 'num_counters' being 5 and the 'private->number' being 6 (how can this > happen??); the next setsockopt() call for the same 'security' table > happens in about 1 minute with both the numbers being 6. Well, after tracing the code more closely, it turns out that the program 'iptables' indeed concurrently changes the number from 5 to 6, and this causes the 'iptable-restores' program to get EAGAIN: 1. iptables-906 (906 is the process ID) calls IPT_SO_GET_ENTRIES and the current num_entries is 4; the process calls IPT_SO_SET_REPLACE and the private->number becomes 5. 2. iptables-restor-895 calls IPT_SO_GET_ENTRIES, and gets num_entries==5. 3. iptables-906 calls IPT_SO_GET_ENTRIES again, and also gets num_entries==5; the process calls IPT_SO_SET_REPLACE and the private->number becomes 6. 4. iptables-restor-895 calls IPT_SO_SET_REPLACE and the kernel function xt_replace_table() returns -EAGAIN due to num_counters == 5 and private->number == 6. I think the latest mainline kernel should also have the same race. It looks like this by-design race exists since day one? > BTW, iptables does have a retry mechanism for getsockopt(): > 2f93205b375e ("Retry ruleset dump when kernel returns EAGAIN.") > (https://git.netfilter.org/iptables/commit/libiptc?id=2f93205b375e&context=10 > &ignorews=0&dt=0) > > But it looks like this is enough? e.g. here getsockopt() returns 0, but > setsockopt() returns -EAGAIN. It looks like we need to add a retry mechanism to the iptables-restore program: iptables/iptables-restore.c: ip46tables_restore_main(): if cb->ops->commit(handle) -> ... -> setsockopt(...,IPT_SO_SET_REPLACE, ...) fails due to EAGAIN, it should start over from the very begining, i.e. call create_handle() -> handle = cb->ops->init(tablename) again to get the new num_entries, and retry the commit op. But I'm not sure how to easily re-create the context associated with the old handle (should/can it re-parse the rules?), as I'm not really familiar with iptables. Or, is it possible to fix the race condition from the netfilter module? Thanks, -- Dexuan