Hi, I'm debugging an iptables-restore failure, which happens about 5% of the time when I keep stopping and starting the Linux VM. The VM has only 1 CPU, and kernel version is 4.15.0-1098-azure, but I suspect the issue may also exist in the mainline Linux kernel. When the failure happens, it's always caused by line 27 of the rule file: 1 # Generated by iptables-save v1.6.0 on Fri Apr 23 09:22:59 2021 2 *raw 3 :PREROUTING ACCEPT [0:0] 4 :OUTPUT ACCEPT [0:0] 5 -A PREROUTING ! -s 168.63.129.16/32 -p tcp -j NOTRACK 6 -A OUTPUT ! -d 168.63.129.16/32 -p tcp -j NOTRACK 7 COMMIT 8 # Completed on Fri Apr 23 09:22:59 2021 9 # Generated by iptables-save v1.6.0 on Fri Apr 23 09:22:59 2021 10 *filter 11 :INPUT ACCEPT [2407:79190058] 12 :FORWARD ACCEPT [0:0] 13 :OUTPUT ACCEPT [1648:2190051] 14 -A OUTPUT -d 169.254.169.254/32 -m owner --uid-owner 33 -j DROP 15 COMMIT 16 # Completed on Fri Apr 23 09:22:59 2021 17 # Generated by iptables-save v1.6.0 on Fri Apr 23 09:22:59 2021 18 *security 19 :INPUT ACCEPT [2345:79155398] 20 :FORWARD ACCEPT [0:0] 21 :OUTPUT ACCEPT [1504:2129015] 22 -A OUTPUT -d 168.63.129.16/32 -p tcp -m owner --uid-owner 0 -j ACCEPT 23 -A OUTPUT -d 168.63.129.16/32 -p tcp -m conntrack --ctstate INVALID,NEW -j DROP 24 -A OUTPUT -d 168.63.129.16/32 -p tcp -m owner --uid-owner 0 -j ACCEPT 25 -A OUTPUT -d 168.63.129.16/32 -p tcp -m conntrack --ctstate INVALID,NEW -j DROP 26 -A OUTPUT -d 168.63.129.16/32 -p tcp -m conntrack --ctstate INVALID,NEW -j DROP 27 COMMIT The related part of the strace log is: 1 socket(PF_INET, SOCK_RAW, IPPROTO_RAW) = 3 2 getsockopt(3, SOL_IP, IPT_SO_GET_INFO, "security\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., [84]) = 0 3 getsockopt(3, SOL_IP, IPT_SO_GET_ENTRIES, "security\0\357B\16Z\177\0\0Pg\355\0\0\0\0\0Pg\355\0\0\0\0\0"..., [880]) = 0 4 setsockopt(3, SOL_IP, IPT_SO_SET_REPLACE, "security\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2200) = -1 EAGAIN (Resource temporarily unavailable) 5 close(3) = 0 6 write(2, "iptables-restore: line 27 failed"..., 33) = 33 The -EAGAIN error comes from line 1240 of xt_replace_table(): do_ipt_set_ctl do_replace __do_replace xt_replace_table 1216 xt_replace_table(struct xt_table *table, 1217 unsigned int num_counters, 1218 struct xt_table_info *newinfo, 1219 int *error) 1220 { 1221 struct xt_table_info *private; 1222 unsigned int cpu; 1223 int ret; 1224 1225 ret = xt_jumpstack_alloc(newinfo); 1226 if (ret < 0) { 1227 *error = ret; 1228 return NULL; 1229 } 1230 1231 /* Do the substitution. */ 1232 local_bh_disable(); 1233 private = table->private; 1234 1235 /* Check inside lock: is the old number correct? */ 1236 if (num_counters != private->number) { 1237 pr_debug("num_counters != table->private->number (%u/%u)\n", 1238 num_counters, private->number); 1239 local_bh_enable(); 1240 *error = -EAGAIN; 1241 return NULL; 1242 } When the function returns -EAGAIN, the 'num_counters' is 5 while 'private->number' is 6. If I re-run the iptables-restore program upon the failure, the program will succeed. I checked the function xt_replace_table() in the recent mainline kernel and it looks like the function is the same. It looks like there is a race condition between iptables-restore calls getsockopt() to get the number of table entries and iptables call setsockopt() to replace the entries? Looks like some other program is concurrently calling getsockopt()/setsockopt() -- but it looks like this is not the case according to the messages I print via trace_printk() around do_replace() in do_ipt_set_ctl(): when the -EAGAIN error happens, there is no other program calling do_replace(); the table entry number was changed to 5 by another program 'iptables' about 1.3 milliseconds ago, and then this program 'iptables-restore' calls setsockopt() and the kernel sees 'num_counters' being 5 and the 'private->number' being 6 (how can this happen??); the next setsockopt() call for the same 'security' table happens in about 1 minute with both the numbers being 6. Can you please shed some light on the issue? Thanks! BTW, iptables does have a retry mechanism for getsockopt(): 2f93205b375e ("Retry ruleset dump when kernel returns EAGAIN.") (https://git.netfilter.org/iptables/commit/libiptc?id=2f93205b375e&context=10&ignorews=0&dt=0) But it looks like this is enough? e.g. here getsockopt() returns 0, but setsockopt() returns -EAGAIN. Thanks, Dexuan