[cc list trimmed a bit]

On Tue, Jul 11, 2000 at 12:50:32PM +0200, Alexander Demenshin wrote:
> - Traffic generator used on _local_ interface:
>
>  > A lot of fragmented packets:
>
>    ifconfig lo mtu 256
>    ping -f -s 8192 127.0.0.1
>
>  > A lot of TCP traffic (connect/transfer/disconnect);
>    MTU does not matter.
>
> In my tests I used the following rules for iptables:
>
>    iptables -t mangle -A PREROUTING -j QUEUE
>    iptables -t mangle -A OUTPUT -j QUEUE
>
> I assume there are no other rules, but the problem occurs _only_
> when the QUEUE target is in effect - other rules do not matter as long
> as there are no QUEUE targets, or if packets are not accepted in userspace.

The only thing I can see in ipqueue is that it turns off local bottom
halves for a long time during packet receive. That could probably
trigger other races.

> If I use the 'filter' table it also occurs (so there is nothing magical
> about the 'mangle' table).
>
> So, once the rules above are in effect and the userspace module is running,
> a system lockup occurs after running the traffic generator for a certain
> period of time (in my case after processing ca. 300K packets; but it
> depends - be patient :).
>
> No Oops, no other kernel messages, _nothing_ except SysRq is active.
>
> Examining the code at EIP shows that the lockup occurs at:
>
> - In the case of TCP traffic:
>
>   src/net/ipv4/tcp_timer.c:690
>
> --- src/net/ipv4/tcp_timer.c:690 tcp_synack_timer() ---
>         /* Drop this request */
>         write_lock(&tp->syn_wait_lock);   /* <<< AT THIS PLACE */

This one is strange. Any chance of getting a multi-CPU backtrace for this?
(Install kdb from oss.sgi.com:/projects/kdb/, press Pause during a hang,
enter bt, then switch to the other CPUs using the cpu command and
backtrace them too.)

>         *reqp = req->dl_next;
>         write_unlock(&tp->syn_wait_lock);
>
> --- CUT ---
>
> - In the case of ICMP (fragmented) traffic:
>
> --- src/net/ipv4/ip_fragment:202 ip_expire ---
>         spin_lock(&ipfrag_lock);   /* <<< AT THIS PLACE */

The fragment locking is known to be buggy.
It should be fixed in 2.4.0pre3. There was also a NAT bug where ip_defrag
was called without bottom halves turned off, which could cause deadlocks
too, but that should already be fixed (all ip_defrag calls in netfilter/*
should be guarded by local_bh_disable()/local_bh_enable()).

-Andi

-
: send the line "unsubscribe linux-net" in the body of a message to majordomo@vger.rutgers.edu
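The self-deadlock behind the ip_defrag/local_bh_disable point can be sketched outside the kernel. What follows is a hypothetical single-CPU user-space model, not kernel code: `struct cpu_model`, `model_spin_lock()`, `model_local_bh_disable()` and the other helpers are illustrative names invented here. It assumes a bottom half (for example a fragment-expiry timer in the style of ip_expire) runs immediately on the same CPU unless BHs are disabled, and that it takes the same lock the interrupted caller already holds. Because a real spin_lock() would spin forever in that case, the model records a deadlock instead.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-CPU model; none of these names are kernel APIs. */
struct cpu_model {
    int  bh_disable_count; /* nesting depth of model_local_bh_disable() */
    bool bh_pending;       /* a BH (e.g. a fragment-expiry timer) is due */
    bool frag_lock_held;   /* stand-in for a lock like ipfrag_lock */
    bool deadlocked;       /* set instead of spinning forever */
    int  bh_runs;          /* number of BH runs that completed */
};

/* On one CPU, a lock that is already held can never be released while
 * we spin on it, so report the deadlock rather than loop. */
static bool model_spin_lock(struct cpu_model *c) {
    if (c->frag_lock_held) {
        c->deadlocked = true;
        return false;
    }
    c->frag_lock_held = true;
    return true;
}

static void model_spin_unlock(struct cpu_model *c) {
    c->frag_lock_held = false;
}

/* The BH handler takes the same lock as the process-context path. */
static void model_run_bh(struct cpu_model *c) {
    c->bh_pending = false;
    if (model_spin_lock(c)) {
        /* ... expire old fragment queues here ... */
        model_spin_unlock(c);
        c->bh_runs++;
    }
}

/* An interrupt marks the BH pending; it runs at once unless BHs are off. */
static void model_interrupt(struct cpu_model *c) {
    c->bh_pending = true;
    if (c->bh_disable_count == 0)
        model_run_bh(c);
}

static void model_local_bh_disable(struct cpu_model *c) {
    c->bh_disable_count++;
}

/* Re-enabling BHs runs any BH that was deferred in the meantime. */
static void model_local_bh_enable(struct cpu_model *c) {
    assert(c->bh_disable_count > 0);
    if (--c->bh_disable_count == 0 && c->bh_pending)
        model_run_bh(c);
}

/* A caller that enters the defragmentation code and takes its lock;
 * the timer interrupt fires while the lock is held. */
static void model_defrag_path(struct cpu_model *c, bool guard_with_bh_disable) {
    if (guard_with_bh_disable)
        model_local_bh_disable(c);
    if (model_spin_lock(c)) {
        model_interrupt(c);
        model_spin_unlock(c);
    }
    if (guard_with_bh_disable)
        model_local_bh_enable(c);
}
```

Calling `model_defrag_path(&c, false)` on a zeroed `struct cpu_model` ends with `deadlocked` set and no completed BH run, while `model_defrag_path(&c, true)` defers the BH until `model_local_bh_enable()` and then runs it once. On SMP a BH on another CPU would merely wait for the lock; the self-deadlock needs the lock holder and the BH on the same CPU, which is what disabling local BHs around the call prevents.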