Symptom: After taking an interface down with ifconfig (tested with lo and eth0), any further attempt to take an interface down with ifconfig hangs, and this affects all interfaces. Context switches jump noticeably and ifconfig sits waiting on an ioctl(). This occurs with kernel 2.4.23-pre3; kernel 2.4.23-pre2 does not exhibit this behavior. I've seen one or two other people report this with pre3 on lkml, but there was not much information there to go on, so I thought I'd try the net list.

> uname -a
Linux wolf 2.4.23-pre3 #1 Thu Sep 4 16:54:12 PDT 2003 i686 unknown

> ifconfig --version
net-tools 1.60
ifconfig 1.42 (2001-04-13)

Reproduce:

ifconfig lo up
ifconfig lo down
ifconfig lo up
ifconfig lo down
(ifconfig is now hung for all interfaces)
(this occurs for eth0 as well)

-------------

When this occurs, ifconfig sits here:

execve("/sbin/ifconfig", ["ifconfig", "lo", "down"], [/* 33 vars */]) = 0
brk(0) = 0x8055914
open("/etc/ld.so.preload", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=52547, ...}) = 0
old_mmap(NULL, 52547, PROT_READ, MAP_PRIVATE, 3, 0) = 0x40015000
close(3) = 0
open("/lib/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0h\222\1"..., 1024) = 1024
fstat64(3, {st_mode=S_IFREG|0755, st_size=5029105, ...}) = 0
old_mmap(NULL, 1191168, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x40022000
mprotect(0x4013b000, 40192, PROT_NONE) = 0
old_mmap(0x4013b000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x119000) = 0x4013b000
old_mmap(0x40141000, 15616, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x40141000
close(3) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40145000
munmap(0x40015000, 52547) = 0
brk(0) = 0x8055914
brk(0x805593c) = 0x805593c
brk(0x8056000) = 0x8056000
uname({sys="Linux", node="wolf", ...}) = 0
access("/proc/net", R_OK) = 0
access("/proc/net/unix", R_OK) = 0
socket(PF_UNIX, SOCK_DGRAM, 0) = 3
socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 4
access("/proc/net/if_inet6", R_OK) = -1 ENOENT (No such file or directory)
access("/proc/net/ax25", R_OK) = -1 ENOENT (No such file or directory)
access("/proc/net/nr", R_OK) = -1 ENOENT (No such file or directory)
access("/proc/net/ipx", R_OK) = -1 ENOENT (No such file or directory)
access("/proc/net/appletalk", R_OK) = -1 ENOENT (No such file or directory)
access("/proc/net/x25", R_OK) = -1 ENOENT (No such file or directory)
ioctl(4, 0x8913, 0xbffff76c) = 0
ioctl(4, 0x8914 <unfinished ...>

(0x8913 is SIOCGIFFLAGS and 0x8914 is SIOCSIFFLAGS: ifconfig reads the interface flags successfully, then blocks in the call that writes them back with IFF_UP cleared.)

-------------

When lo is working:

> cat /proc/net/sockstat
sockets: used 2
TCP: inuse 0 orphan 0 tw 0 alloc 0 mem 0
UDP: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0

When lo is hung:

> cat /proc/net/sockstat
sockets: used 4
TCP: inuse 0 orphan 0 tw 0 alloc 0 mem 0
UDP: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0

-------------

Other notes:

When ifconfig hangs, context switches on this box (in single-user mode) go from single digits to a steady 200+.

Speculation (I'm not very familiar with the network code, so this may be a completely inaccurate assessment):

As the problem seems to have been introduced in 2.4.23-pre3, I looked at the patch to see if anything jumped out. The only thing that stands out to me is that in net/core/dev.c, a while (test_bit(...)) loop has been replaced with a call to a new common helper, netif_poll_disable(), which instead loops on while (test_and_set_bit(...)).

[from patch-2.4.22-pre2-pre3.tar.bz2]

+++ linux-2.4.23-pre3/include/linux/netdevice.h 2003-09-03 15:18:12.000000000 -0
@@ -802,6 +802,38 @@
 	local_irq_restore(flags);
 }
+static inline void netif_poll_disable(struct net_device *dev)
+{
+	while (test_and_set_bit(__LINK_STATE_RX_SCHED, &dev->state)) {
+		/* No hurry. */
+		current->state = TASK_INTERRUPTIBLE;
+		schedule_timeout(1);
+	}
+}

+++ linux-2.4.23-pre3/net/core/dev.c 2003-09-03 15:18:13.000000000 -0700
@@ -851,11 +851,7 @@
 	 * engine, but this requires more changes in devices.
 	 */
 	smp_mb__after_clear_bit(); /* Commit netif_running(). */
-	while (test_bit(__LINK_STATE_RX_SCHED, &dev->state)) {
-		/* No hurry. */
-		current->state = TASK_INTERRUPTIBLE;
-		schedule_timeout(1);
-	}
+	netif_poll_disable(dev);

Putting the original while loop back into net/core/dev.c seems to clear up the problem, but that doesn't seem like a proper fix, so I'm posting what I found while trying to track this down in the hope that those in the know might have a look.

-- 
Michael G. Janicki <mjanicki@chartconnect.com>
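A small userspace model of the pre2 and pre3 wait loops may make the suspected difference easier to see. It is only a sketch under stated assumptions: the helpers below are hypothetical, non-atomic stand-ins for the kernel's test_bit() and test_and_set_bit(), bit 0 of a plain unsigned long stands in for __LINK_STATE_RX_SCHED in dev->state, and each loop iteration stands in for one schedule_timeout(1) call, capped by max_ticks so the model cannot hang. It also assumes that nothing clears the bit between one "ifconfig down" and the next; the patch excerpts in this message do not show that either way.

```c
#include <stdbool.h>

/* Hypothetical model: bit 0 of a plain unsigned long stands in for
 * __LINK_STATE_RX_SCHED in dev->state. These helpers are NOT atomic;
 * they only mimic the return values of the kernel bitops. */
#define RX_SCHED_BIT 0UL

/* Like test_bit(): report the bit, never modify it. */
static bool model_test_bit(unsigned long nr, const unsigned long *addr)
{
    return (*addr >> nr) & 1UL;
}

/* Like test_and_set_bit(): report the old value, then set the bit. */
static bool model_test_and_set_bit(unsigned long nr, unsigned long *addr)
{
    bool old = (*addr >> nr) & 1UL;
    *addr |= 1UL << nr;
    return old;
}

/* 2.4.23-pre2 style: wait while the bit is set. Each iteration stands
 * in for one schedule_timeout(1); max_ticks caps the model. Returns the
 * number of iterations taken. The bit is left as it was found. */
static int wait_pre2_style(unsigned long *state, int max_ticks)
{
    int ticks = 0;
    while (model_test_bit(RX_SCHED_BIT, state) && ticks < max_ticks)
        ticks++;
    return ticks;
}

/* 2.4.23-pre3 style (netif_poll_disable): wait until the bit can be
 * claimed. On return the bit is always SET. */
static int wait_pre3_style(unsigned long *state, int max_ticks)
{
    int ticks = 0;
    while (model_test_and_set_bit(RX_SCHED_BIT, state) && ticks < max_ticks)
        ticks++;
    return ticks;
}
```

Starting from a zeroed state word, wait_pre2_style() returns 0 on every call and leaves the word at 0, while wait_pre3_style() returns 0 on the first call, leaves the bit set, and returns max_ticks on the second call. If the assumption above holds for the real kernel, the second down would spin in schedule_timeout(1) indefinitely, which would be consistent with ifconfig hanging in its ioctl() and with the steady rise in context switches.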