[1.] One line summary of the problem: write() returns -1 and sets errno non-sensically. 2.4.0{,-ac[23]} [2.] Full description of the problem/report: write() sometimes returns -1 and sets errno to one of 1 (EPERM: "Operation not permitted") 3 (ESRCH: "No such process") 8 (ENOEXEC: "Exec format error") 1 seems to be the most common. I have verified that at least the ESRCH case is caused by the kernel. With the following patch to sys_write(), --- fs/read_write.c~ Sat Nov 11 18:07:38 2000 +++ fs/read_write.c Sun Jan 7 14:00:25 2001 @@ -146,6 +146,8 @@ ssize_t ret; struct file * file; +extern struct file_operations socket_file_ops; + ret = -EBADF; file = fget(fd); if (file) { @@ -165,6 +167,18 @@ DN_MODIFY); fput(file); } + if(ret < 0 && file && file->f_op == &socket_file_ops ) { + switch(-ret) { + case EAGAIN: case EBADF: case EPIPE: case ENOSPC: case EIO: case ECONNRESET: + case EINTR: case ETIMEDOUT: case EFAULT: case EINVAL: case EMSGSIZE: case ENOMEM: + case ENOBUFS: case ENOTCONN: + break; + + default: + printk(KERN_ERR "sys_write: ret is unexpectedly %d.\n", ret); + } + } + return ret; } and socket_file_ops made non-static, I eventually got this logged: Jan 7 22:15:29 localhost kernel: sys_write: ret is unexpectedly -3. And this user code: r = write(fd_that_is_a_tcp_socket, ...); if (r > 0) { ... } else if(r == 0) { ... } else if (errno == EAGAIN || errno == EINTR) { ... } else if (errno == EPIPE || errno == ENOSPC || errno == EIO || errno == ECONNRESET || errno == ETIMEDOUT) { ... } else { g_error("node_write: write failed with unexpected errno: %d (%s)\n", errno, g_strerror(errno)); } at the same time produced ** ERROR **: node_write: write failed with unexpected errno: 3 (No such process) aborting... This socket is O_NONBLOCK'd, as well as SO_KEEPALIVE'd and SO_REUSEADDR'd, and was an outgoing connection. TCP ECN is on. Syncookies are compiled in but turned off. This app has previously failed with the same thing for errno's 1 and 8, but without the kernel change to verify that those values are actually coming from the kernel. That had happened on 2.4.0, 2.4.0-ac2, and 2.4.0-ac3. I didn't notice it on 2.4.0-test11-pre2, which was the last kernel I'd been running. But the problem's not particularly deterministic, and I may not have been running it long enough. At least two other people are seeing the same thing on 2.4 kernels. My guess is that something mistakenly thinks it's returning a positive number. [3.] Keywords (i.e., modules, networking, kernel): networking [4.] Kernel version (from /proc/version): Linux version 2.4.0-ac3 (fortytwo@manetheren) (gcc version 2.95.2 20000220 (Debian GNU/Linux)) #4 Sun Jan 7 16:18:26 CST 2001 The machine and kernel are UP. [6.] A small shell script or example program which triggers the problem (if possible) This happens after the app runs usually for several hours with lots of outgoing and incoming TCP connections to lots of different hosts. [7.1.] Software (add the output of the ver_linux script here) Linux manetheren 2.4.0-ac3 #4 Sun Jan 7 16:18:26 CST 2001 i686 unknown Kernel modules 2.3.23 Gnu C 2.95.2 Gnu Make 3.79.1 Binutils 2.10.1.0.2 Linux C Library > libc.2.2 Dynamic linker ldd (GNU libc) 2.2 Procps 2.0.6 Mount 2.10q Net-tools 2.05 Console-tools 0.2.3 Sh-utils 2.0.11 Modules Loaded parport_pc lp parport rtc usbcore [7.2.] Processor information (from /proc/cpuinfo): [7.3.] Module information (from /proc/modules): [7.5.] PCI information ('lspci -vvv' as root) [7.4.] Loaded driver and hardware information (/proc/ioports, /proc/iomem) These doesn't seem particularly relevant, so I'm putting them, as well as .config, at http://manetheren.eigenray.com/~fortytwo/bad_write_info Don't hesitate to ask me for more info or to try a patch or something. [7.7.] Other information that might be relevant to the problem (please look in /proc and include all information that you think to be relevant): In 2.4.0 and -ac2, I was getting the reset_xmit_timer() messages that -ac3 fixed. I'm also getting TCP peer shrinks window messages, but had none this boot before this error. -- Paul Cassella - : send the line "unsubscribe linux-net" in the body of a message to majordomo@vger.kernel.org