I have found a bug while using Red Hat 8.0, and I think it is in the
kernel. The problem relates to the /dev/ttyS1 driver and behaves like a
race condition because it does not give exactly predictable results. I
have a workaround for the problem but have no idea about doing a real
fix (and haven't bothered looking too seriously, I'll admit).

Some info about my system:

----------------------------------------------------------------------
[root@localhost /]# cat /proc/version
Linux version 2.4.18-14 (bhcompile@stripples.devel.redhat.com) (gcc version 3.2 20020903 (Red Hat Linux 8.0 3.2-7)) #1 Wed Sep 4 13:35:50 EDT 2002
[root@localhost /]# egrep 'tty|Serial' /var/log/dmesg
Serial driver version 5.05c (2001-07-08) with MANY_PORTS MULTIPORT SHARE_IRQ SERIAL_PCI ISAPNP enabled
ttyS0 at 0x03f8 (irq = 4) is a 16550A
ttyS1 at 0x02f8 (irq = 3) is a 16550A
[root@localhost /]# grep mget /etc/inittab
mo:2345:respawn:/sbin/mgetty -D ttyS1
[root@localhost /]# rpm -q mgetty
mgetty-1.1.28-9
[root@localhost /]# rpm -q util-linux
util-linux-2.11r-10
[root@localhost /]# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Pentium(R) 4 CPU 2.53GHz
stepping        : 7
cpu MHz         : 2539.135
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips        : 5036.33
[root@localhost /]# tail /etc/mgetty+sendfax/mgetty.config
# direct y
# speed 19200
# toggle-dtr n
port ttyS0
init-chat "" AT&F OK AT&C1 OK AT&D3 OK AT&K3 OK AT&S0 OK
port ttyS1
init-chat "" AT&F OK AT&C1 OK AT&D3 OK AT&K3 OK AT&S0 OK
[root@localhost /]# lspci
00:00.0 Host bridge: Intel Corp. 82845G/GL [Brookdale-G] Chipset Host Bridge (rev 02)
00:01.0 PCI bridge: Intel Corp. 82845G/GL [Brookdale-G] Chipset AGP Bridge (rev 02)
00:1d.0 USB Controller: Intel Corp. 82801DB USB (Hub #1) (rev 02)
00:1d.1 USB Controller: Intel Corp. 82801DB USB (Hub #2) (rev 02)
00:1d.2 USB Controller: Intel Corp. 82801DB USB (Hub #3) (rev 02)
00:1d.7 USB Controller: Intel Corp. 82801DB USB EHCI Controller (rev 02)
00:1e.0 PCI bridge: Intel Corp. 82801BA/CA/DB PCI Bridge (rev 82)
00:1f.0 ISA bridge: Intel Corp. 82801DB ISA Bridge (LPC) (rev 02)
00:1f.1 IDE interface: Intel Corp. 82801DB ICH4 IDE (rev 02)
00:1f.5 Multimedia audio controller: Intel Corp. 82801DB AC'97 Audio (rev 02)
01:00.0 VGA compatible controller: nVidia Corporation RIVA TNT2 Model 64 (rev 15)
02:05.0 Ethernet controller: Broadcom Corporation: Unknown device 4401 (rev 01)
[root@localhost /]# minicom
ATI
56000
OK
ATI1
255
OK
ATI2
OK
ATI3
V1.200-K56_DLS M1200B
OK
ATI4
a007840284C6002F
bC60000000
r1005111151012000
r3000111170000000
OK
ATI5
022
OK
ATI6
RCV56DPF L8570A Rev 45.0/45.0
OK
ATI7
000
OK
ATI8
Maestro Jetstream 56 Voice Modem
OK
ATI9
ERROR
----------------------------------------------------------------------

Contact me if you need to know more... the system is going to get
delivered in a few days but now that I've got the modem working I
expect to still have access.

OK, now to describe the problem. What I want to do is run mgetty on a
serial port to a modem so that I can log in to this machine over the
phone. These are bog-standard serial ports and what I'm doing should
just work by now; this is a really basic thing.

What happens is that *SOMETIMES* it does work perfectly, about one time
in three or maybe slightly less often. I can't see a pattern to when it
works and when it doesn't, but at any rate the number of failures is
significant. The failures have nothing to do with the modems failing to
connect or bad phone lines or any of that, because I always get a
connect and I always get a login banner; this is never a problem.

When it does fail, it will just hang after I type my user name.
The mgetty process correctly invokes /bin/login and I can see it in the
ps list. After waiting for a while (maybe a minute, not a long while)
the login process seems to swap out to disc and that's where it stays
until killed. Once in this state, lsof can't detect that /dev/ttyS1 is
open (which is annoying).

[root@localhost /]# ps aux -w -w | grep login
root      6333  0.0  0.0  1572  408 ttyS1    S    12:08   0:00 login
[root@localhost /]# cat /proc/6333/maps
08048000-0804d000 r-xp 00000000 03:02 225895     /bin/login
0804d000-0804e000 rw-p 00004000 03:02 225895     /bin/login
0804e000-0804f000 rwxp 00000000 00:00 0
40000000-40012000 r-xp 00000000 03:02 224579     /lib/ld-2.2.93.so
40012000-40013000 rw-p 00012000 03:02 224579     /lib/ld-2.2.93.so
4001b000-40020000 r-xp 00000000 03:02 224590     /lib/libcrypt-2.2.93.so
40020000-40021000 rw-p 00004000 03:02 224590     /lib/libcrypt-2.2.93.so
40021000-40049000 rw-p 00000000 00:00 0
40049000-40050000 r-xp 00000000 03:02 225695     /lib/libpam.so.0.75
40050000-40051000 rw-p 00006000 03:02 225695     /lib/libpam.so.0.75
40051000-40053000 r-xp 00000000 03:02 224592     /lib/libdl-2.2.93.so
40053000-40054000 rw-p 00001000 03:02 224592     /lib/libdl-2.2.93.so
40054000-40056000 r-xp 00000000 03:02 225696     /lib/libpam_misc.so.0.75
40056000-40057000 rw-p 00001000 03:02 225696     /lib/libpam_misc.so.0.75
42000000-42126000 r-xp 00000000 03:02 144438     /lib/i686/libc-2.2.93.so
42126000-4212b000 rw-p 00126000 03:02 144438     /lib/i686/libc-2.2.93.so
4212b000-4212f000 rw-p 00000000 00:00 0
bfffc000-c0000000 rwxp ffffd000 00:00 0
[root@localhost /]# ls -l /proc/6333/fd
total 0
lrwx------    1 root     root           64 Feb 13 12:10 0 -> /dev/ttyS1
lrwx------    1 root     root           64 Feb 13 12:10 1 -> /dev/ttyS1
lrwx------    1 root     root           64 Feb 13 12:10 2 -> /dev/ttyS1
[root@localhost /]# ps aux -w -w | grep login
root      6333  0.0  0.0  1572  408 ttyS1    S    12:08   0:00 login
[root@localhost /]# ps aux -w -w | grep login
root      6333  0.0  0.0  1572  408 ttyS1    S    12:08   0:00 login
[root@localhost /]# ps aux -w -w | grep login
root      6333  0.0  0.0     0    0 ttyS1    SW   12:08   0:00 [login]
[root@localhost /]# cat /proc/6333/maps
[root@localhost /]# ls -l /proc/6333/fd
total 0
[root@localhost /]# cat /proc/6333/status
Name:   login
State:  S (sleeping)
Tgid:   6333
Pid:    6333
PPid:   1
TracerPid:      0
Uid:    0       0       0       0
Gid:    0       0       0       0
FDSize: 0
Groups:
SigPnd: 0000000000000000
SigBlk: 0000000000002000
SigIgn: 8000000000000006
SigCgt: 0000000000002000
CapInh: 0000000000000000
CapPrm: 00000000fffffeff
CapEff: 00000000fffffeff
[root@localhost /]#

At the other end of the modem, I'm looking at something like this:

----------------------------------------------------------------------
CONNECT 9600
Red Hat Linux release 8.0 (Psyche)
Kernel 2.4.18-14 on an i686
localhost.localdomain login: ccc
NO CARRIER
----------------------------------------------------------------------

Note that the "NO CARRIER" only turns up after the [login] gets killed.

Trying the same thing many times over, I can sometimes get things like:

----------------------------------------------------------------------
CONNECT 9600
Red Hat Linux release 8.0 (Psyche)
Kernel 2.4.18-14 on an i686
main loginlocaldomain login: ccc
Password:
Last login: Thu Feb 13 11:35:00 on ttyS1
----------------------------------------------------------------------

This looks like a broken buffer or possibly a bad write() call. At
other times I will actually get a normal login without a problem.

Trying "strace -o /tmp/trc /sbin/mgetty -D ttyS1" from the command line
will correctly give me a trace of the login process, and weirdly it
*ALWAYS* works when I'm running through strace from the command line.
I get a perfect login every time.

I added debugging fprintf()s to the login.c source and compiled my own
/bin/login. Using this method I was able to figure out where it gets to
before it hangs:

----------------------------------------------------------------------
{
    struct termios tt, ttt;

    tcgetattr(0, &tt);
    ttt = tt;
    ttt.c_cflag &= ~HUPCL;

    /* These can fail, e.g. with ttyn on a read-only filesystem */
    chown(ttyn, 0, 0);
    chmod(ttyn, TTY_MODE);

    /* Kill processes left on this tty */
    tcsetattr(0,TCSAFLUSH,&ttt);   // <==== FAILS HERE, NEVER RETURNS
    signal(SIGHUP, SIG_IGN); /* so vhangup() wont kill us */
    vhangup();
    signal(SIGHUP, SIG_DFL);

    /* open stdin,stdout,stderr to the tty */
    opentty(ttyn);

    /* restore tty modes */
    tcsetattr(0,TCSAFLUSH,&tt);
}
----------------------------------------------------------------------

OK: here is my workaround, change one line:

    tcsetattr(0,TCSAFLUSH,&ttt);

to

    tcsetattr(0,TCSANOW,&ttt);

in the above code block from login.c. This will force that system call
to correctly return. It has the unfortunate result that the end of line
character is often (but not always) lost during a modem login (a
console login is still perfectly OK), and you get a result like this:

----------------------------------------------------------------------
CONNECT 9600
Red Hat Linux release 8.0 (Psyche)
Kernel 2.4.18-14 on an i686
localhost.localdomain login: cccPassword:
Last login: Thu Feb 13 11:33:33 on ttyS1
----------------------------------------------------------------------

Admittedly not ideal, but at least it gets you in the door. Hopefully
by now you can see why the finger points towards the kernel serial
driver, especially when the occasional buffering weirdness pops up.

Good luck to whoever wants to come up with a proper fix.

        - Tel

--
Psyche-list mailing list
Psyche-list@redhat.com
https://listman.redhat.com/mailman/listinfo/psyche-list