login-hang through mgetty (kernel serial driver bug?)

I have found a bug while using Red Hat 8.0, and I think it is in the kernel.
The problem relates to the /dev/ttyS1 serial driver and behaves like a race
condition because the results are not exactly predictable.

I have a workaround for the problem but no idea how to do a real
fix (and, I'll admit, I haven't looked too seriously).

Some info about my system:

----------------------------------------------------------------------
[root@localhost /]# cat /proc/version
Linux version 2.4.18-14 (bhcompile@stripples.devel.redhat.com) (gcc version 3.2 20020903 (Red Hat Linux 8.0 3.2-7)) #1 Wed Sep 4 13:35:50 EDT 2002
[root@localhost /]# egrep 'tty|Serial' /var/log/dmesg
Serial driver version 5.05c (2001-07-08) with MANY_PORTS MULTIPORT SHARE_IRQ SERIAL_PCI ISAPNP enabled
ttyS0 at 0x03f8 (irq = 4) is a 16550A
ttyS1 at 0x02f8 (irq = 3) is a 16550A
[root@localhost /]# grep mget /etc/inittab
mo:2345:respawn:/sbin/mgetty -D ttyS1
[root@localhost /]# rpm -q mgetty
mgetty-1.1.28-9
[root@localhost /]# rpm -q util-linux
util-linux-2.11r-10
[root@localhost /]# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Pentium(R) 4 CPU 2.53GHz
stepping        : 7
cpu MHz         : 2539.135
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips        : 5036.33
[root@localhost /]# tail /etc/mgetty+sendfax/mgetty.config
#  direct y
#  speed 19200
#  toggle-dtr n

port ttyS0
   init-chat "" AT&F OK AT&C1 OK AT&D3 OK AT&K3 OK AT&S0 OK

port ttyS1
   init-chat "" AT&F OK AT&C1 OK AT&D3 OK AT&K3 OK AT&S0 OK

[root@localhost /]# lspci
00:00.0 Host bridge: Intel Corp. 82845G/GL [Brookdale-G] Chipset Host Bridge (rev 02)
00:01.0 PCI bridge: Intel Corp. 82845G/GL [Brookdale-G] Chipset AGP Bridge (rev 02)
00:1d.0 USB Controller: Intel Corp. 82801DB USB (Hub #1) (rev 02)
00:1d.1 USB Controller: Intel Corp. 82801DB USB (Hub #2) (rev 02)
00:1d.2 USB Controller: Intel Corp. 82801DB USB (Hub #3) (rev 02)
00:1d.7 USB Controller: Intel Corp. 82801DB USB EHCI Controller (rev 02)
00:1e.0 PCI bridge: Intel Corp. 82801BA/CA/DB PCI Bridge (rev 82)
00:1f.0 ISA bridge: Intel Corp. 82801DB ISA Bridge (LPC) (rev 02)
00:1f.1 IDE interface: Intel Corp. 82801DB ICH4 IDE (rev 02)
00:1f.5 Multimedia audio controller: Intel Corp. 82801DB AC'97 Audio (rev 02)
01:00.0 VGA compatible controller: nVidia Corporation RIVA TNT2 Model 64 (rev 15)
02:05.0 Ethernet controller: Broadcom Corporation: Unknown device 4401 (rev 01)
[root@localhost /]# minicom
ATI                                                                                    
56000                                                                                  

OK
ATI1
255

OK
ATI2
OK
ATI3
V1.200-K56_DLS M1200B

OK
ATI4
a007840284C6002F

bC60000000

r1005111151012000

r3000111170000000

OK
ATI5
022

OK
ATI6
RCV56DPF L8570A Rev 45.0/45.0
OK
ATI7
000

OK
ATI8
Maestro Jetstream 56 Voice Modem

OK
ATI9
ERROR
----------------------------------------------------------------------

Contact me if you need to know more... the system is going to get
delivered in a few days but now that I've got the modem working I 
expect to still have access.

OK, now to describe the problem. What I want to do is run mgetty on a
serial port connected to a modem so that I can log in to this machine
over the phone.  These are bog-standard serial ports and what I'm doing
should just work by now; this is a really basic thing.

What happens is that *SOMETIMES* it does work perfectly, about one time
in three or maybe slightly less often. I can't see a pattern to when it
works and when it doesn't, but at any rate the number of failures is
significant.  The failures have nothing to do with the modems failing to
connect, bad phone lines or anything like that, because I always get a
connect and I always get a login banner; this is never a problem. When
it does fail, it just hangs after I type my user name. The mgetty
process correctly invokes /bin/login and I can see it in the ps list.
After waiting for a while (maybe a minute, not a long while) the login
process seems to swap out to disc and that's where it stays until
killed. Once in this state, lsof can't detect that /dev/ttyS1 is open
(which is annoying).


[root@localhost /]# ps aux -w -w | grep login
root      6333  0.0  0.0  1572  408 ttyS1    S    12:08   0:00 login    
[root@localhost /]# cat /proc/6333/maps
08048000-0804d000 r-xp 00000000 03:02 225895     /bin/login
0804d000-0804e000 rw-p 00004000 03:02 225895     /bin/login
0804e000-0804f000 rwxp 00000000 00:00 0
40000000-40012000 r-xp 00000000 03:02 224579     /lib/ld-2.2.93.so
40012000-40013000 rw-p 00012000 03:02 224579     /lib/ld-2.2.93.so
4001b000-40020000 r-xp 00000000 03:02 224590     /lib/libcrypt-2.2.93.so
40020000-40021000 rw-p 00004000 03:02 224590     /lib/libcrypt-2.2.93.so
40021000-40049000 rw-p 00000000 00:00 0
40049000-40050000 r-xp 00000000 03:02 225695     /lib/libpam.so.0.75
40050000-40051000 rw-p 00006000 03:02 225695     /lib/libpam.so.0.75
40051000-40053000 r-xp 00000000 03:02 224592     /lib/libdl-2.2.93.so
40053000-40054000 rw-p 00001000 03:02 224592     /lib/libdl-2.2.93.so
40054000-40056000 r-xp 00000000 03:02 225696     /lib/libpam_misc.so.0.75
40056000-40057000 rw-p 00001000 03:02 225696     /lib/libpam_misc.so.0.75
42000000-42126000 r-xp 00000000 03:02 144438     /lib/i686/libc-2.2.93.so
42126000-4212b000 rw-p 00126000 03:02 144438     /lib/i686/libc-2.2.93.so
4212b000-4212f000 rw-p 00000000 00:00 0
bfffc000-c0000000 rwxp ffffd000 00:00 0
[root@localhost /]# ls -l /proc/6333/fd
total 0
lrwx------    1 root     root           64 Feb 13 12:10 0 -> /dev/ttyS1
lrwx------    1 root     root           64 Feb 13 12:10 1 -> /dev/ttyS1
lrwx------    1 root     root           64 Feb 13 12:10 2 -> /dev/ttyS1
[root@localhost /]# ps aux -w -w | grep login
root      6333  0.0  0.0  1572  408 ttyS1    S    12:08   0:00 login    
[root@localhost /]# ps aux -w -w | grep login
root      6333  0.0  0.0  1572  408 ttyS1    S    12:08   0:00 login    
[root@localhost /]# ps aux -w -w | grep login
root      6333  0.0  0.0     0    0 ttyS1    SW   12:08   0:00 [login]
[root@localhost /]# cat /proc/6333/maps
[root@localhost /]# ls -l /proc/6333/fd
total 0
[root@localhost /]# cat  /proc/6333/status
Name:   login
State:  S (sleeping)
Tgid:   6333
Pid:    6333
PPid:   1
TracerPid:      0
Uid:    0       0       0       0
Gid:    0       0       0       0
FDSize: 0
Groups:
SigPnd: 0000000000000000
SigBlk: 0000000000002000
SigIgn: 8000000000000006
SigCgt: 0000000000002000
CapInh: 0000000000000000
CapPrm: 00000000fffffeff
CapEff: 00000000fffffeff
[root@localhost /]# 
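
The SigBlk, SigIgn and SigCgt fields in the status output above are hex
bitmasks in which bit n-1 stands for signal n. A minimal sketch for
decoding them follows; decode_sigmask is a hypothetical helper written
for this note, not something from login.c or procps.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper: turn a /proc/<pid>/status signal mask (a hex
 * string such as "0000000000002000") into a list of signal numbers.
 * Bit 0 of the mask is signal 1, bit 1 is signal 2, and so on.
 * Returns the number of signals stored in sigs[].
 */
int decode_sigmask(const char *hex, int *sigs, int max_sigs)
{
    unsigned long long mask;
    int count = 0;

    assert(hex != NULL && sigs != NULL);

    mask = strtoull(hex, NULL, 16);
    for (int bit = 0; bit < 64 && count < max_sigs; bit++) {
        if (mask & (1ULL << bit))
            sigs[count++] = bit + 1;
    }
    return count;
}
```

Applied to the values above, SigBlk and SigCgt (0x2000) both decode to
signal 14, which is SIGALRM on i386 Linux, and SigIgn decodes to signals
2, 3 and 64. A caught SIGALRM would be consistent with login's own
timeout alarm going off at around the one-minute mark, but that is a
guess and not something this output proves.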

At the other end of the modem, I'm looking at something like this:

----------------------------------------------------------------------
CONNECT 9600
Red Hat Linux release 8.0 (Psyche)
Kernel 2.4.18-14 on an i686


localhost.localdomain login: ccc
NO CARRIER
----------------------------------------------------------------------

Note that the "NO CARRIER" only turns up after the [login] gets killed.
Trying the same thing many times over, I can sometimes get things like:

----------------------------------------------------------------------
CONNECT 9600
Red Hat Linux release 8.0 (Psyche)
Kernel 2.4.18-14 on an i686


main loginlocaldomain login: ccc
          Password: 
Last login: Thu Feb 13 11:35:00 on ttyS1
----------------------------------------------------------------------

This looks like a broken buffer or possibly a bad write() call.
At other times I will actually get a normal login without a problem.

Trying "strace -o /tmp/trc /sbin/mgetty -D ttyS1" from the command
line will correctly give me a trace of the login process and weirdly
it *ALWAYS* works when I'm running through strace from the command
line -- I get a perfect login every time.

I added debugging fprintf()s to the login.c source and compiled my
own /bin/login. Using this method I was able to figure out where it
gets to before it hangs:

----------------------------------------------------------------------
    {
        struct termios tt, ttt;
        
        tcgetattr(0, &tt);
        ttt = tt;
        ttt.c_cflag &= ~HUPCL;

        /* These can fail, e.g. with ttyn on a read-only filesystem */
        chown(ttyn, 0, 0);
        chmod(ttyn, TTY_MODE);

        /* Kill processes left on this tty */
        tcsetattr(0,TCSAFLUSH,&ttt); // <==== FAILS HERE, NEVER RETURNS
        signal(SIGHUP, SIG_IGN); /* so vhangup() wont kill us */
        vhangup();
        signal(SIGHUP, SIG_DFL);

        /* open stdin,stdout,stderr to the tty */
        opentty(ttyn);
        
        /* restore tty modes */
        tcsetattr(0,TCSAFLUSH,&tt);
    }
----------------------------------------------------------------------

OK, here is my workaround. Change one line in the above code block from
login.c, from

        tcsetattr(0,TCSAFLUSH,&ttt);

to

        tcsetattr(0,TCSANOW,&ttt);

which lets the call return instead of hanging. The unfortunate side
effect is that the end-of-line character is often (but not always) lost
during a modem login (a console login is still perfectly OK), and you
get a result like this:


----------------------------------------------------------------------
CONNECT 9600
Red Hat Linux release 8.0 (Psyche)
Kernel 2.4.18-14 on an i686


localhost.localdomain login: cccPassword: 
Last login: Thu Feb 13 11:33:33 on ttyS1
----------------------------------------------------------------------


Admittedly not ideal but at least it gets you in the door.
Hopefully by now you can see why the finger points towards the kernel
serial driver, especially when the occasional buffering weirdness
pops up. Good luck to whoever wants to come up with a proper fix.
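
For anyone who wants something between TCSAFLUSH and TCSANOW, here is a
minimal sketch of a bounded version: poll the output queue for a
limited time, discard pending input, then apply the settings
immediately. It assumes Linux (TIOCOUTQ is not portable), and
tcsetattr_bounded_flush is a hypothetical name, not anything in
util-linux. It has not been tried on the affected machine, so treat it
as an idea and not a tested fix. Note also that TIOCOUTQ does not count
bytes still sitting in the UART FIFO.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sys/ioctl.h>
#include <termios.h>
#include <time.h>

/*
 * Hypothetical replacement for tcsetattr(fd, TCSAFLUSH, tio) that
 * cannot block forever:
 *   1. wait up to timeout_ms for the kernel output queue to empty,
 *   2. discard unread input,
 *   3. apply the new settings with TCSANOW.
 * Returns 0 if the queue drained in time, 1 if the settings were
 * applied with output still queued, -1 on error.
 */
int tcsetattr_bounded_flush(int fd, const struct termios *tio, int timeout_ms)
{
    const struct timespec step = { 0, 10 * 1000 * 1000 };   /* 10 ms */
    int waited_ms = 0;
    int pending = 0;

    assert(tio != NULL);

    for (;;) {
        /* Linux-specific: bytes not yet handed to the hardware. */
        if (ioctl(fd, TIOCOUTQ, &pending) < 0)
            return -1;
        if (pending == 0 || waited_ms >= timeout_ms)
            break;
        nanosleep(&step, NULL);
        waited_ms += 10;
    }

    if (tcflush(fd, TCIFLUSH) < 0)       /* drop input, as TCSAFLUSH would */
        return -1;
    if (tcsetattr(fd, TCSANOW, tio) < 0)
        return -1;

    return pending == 0 ? 0 : 1;
}
```

In login.c this would stand in for the first call, for example
tcsetattr_bounded_flush(0, &ttt, 2000). The two-second limit is an
arbitrary choice for illustration. Giving the output a chance to drain
first might also reduce the lost end-of-line effect, though that is
speculation.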

	- Tel



-- 
Psyche-list mailing list
Psyche-list@redhat.com
https://listman.redhat.com/mailman/listinfo/psyche-list
