It figures: the one time I want a core dump, I can't get one. I tried both
of Michael's suggestions (see #1 below) and neither produced a core file.
I'm guessing the problem isn't with login. I'm going to start compiling
everything I can think of with whatever debug options I can find.

In the meantime, can anyone tell me what the "debug" option in the
/etc/pam.d/* files does? Am I supposed to get error/debug messages
somewhere? Anywhere in particular? I can't see that it's giving me
anything.

Thanks for any and all ideas,
Kelli

-----Original Message-----
From: mjt@corpit.ru [mailto:mjt@corpit.ru]On Behalf Of Michael Ju. Tokarev
Sent: Monday, September 25, 2000 12:35 PM
To: Kelli Wolfe
Cc: pam-list@redhat.com
Subject: Re: calling pam_sm_open_session

Kelli Wolfe wrote:
>
> Hello Michael,
>
> Thank you for your offer of assistance, I'm in need: that trace file
> is huge.

I told you about that. But it doesn't matter.

> I've attached it. This is from the machine I'm trying to telnet into.
> There are a few things in this file that I'm unsure about.
> First, there are 1000+ close(xxxx) ..... (Bad file descriptor) messages.

This is normal. Some application (that should be login) tries to close
all possibly open files (should be 1024-3 calls). The simplest way to do
that is to just close 'em.

> Next there are several "Connection refused" messages, I'm not sure
> if they have anything to do here or not.

Probably not -- I can guess that the libc tried to connect to the local
nscd (name service cache daemon); if it isn't running, libc will fall
back to doing the work itself.

> And lastly, there is a "--- SIGSEGV (Segmentation fault) ---".

And the most interesting one, really. Unfortunately, this is what I
expected to see. My guess was that (pamified) login (or one of the pam
modules, or pam itself) just has a bug somewhere that causes a SIGSEGV
or something similar. I was just not sure about that, as telnetd exits
with 1 (according to your logs), but that's normal.
Oh, ma, just some trivial bug/typo somewhere... But this is one of those
things that are very hard to investigate remotely....

> ------------------------------------------------------------------------
>      Name: trc.tar.gz
>      trc.tar.gz
>      Type: Unix Tape Archive (application/x-tar)
>      Encoding: base64
>      Size: 17143 bytes

Unfortunately (second time in this letter) I can't read this file:

  $ gzip -t trc.tar.gz
  gzip: trc.tar.gz: invalid compressed data--format violated
  $ _

And the file really looks "strange" at least. I guess that your
"Microsoft Outlook 8.5, Build 4.71.2377.0" just screwed it up...
You can resend it (maybe using some real mailer instead), but I think
that it is not necessary anymore.

We need to find where that SIGSEGV occurred, and strace really can't
help here. Unfortunately (3rd time!!!) RedHat strips all binaries, so it
is not really possible to find where the problem occurs. As I guess, you
are not a "unix guru" (or at least not a "unix programmer guru"). But I
can still suggest that you do the following.

1. Enable (temporarily) core dumping from your daemons (redhat by
   default forbids cores from daemons). You can enable that in two
   ways -- by commenting out the line with 'ulimit -c 0' in
   /etc/rc.d/init.d/functions (if I remember correctly), or by replacing
   your login with a little wrapper (but be very careful, and check that
   the 'linux single' lilo command works before doing this):

     rename /bin/login to /bin/login.save
     create /bin/login with the following contents:

       #! /bin/sh
       ulimit -c 100000
       exec /bin/login.save "$@"

     and make it executable (chmod +x /bin/login)

2. Try to log in again. You should get a core now, probably in /.
   Use 'gdb /bin/login /core' (with the wrapper in place, the real
   binary is /bin/login.save) and see if it can tell you something
   useful (use the "bt" command). The information needed is the name of
   the module (most important) and the function/line number of the place
   where that SIGSEGV occurred.

3. If that does not show anything, you need to recompile some of the
   packages/libraries used so that they have debugging info, and then
   repeat step 2.
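
A minimal sketch of why step 1 matters, assuming a POSIX sh: a core-size
limit set in a parent shell is inherited by whatever that shell starts, so
a 'ulimit -c 0' in the init scripts applies to every daemon below it, and
a wrapper that raises the limit again just before exec is one way around
it. The variable name "inherited" is only for illustration, and the limit
is lowered inside a subshell so the calling shell is left alone.

```shell
#!/bin/sh
# Sketch only: nothing under /bin or /etc is touched.

# Lower the core limit in a subshell (as the init scripts are assumed
# to do) and let a child process report the limit it received.
inherited=$(
    ulimit -c 0          # parent forbids core files
    sh -c 'ulimit -c'    # child prints its own core-size limit
)
echo "child sees core limit: $inherited"

# With a core file in hand, a backtrace can be taken non-interactively.
# Not run here; the paths are the ones from the steps above:
#   gdb -batch -ex bt /bin/login.save /core
```

If the child reports 0, no core file will be written no matter how the
program crashes, which is why the limit has to be raised before login is
started.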

All of that is work for package maintainers/authors (esp. if the user
has no great experience with all that stuff). But the problem is that we
can't tell _which_ package is at fault. It is definitely not
telnet/telnetd. It may be login, pam or one of the pam modules.

Also, try asking other people -- e.g. those behind pam_ldap and/or
login. Or you can ask someone with programming experience near you to
help you do so.

I CC'd this message back to pam-list.

Regards,
 Michael.