RE: calling pam_sm_open_session

"Kelli Wolfe" <kelli@inlet.com> · Mon, 25 Sep 2000 14:15:51 -0500

It figures:  the one time I want a core dump, I can't get one.
I tried both of Michael's suggestions (see #1 below) and neither
produced a core file.  I'm guessing the problem isn't with login.

I'm going to start compiling everything I can think of with
whatever debug options I can find.  In the meantime, can
anyone tell me what the "debug" option in the /etc/pam.d/* files
does?  Am I supposed to get error/debug messages somewhere?
Anywhere in particular?  I can't see where that's giving me
anything.

Thanks for any and all ideas,
Kelli

-----Original Message-----
From: mjt@corpit.ru [mailto:mjt@corpit.ru] On Behalf Of Michael Ju. Tokarev
Sent: Monday, September 25, 2000 12:35 PM
To: Kelli Wolfe
Cc: pam-list@redhat.com
Subject: Re: calling pam_sm_open_session

Kelli Wolfe wrote:
>
> Hello  Michael,
>
> Thank you for your offer of assistance, I'm in need:  that trace file
> is huge.

I told you about that.  But it doesn't matter.

> I've attached it.  This is from the machine I'm trying to telnet into.
> There are a few things in this file that I'm unsure about.

> First, there's a 1000+ close(xxxx)  .....  (Bad file descriptor) messages.
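This is normal.  Some application (it should be login) tries to close
every file descriptor that might be open (that should be 1024-3 calls).
The simplest way to do that is to just close them all; each close() on
a descriptor that isn't open fails with "Bad file descriptor", which is
harmless.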

> Next there are several "Connection refused" messages, I'm not sure
> if they have anything to do here or not.

Probably not -- I can guess that the libc tried to connect to the local
nscd (name service cache daemon); if it isn't running, libc will
fall back to doing the work itself.

> And lastly, there is a  "--- SIGSEGV (Segmentation fault) ---".

And that is the most interesting part.  Unfortunately, this is what I
expected to see.  My guess was that the (pamified) login (or one of the
pam modules, or pam itself) has a bug somewhere that causes a SIGSEGV
or something similar.  I just wasn't sure about that, as telnetd exits
with 1 (according to your logs), but that's normal.

Oh my, it is probably just some trivial bug/typo somewhere...  But this
is one of those things that are very hard to investigate remotely....

>   ------------------------------------------------------------------------
>                  Name: trc.tar.gz
>    trc.tar.gz    Type: Unix Tape Archive (application/x-tar)
>              Encoding: base64
>                  Size: 17143 bytes

Unfortunately (second time in this letter) I can't read this file:

 $ gzip -t trc.tar.gz
 gzip: trc.tar.gz: invalid compressed data--format violated
 $ _

And the file really looks "strange" at least.  I guess that your
"Microsoft Outlook 8.5, Build 4.71.2377.0" just screwed it up...

You can resend it (maybe using some real mailer instead), but I
think that it is not necessary anymore.  We need to find where
that SIGSEGV occurred, and strace really can't help here.
Unfortunately (3rd time!!!) RedHat strips all binaries, so it
is not really possible to find where the problem occurs.
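
If the trace does get resent, this kind of damage could be caught on
the sending side first.  A minimal sketch, assuming gzip and cksum are
installed; check_archive is a hypothetical helper name, not part of any
package:

```shell
#!/bin/sh
# check_archive FILE: hypothetical helper that tests a gzip file and,
# if it is intact, prints its checksum so the receiver can compare.
check_archive() {
    if gzip -t "$1" 2>/dev/null; then
        # cksum reading from stdin prints "CRC SIZE" without a file name
        echo "ok $(cksum < "$1")"
    else
        echo "corrupt"
        return 1
    fi
}
```

Running check_archive trc.tar.gz on both machines and comparing the
two outputs would show whether the mailer changed the file in transit.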

As far as I can guess, you are not a "unix guru" (or at least not a
"unix programmer guru").  Still, I can suggest that you do the following.

1. Enable (temporarily) core dumping from your daemons (redhat by
   default forbids cores from daemons).  You can enable that in two
   ways -- commenting out the line with 'ulimit -c 0' in
   /etc/rc.d/init.d/functions (if I remember correctly), or replacing
   your login with a little wrapper (but be very careful, and check
   that the 'linux single' lilo command works before doing this):
     rename /bin/login to /bin/login.save
     create /bin/login with the following contents:
       #! /bin/sh
       ulimit -c 100000
       exec /bin/login.save "$@"
     and make it executable (chmod +x /bin/login)

2. Try to log in again.  You should get a core now, probably in /.
   Use 'gdb /bin/login /core' (or 'gdb /bin/login.save /core' if you
   installed the wrapper, since /bin/login is then only a shell
   script) and see if it can tell you something useful (use the "bt"
   command).  The information needed is the name of the module (most
   important) and the function/line number of the place where that
   SIGSEGV occurred.

3. If that does not show anything, you need to recompile some of the
   packages/libraries used so that they have debugging info, and
   then repeat step 2.
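
Steps 1 and 2 can be checked from a shell before and after the login
attempt.  A rough sketch, assuming a gdb recent enough to accept the
-batch and -ex options; both function names are made up for
illustration, and /core is only where the core is expected to land:

```shell
#!/bin/sh
# describe_core_limit VALUE: interpret the output of 'ulimit -c'.
describe_core_limit() {
    case "$1" in
        0)         echo "core dumps disabled" ;;
        unlimited) echo "core dumps unlimited" ;;
        *)         echo "core dumps limited to $1 blocks" ;;
    esac
}

# backtrace_core PROGRAM COREFILE: print a backtrace without an
# interactive gdb session; returns 1 if there is no core file.
backtrace_core() {
    if [ ! -f "$2" ]; then
        echo "no core file at $2"
        return 1
    fi
    gdb -batch -ex bt "$1" "$2"
}

# Show the limit that applies to this shell and its children.
describe_core_limit "$(ulimit -c)"
```

After a failed login, 'backtrace_core /bin/login /core' would print the
backtrace in one go (give it /bin/login.save instead if the wrapper
from step 1 is in place).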

All of that is work for the package maintainers/authors (esp. if the
user has little experience with all that stuff).  But the problem
is that we can't tell _which_ package is at fault.  It is definitely
not telnet/telnetd.  It may be login, pam or one of the pam modules.

Also, try asking other people -- e.g. the people behind pam_ldap
and/or login.

Or you can ask someone near you with programming experience
to help you do this.

I CC'd this message back to pam-list.

Regards,
 Michael.