Re: Strategies for debugging a segmentation fault on a production server

"Richard Lynch" <ceo@xxxxxxxxx> · Wed, 22 Jun 2005 19:17:44 -0700 (PDT)

On Tue, June 21, 2005 7:57 am, Michael Caplan said:
> I am looking for some advice on how to go about debugging Apache 1.3.33 /
> PHP 5.0.4 on a production Linux box (RHE 3).  The scenario is this:  Once
> a day we find a segfault in our apache logs.  From our current position,
> we don't know what page was accessed, and our 400+ users haven't brought
> the issue to our attention.  All we know is the date/time and PID of when
> the segfault occurred.  The question is this:  how can we go about
> isolating the offending requested page that bombs?

Have you managed to get the same segfault on a development box?...

Obviously, if you can make it happen on a dev box, you can then set up the
conditions with Apache -X and whatnot to debug to your heart's content.

Focus on reproducing the bug under laboratory conditions.

> I've set up a custom apache log file that populates each entry with the
> PID that handled it.  However, when we do see a segfault, this log file
> does not appear to be populated with an entry that corresponds (within a
> 5 - 10 second period) to the PID that bombs.  I'm guessing that the log
> file is only written to after a request is delivered?

Maybe you could compare access_log to error_log.

access_log tells you what they asked for.

error_log tells you what they didn't get...
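
Here is a rough Python sketch of that comparison. It pulls the PID and
timestamp of each segfault notice out of error_log, then lists the last few
requests that PID logged before it died. The request that actually crashed
never reaches access_log, so this only shows what the child served just
before. The error_log line shape is what Apache 1.3 writes as far as I
recall, and the access_log layout (PID from %P as the first field, then the
common log format) is purely an assumption about your custom LogFormat.
The function names are made up for illustration.

```python
import re
from datetime import datetime

# Apache 1.3 reports a crashed child in error_log roughly like this:
# [Wed Jun 22 19:17:44 2005] [notice] child pid 12345 exit signal Segmentation fault (11)
SEGFAULT_RE = re.compile(
    r"^\[(?P<when>[^\]]+)\] \[notice\] child pid (?P<pid>\d+) "
    r"exit signal Segmentation fault"
)

# ASSUMED access_log layout: PID first (%P), then the common log format.
ACCESS_RE = re.compile(
    r'^(?P<pid>\d+) \S+ \S+ \S+ \[(?P<when>[^\]]+)\] "(?P<request>[^"]*)"'
)


def find_segfaults(error_lines):
    """Return a list of (timestamp, pid) for every segfault notice."""
    hits = []
    for line in error_lines:
        m = SEGFAULT_RE.match(line)
        if m:
            when = datetime.strptime(m.group("when"), "%a %b %d %H:%M:%S %Y")
            hits.append((when, int(m.group("pid"))))
    return hits


def last_requests(access_lines, pid, crash_time, limit=3):
    """Return the last `limit` requests the PID logged before it crashed."""
    seen = []
    for line in access_lines:
        m = ACCESS_RE.match(line)
        if not m or int(m.group("pid")) != pid:
            continue
        # Drop the timezone offset ("-0700") so the comparison stays naive.
        stamp = m.group("when").split()[0]
        when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S")
        # PIDs get reused, so ignore anything logged after the crash.
        if when <= crash_time:
            seen.append((when, m.group("request")))
    return seen[-limit:]
```

Feed it the lines of both files (for example via open(...).readlines()) and
look for a page that keeps turning up just before the crashes.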

> Otherwise, looking at the PHP bugs page, it recommends rebuilding PHP with
> --enable-debug and running Apache with -X in order to get a core dump.
> Running apache with debug mode on is not an option on our production box.
> Is running apache -X mandatory to get a core dump?
>
> Any other strategies that you can recommend that would help us isolate the
> offending page so we can get to the good work of reproducing and fixing
> the problem at hand?

For something this rare, as I said above, try to focus on making it happen
on a dev box.

In the short term, you might be able to have the children serve fewer
requests before committing suicide (MaxRequestsPerChild in httpd.conf),
which might be worse for load, but also might avoid the segfaulting as
often.  Tough balancing act.  And it will only help if the segfault is
somehow related to how long a child has been running, which might not be
the case at all.

I think you could also temporarily set up your logs to log MORE stuff --
perhaps even enough that you can compare access to error and make a
one-to-one comparison of what was requested/delivered.

It will chew up disk space something terrible and slow down the server a
fair amount, but it might be feasible for a production box just long enough
to get the data you need to pin down the segfault.
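
One way to do that one-to-one comparison, sketched in Python. It assumes
you have arranged (somehow, in your app) for a line to be written when a
request starts and another when it finishes, in a made-up format of
"BEGIN <pid> <uri>" and "END <pid> <uri>". Neither that format nor the
function name is anything Apache or PHP gives you out of the box. Any
BEGIN with no matching END is a request that never finished, which is
where a segfault would show up.

```python
def unfinished_requests(lines):
    """Return (pid, uri) pairs for requests that logged BEGIN but never END.

    Each line is ASSUMED to look like "BEGIN <pid> <uri>" or
    "END <pid> <uri>"; this is a hypothetical format, not a standard one.
    """
    orphans = []
    in_flight = {}  # pid -> uri of the request that child is working on
    for line in lines:
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        marker, pid, uri = parts[0], parts[1], parts[2].strip()
        if marker == "BEGIN":
            # A child handles one request at a time. If this PID already has
            # an open request, the old one never finished (PID was reused).
            if pid in in_flight:
                orphans.append((pid, in_flight[pid]))
            in_flight[pid] = uri
        elif marker == "END":
            in_flight.pop(pid, None)
    # Whatever is still open at the end of the log never logged an END.
    orphans.extend(in_flight.items())
    return orphans
```

Run it on a log that has stopped growing, or requests that are merely
still in progress will show up as unfinished too.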

If the segfault is hardware related, though, knowing which script it
occurred in won't help in the least.  A bad spot in RAM or on the hard
drive in /tmp won't be triggered by any particular script.

PS I'm no expert. Something I typed above could be balderdash.

-- 
Like Music?
http://l-i-e.com/artists.htm
