Re: Random Internal Server Error 500 after apache and system update

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I really see nothing in that strace output that tells anything is crashing.



2016-10-02 10:44 GMT+02:00 Fabio F.Gervasi <fabio.gervasi70@xxxxxxxxx>:
Hi James,

I do not have the skills to be able to work well in "depth" to the problem as you (rightly) suggest. What I know is that this problem started from the last system update (apache / php / mariadb, ...) from 23 September.

The only log info is: /var/log/httpd/error_log, at "crash time": [Fri Sep 30 19:09:03.897325 2016] [mpm_event:trace4] [pid 30339:tid 139796798162688] event.c(930): socket reached timeout in lingering-close state

and I don't know if strace (the results of which I have just quoted) is useful..

Should I wait for further apache / php updates?



Fabio F. Gervasi
Engineering Consultant

Web             www.centrometeo.com
g+                 plus.google.com/centrometeo
Twitter        twitter.com/centrometeo









2016-10-02 10:34 GMT+02:00 Dr James Smith <js5@xxxxxxxxxxxx>:
Things you can do are trying to track down if there is any pattern - are these first request on a child, "nth" request on  child etc... we add additional variables to access logs which include things such as PID/request no in PID, memory usage before and after etc. This allows us sometimes to see patterns in errors which aren't normally easy to see..
Saying that it may not help you here...

Another thing you can do with ab is check to see if it is OK under lower load?

Finally I find "siege" better than "ab" when it comes to testing as it is more configurable - you can send it a list of URLs - run test for "n-seconds" rather than "n-requests" which is more useful if you don't know how long things take to return!

James


On 02/10/2016 09:02, Daniel wrote:
I would bet on your method of parsing php and the scripts being ran, in any case to be sure:

Try the strace approach:
strace -o /tmp/outputfile -s 5000 httpd -X or strace -ff -F -s200 -o /tmp/strace.out -p PID

to try to find out what's really going on. 

2016-09-30 20:19 GMT+02:00 Fabio F.Gervasi <fabio.gervasi70@xxxxxxxxx>:
Hi!

I have additional information:

/var/log/httpd/error_log, at "crash time": [Fri Sep 30 19:09:03.897325 2016] [mpm_event:trace4] [pid 30339:tid 139796798162688] event.c(930): socket reached timeout in lingering-close state

What do you think?



2016-09-30 14:47 GMT+02:00 Fabio F.Gervasi <fabio.gervasi70@xxxxxxxxx>:
Hi!

Thank you for your reply. I tried the following tests.

1) Using a little text file:
# ab -k -c 100 -n 2000 localhost/test.txt
This is ApacheBench, Version 2.3 <$Revision: 1748469 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 200 requests
Completed 400 requests
Completed 600 requests
Completed 800 requests
Completed 1000 requests
Completed 1200 requests
Completed 1400 requests
Completed 1600 requests
Completed 1800 requests
Completed 2000 requests
Finished 2000 requests


Server Software:        Apache/2.4.23
Server Hostname:        localhost
Server Port:            80

Document Path:          /test.txt
Document Length:        52 bytes

Concurrency Level:      100
Time taken for tests:   2.521 seconds
Complete requests:      2000
Failed requests:        16
   (Connect: 0, Receive: 0, Length: 16, Exceptions: 0)
Keep-Alive requests:    1984
Total transferred:      950336 bytes
HTML transferred:       103168 bytes
Requests per second:    793.45 [#/sec] (mean)
Time per request:       126.032 [ms] (mean)
Time per request:       1.260 [ms] (mean, across all concurrent requests)
Transfer rate:          368.18 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    1   3.8      0      21
Processing:     1  123 126.5     83     772
Waiting:        0  123 126.7     83     772
Total:          1  124 126.6     85     772

Percentage of the requests served within a certain time (ms)
  50%     85
  66%    154
  75%    201
  80%    229
  90%    303
  95%    373
  98%    452
  99%    547
 100%    772 (longest request)

I obtain 16 failed requests, but if I run more again I obtain a different number.

2) Using a big gif file:
# ab -k -c 100 -n 2000 localhost/it-vis-animation.gif
This is ApacheBench, Version 2.3 <$Revision: 1748469 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 200 requests
Completed 400 requests
Completed 600 requests
Completed 800 requests
Completed 1000 requests
Completed 1200 requests
Completed 1400 requests
Completed 1600 requests
Completed 1800 requests
Completed 2000 requests
Finished 2000 requests


Server Software:        Apache/2.4.23
Server Hostname:        localhost
Server Port:            80

Document Path:          /it-vis-animation.gif
Document Length:        8105309 bytes

Concurrency Level:      100
Time taken for tests:   26.294 seconds
Complete requests:      2000
Failed requests:        0
Keep-Alive requests:    2000
Total transferred:      16211484000 bytes
HTML transferred:       16210618000 bytes
Requests per second:    76.06 [#/sec] (mean)
Time per request:       1314.691 [ms] (mean)
Time per request:       13.147 [ms] (mean, across all concurrent requests)
Transfer rate:          602100.83 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    1   4.0      0      22
Processing:   404 1309  80.0   1298    1635
Waiting:        5   22  32.4     10     207
Total:        420 1310  80.6   1298    1635

Percentage of the requests served within a certain time (ms)
  50%   1298
  66%   1315
  75%   1342
  80%   1360
  90%   1397
  95%   1440
  98%   1513
  99%   1530
 100%   1635 (longest request)

In this case I obtain always 0 failed!

3) Using a little png file:
# ab -k -c 100 -n 2000 localhost/apple-touch-icon.png
This is ApacheBench, Version 2.3 <$Revision: 1748469 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 200 requests
Completed 400 requests
Completed 600 requests
Completed 800 requests
Completed 1000 requests
Completed 1200 requests
Completed 1400 requests
Completed 1600 requests
Completed 1800 requests
Completed 2000 requests
Finished 2000 requests


Server Software:        Apache/2.4.23
Server Hostname:        localhost
Server Port:            80

Document Path:          /apple-touch-icon.png
Document Length:        12364 bytes

Concurrency Level:      100
Time taken for tests:   2.576 seconds
Complete requests:      2000
Failed requests:        22
   (Connect: 0, Receive: 0, Length: 22, Exceptions: 0)
Keep-Alive requests:    1978
Total transferred:      25304554 bytes
HTML transferred:       24455992 bytes
Requests per second:    776.52 [#/sec] (mean)
Time per request:       128.779 [ms] (mean)
Time per request:       1.288 [ms] (mean, across all concurrent requests)
Transfer rate:          9594.53 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    1   3.9      0      20
Processing:     0  125 112.0    101     563
Waiting:        0  125 112.1    101     563
Total:          0  126 112.6    102     563

Percentage of the requests served within a certain time (ms)
  50%    102
  66%    159
  75%    196
  80%    220
  90%    282
  95%    354
  98%    401
  99%    457
 100%    563 (longest request)

I get 22 failed requests, but if I run more again I obtain a different number.

What do you thing?

Thank you!



2016-09-30 13:50 GMT+02:00 Daniel <dferradal@xxxxxxxxx>:
hello,

the fact that all those 500 error reponses go with "X-Powered-By: PHP/5.6.26" tell me it is php giving those errors.

If you are still unsure, you could use the ab tool from Apache HTTPD, and request a specific static file (text, gif) hundreds of times to see if you get a 500 error from HTTPD.

2016-09-30 8:59 GMT+02:00 Fabio F.Gervasi <fabio.gervasi70@xxxxxxxxx>:
Hi,

I'm not 100% sure that the real source of the problem is apache (could be php or other), but I'd like start from here as the only logs that provides me with information (with trace4 level) is apache log. No information appears from other logs.

I have:
- Linux 4.7.4-100.fc23.x86_64
- Server version: Apache/2.4.23 (Fedora)
- mysql  Ver 15.1 Distrib 10.0.26-MariaDB, for Linux (x86_64) using readline 5.1
- PHP Version => 5.6.25

The problem: random internal server error 500. Random because, for example, if a page produces the error and do "reload" then it works. The same page may not work as well as work. Sometimes you can use the site for several minutes with no problems, then suddenly there is the error.

When error occurs, this is all that I get from the log file (with loglevel = trace4):

[...]
[Fri Sep 30 07:41:39.151052 2016] [rewrite:trace3] [pid 17957:tid 140241771534080] mod_rewrite.c(477): [client 151.25.206.200:38368] 151.25.206.200 - - [www.centrometeo.com/sid#55c3e3673400][rid#7f8c8000e9d0/subreq] [perdir /home/web/centrometeo.com/] applying pattern '.*' to uri 'index.php', referer: http://www.centrometeo.com/modelli-numerici/modello-wrf-nmm/559-wrf-mslp-6hprec-ita
[Fri Sep 30 07:41:39.151092 2016] [rewrite:trace1] [pid 17957:tid 140241771534080] mod_rewrite.c(477): [client 151.25.206.200:38368] 151.25.206.200 - - [www.centrometeo.com/sid#55c3e3673400][rid#7f8c8000e9d0/subreq] [perdir /home/web/centrometeo.com/] pass through /home/web/centrometeo.com/index.php, referer: http://www.centrometeo.com/modelli-numerici/modello-wrf-nmm/559-wrf-mslp-6hprec-ita
[Fri Sep 30 07:41:39.913003 2016] [core:trace4] [pid 17957:tid 140241771534080] util_script.c(567): [client 151.25.206.200:38368] Headers from script 'php.fcgi':, referer: http://www.centrometeo.com/modelli-numerici/modello-wrf-nmm/559-wrf-mslp-6hprec-ita
[Fri Sep 30 07:41:39.913097 2016] [core:trace4] [pid 17957:tid 140241771534080] util_script.c(568): [client 151.25.206.200:38368]   Status: 500 Internal Server Error, referer: http://www.centrometeo.com/modelli-numerici/modello-wrf-nmm/559-wrf-mslp-6hprec-ita
[Fri Sep 30 07:41:39.913130 2016] [core:trace1] [pid 17957:tid 140241771534080] util_script.c(649): [client 151.25.206.200:38368] Status line from script 'php.fcgi': 500 Internal Server Error, referer: http://www.centrometeo.com/modelli-numerici/modello-wrf-nmm/559-wrf-mslp-6hprec-ita
[Fri Sep 30 07:41:39.913162 2016] [core:trace4] [pid 17957:tid 140241771534080] util_script.c(568): [client 151.25.206.200:38368]   X-Powered-By: PHP/5.6.26, referer: http://www.centrometeo.com/modelli-numerici/modello-wrf-nmm/559-wrf-mslp-6hprec-ita
[Fri Sep 30 07:41:39.913191 2016] [core:trace4] [pid 17957:tid 140241771534080] util_script.c(568): [client 151.25.206.200:38368]   P3P: CP="NOI ADM DEV PSAi COM NAV OUR OTRo STP IND DEM", referer: http://www.centrometeo.com/modelli-numerici/modello-wrf-nmm/559-wrf-mslp-6hprec-ita
[Fri Sep 30 07:41:39.913218 2016] [core:trace4] [pid 17957:tid 140241771534080] util_script.c(568): [client 151.25.206.200:38368]   Content-Encoding: gzip, referer: http://www.centrometeo.com/modelli-numerici/modello-wrf-nmm/559-wrf-mslp-6hprec-ita
[Fri Sep 30 07:41:39.913246 2016] [core:trace4] [pid 17957:tid 140241771534080] util_script.c(568): [client 151.25.206.200:38368]   Vary: Accept-Encoding, referer: http://www.centrometeo.com/modelli-numerici/modello-wrf-nmm/559-wrf-mslp-6hprec-ita
[Fri Sep 30 07:41:39.913275 2016] [core:trace4] [pid 17957:tid 140241771534080] util_script.c(568): [client 151.25.206.200:38368]   Content-type: text/html; charset=UTF-8, referer: http://www.centrometeo.com/modelli-numerici/modello-wrf-nmm/559-wrf-mslp-6hprec-ita
[Fri Sep 30 07:41:39.913395 2016] [http:trace3] [pid 17957:tid 140241771534080] http_filters.c(1006): [client 151.25.206.200:38368] Response sent with status 500, headers:, referer: http://www.centrometeo.com/modelli-numerici/modello-wrf-nmm/559-wrf-mslp-6hprec-ita
[Fri Sep 30 07:41:39.913432 2016] [http:trace4] [pid 17957:tid 140241771534080] http_filters.c(835): [client 151.25.206.200:38368]   X-Powered-By: PHP/5.6.26, referer: http://www.centrometeo.com/modelli-numerici/modello-wrf-nmm/559-wrf-mslp-6hprec-ita
[Fri Sep 30 07:41:39.913458 2016] [http:trace4] [pid 17957:tid 140241771534080] http_filters.c(835): [client 151.25.206.200:38368]   P3P: CP=\\"NOI ADM DEV PSAi COM NAV OUR OTRo STP IND DEM\\", referer: http://www.centrometeo.com/modelli-numerici/modello-wrf-nmm/559-wrf-mslp-6hprec-ita
[Fri Sep 30 07:41:39.913481 2016] [http:trace4] [pid 17957:tid 140241771534080] http_filters.c(835): [client 151.25.206.200:38368]   Content-Encoding: gzip, referer: http://www.centrometeo.com/modelli-numerici/modello-wrf-nmm/559-wrf-mslp-6hprec-ita
[...]

Note the use of fcgi. Apache is indeed configured with event/php-fpm/FastCGI, but the same thing happens with "normal" prefork / php!
Until a few days ago it did not happen, it started after updating the system (kernel, apache, php, mariadb).
I do not really know what to do.

Thank you very much.
Fabio.



--
Daniel Ferradal
IT Specialist

email         dferradal at gmail.com





--
Daniel Ferradal
IT Specialist

email         dferradal at gmail.com


-- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.




--
Daniel Ferradal
IT Specialist

email         dferradal at gmail.com

[Index of Archives]     [Open SSH Users]     [Linux ACPI]     [Linux Kernel]     [Linux Laptop]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Squid]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]

  Powered by Linux