SIGSEGV happens more than once a day

Richard Yen <richyen@xxxxxxxxxxxxxx> · Thu, 11 May 2006 12:07:04 -0700

Hi all,

I'm experiencing signal 11 (segmentation fault) failures on the
master node of a 3-node Slony-I cluster.  In the past week, we've
averaged a little more than one segfault per day (11 times in the
past 10 days, including today).  Any ideas what's going on?
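
For anyone wanting to tally crashes the same way, here is a rough sketch of counting them per day from the server log. It assumes timestamps are enabled so that each log line starts with a YYYY-MM-DD date, and that each crash appears as a "terminated by signal 11" line. The sample line shape, the function names and the command-line log path are illustrative assumptions, not taken from the setup described here.

```python
import re
import sys
from collections import Counter

# Assumed log line shape (hypothetical sample; needs timestamps in the log):
# 2006-05-11 12:01:33 LOG:  server process (PID 4242) was terminated by signal 11
CRASH_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}).*terminated by signal 11\b")


def segfaults_per_day(lines):
    """Return a Counter mapping 'YYYY-MM-DD' to the number of signal-11 lines."""
    counts = Counter()
    for line in lines:
        match = CRASH_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def average_per_day(counts, days):
    """Average number of crashes per day over a window of `days` days."""
    return sum(counts.values()) / days


if __name__ == "__main__":
    # Usage (the log path is whatever your installation uses):
    #   python count_segfaults.py /path/to/postgresql.log
    with open(sys.argv[1]) as log_file:
        per_day = segfaults_per_day(log_file)
    for day in sorted(per_day):
        print(day, per_day[day])
```

With 11 matching lines over a 10-day window, average_per_day gives 1.1, which matches the "a little more than one per day" figure above.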

Would anyone know how to track down this issue?

I don't know whether attaching log output would help, but it looks very
similar to what is reported in the following threads (the responses
there didn't help us, though):
http://archives.postgresql.org/pgsql-general/2004-06/msg01204.php
http://www.thescripts.com/forum/thread422225.html

Here's the machine where postgres is faulting:
db1 (Dell 6650):
master Slony-I node
PostgreSQL version: 7.4.6
OS: Debian Linux 3.1
CPU: Xeon 4 X 2.5GHz
RAM: 8 GB
DISK:
     / 4 x 18 GB drive: raid 10
     /db/data/base 12 x 36 GB: raid 10
     /db/data/pg_xlog 2 x 73 GB: raid 1

The other two machines don't die, but they're set up pretty much the  
same way.  The only difference is that db2 is running 8.1.3.

So what seems odd to me is that db1 and db3 are pretty much identical
(db3 has a 1.40GHz Xeon instead of a 2.5GHz one, and some RAM
differences), yet postgres dies all the time on db1, but has yet to
die on db2 or db3, so I'm guessing maybe it's triggered by an
UPDATE/INSERT/etc.?

Everything was running fine until last Tuesday, when this started
happening.  We haven't created any new stored procedures or made any
changes of any sort.

We've rebooted the db1 machine, but to no avail.  Any other suggestions?

Let me know if you need other info...

Any help would be greatly appreciated!
--Richard