Re: Postgres 8.4.20 segfault on RHEL 6.4


 



On Fri, Feb 13, 2015 at 2:38 PM, Dave Johansen <davejohansen@xxxxxxxxx> wrote:
On Thu, Feb 12, 2015 at 4:33 PM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
Dave Johansen <davejohansen@xxxxxxxxx> writes:
> I'm running Postgres 8.4.20 on RHEL 6.4 and it will occasionally crash. The
> postgres.log file just says that a PID was terminated. The output from
> dmesg has a message like this one:
> postmaster[22905]: segfault at 686 ip 0000000000000686 sp 00007fff83d72e88
> error 14 in postgres[400000+463000]

> What can I do to try and figure out what is causing the crash and fix it?

(1) install relevant postgresql-debuginfo package (assuming we're talking
about a Red Hat-originated postgres package)

(2) run postmaster under "ulimit -c unlimited" (easiest way is probably
to add such a command to /etc/rc.d/init.d/postgresql and restart the
service)

(3) wait for crash

(4) gdb the resulting corefile (should be under your $PGDATA directory)

(5) send in a stack trace.

Here's the stacktrace from gdb (if it matters, the package version from RHEL is postgresql-8.4.18-1.el6_4.x86_64):
#0  0x0000000000000686 in ?? ()
#1  0x00007f76ae551801 in ?? ()
#2  0x00000000019f7793 in ?? ()
#3  0x00007fff06ad6be0 in ?? ()
#4  0x00007fff06ad6be0 in ?? ()
#5  0x0000000000545e35 in ExecMakeFunctionResult (fcache=0x19f5680, econtext=0x19f37e8, isNull=0x19f7793 "", isDone=0x19f7b8c) at execQual.c:1870
#6  0x0000000000541096 in ExecTargetList (projInfo=<value optimized out>, isDone=0x7fff06ad704c) at execQual.c:5212
#7  ExecProject (projInfo=<value optimized out>, isDone=0x7fff06ad704c) at execQual.c:5427
#8  0x0000000000553c5b in ExecResult (node=0x1999a68) at nodeResult.c:155
#9  0x00000000005406c8 in ExecProcNode (node=0x1999a68) at execProcnode.c:344
#10 0x000000000053e942 in ExecutePlan (queryDesc=0x1990c60, direction=<value optimized out>, count=0) at execMain.c:1542
#11 0x... in standard_ExecutorRun (queryDesc=0x1990c60, direction=<value optimized out>, count=0) at execMain.c:310
... (I can include the rest, if it's needed)

Any insight?
Thanks,
Dave

So from looking at the stack trace, it looked like the issue was happening in one of our C functions. I did some digging and found that the permissions on the folder that holds those functions had been set wide open, so whenever someone built our software, the build overwrote the .so files. Normally that's a step only the postgres user performs when a new "version" is rolled out, but the incorrect permissions meant that safeguard was being bypassed.

So that brings up a different question that I will start a new thread for.

Thanks for the help,
Dave
