Thanks for the tip. We are using RHEL 6.9 and definitely up to date on glibc (2.12-1.209.el6_9.2). We also have the same versions on a very similar system with no segfault.
My colleague got a better backtrace, which shows another extension is involved:
Core was generated by `postgres: batch_user_account''.
Program terminated with signal 11, Segmentation fault.
#0 0x000000386712868a in __strcmp_sse42 () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install postgresql96-server-9.6.5-1PGDG.rhel6.x86_64
(gdb) bt
#0 0x000000386712868a in __strcmp_sse42 () from /lib64/libc.so.6
#1 0x00007fa3f0c7074c in get_query_string (pstate=<value optimized out>, query=<value optimized out>, jumblequery=<value optimized out>) at pg_hint_plan.c:1882
#2 0x00007fa3f0c70a5d in pg_hint_plan_post_parse_analyze (pstate=0x25324b8, query=0x25325e8) at pg_hint_plan.c:2875
#3 0x00000000005203bc in parse_analyze ()
#4 0x00000000006df933 in pg_analyze_and_rewrite ()
#5 0x00000000007c6f6b in ?? ()
#6 0x00000000007c6ff0 in CachedPlanGetTargetList ()
#7 0x00000000006e173a in PostgresMain ()
#8 0x00000000006812f5 in PostmasterMain ()
#9 0x0000000000609278 in main ()
We aren’t sure whether this means pg_hint_plan is causing the segfault or whether it just happened to be running when the segfault occurred. We aren’t actually using pg_hint_plan hints on this system, so we don’t see how it relates to a segfault that occurs when another process runs ‘grant usage on schema abc to user xyz;’, a statement unrelated to the account that segfaults.
Short of better ideas, we will pull the pg_hint_plan extension and see if that removes the problem.
-Blair
From: Peter Geoghegan <pg@xxxxxxx>
Date: Saturday, March 24, 2018 at 4:18 PM
To: Blair Boadway <bboadway@xxxxxxxxxxxx>
Cc: "pgsql-general@xxxxxxxxxxxxxx" <pgsql-general@xxxxxxxxxxxxxx>
Subject: Re: Troubleshooting a segfault and instance crash
Mar 7 14:46:35 pgprod2 kernel:postgres[29351]: segfault at 0 ip 000000302f32868a sp 00007ffcf1547498 error 4 in libc-2.12.so[302f200000+18a000]
Mar 7 14:46:35 pgprod2 POSTGRES[21262]: [5] user=,db=,app=client= LOG: server process (PID 29351) was terminated by signal 11: Segmentation fault
It crashes the database, though it starts again on its own without any
apparent issues. This has happened 3 times in 2 months and each time the
segfault error and memory address are the same.
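For reference, the kernel line quoted above can be decoded mechanically. What follows is only a minimal Python sketch, not something from the original report. The helper names parse_segfault and decode_error are made up for illustration, and the regex assumes the usual x86-64 Linux layout "segfault at ADDR ip IP sp SP error CODE in OBJ[BASE+SIZE]" with all numeric fields in hex.

```python
import re

# Kernel log line quoted in this thread, joined onto one line.
LOG_LINE = (
    "Mar 7 14:46:35 pgprod2 kernel:postgres[29351]: segfault at 0 ip "
    "000000302f32868a sp 00007ffcf1547498 error 4 in "
    "libc-2.12.so[302f200000+18a000]"
)

# Assumed field layout of an x86-64 Linux segfault message.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<error>[0-9a-f]+) in "
    r"(?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Hypothetical helper: split a segfault line into integer fields."""
    match = SEGFAULT_RE.search(line)
    if match is None:
        raise ValueError("not a kernel segfault line")
    info = {"object": match.group("obj")}
    for key in ("addr", "ip", "sp", "error", "base", "size"):
        info[key] = int(match.group(key), 16)
    # Offset of the faulting instruction inside the mapped object.
    info["offset"] = info["ip"] - info["base"]
    return info


def decode_error(code):
    """Hypothetical helper: low three bits of the x86 page-fault error code."""
    return {
        "page_present": bool(code & 1),  # 0 means the page was not mapped
        "write": bool(code & 2),         # 0 means the access was a read
        "user_mode": bool(code & 4),     # 1 means the fault came from user space
    }


info = parse_segfault(LOG_LINE)
print("faulting address:", hex(info["addr"]))
print("offset in", info["object"] + ":", hex(info["offset"]))
print("error bits:", decode_error(info["error"]))

# Frame #0 address from the gdb backtrace in this thread. Mappings are
# page-aligned, so the same instruction in the same library build must
# share the low 12 bits (a necessary check, not a proof).
GDB_FRAME0 = 0x386712868A
print("same page offset:", (GDB_FRAME0 & 0xFFF) == (info["ip"] & 0xFFF))
```

A fault address of 0 with a user-mode read of an unmapped page is consistent with a NULL pointer being read, which fits a crash inside strcmp, though it does not by itself say which caller passed the pointer.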
We had a recent report of a segfault on a Red Hat-compatible system
that seemed like it might originate from within its glibc [1].
Although all the versions there didn't match what you have, it's worth
considering as a possibility.
Maybe you can't install debuginfo packages because you don't yet have
the necessary debuginfo repos set up. Just a guess. That is sometimes
a required extra step.
--
Peter Geoghegan