Search Postgresql Archives

Re: Segmentation fault with core dump

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

(2013/05/09 1:39), Joshua Berry wrote:
| I'm using PG 9.1.9 with a client application using various versions of
the
| pgsqlODBC driver on Windows. Cursors are used heavily, as well as some
pretty
| heavy trigger queries on db writes which update several materialized
views.
|
| The server has 48GB RAM installed, PG is configured for 12GB shared
buffers,
| 8MB max_stack_depth, 32MB temp_buffers, and 2MB work_mem. Most of the
other
| settings are defaults.
|
| The server will seg fault from every few days to up to two weeks. Each
time
| one of the postgres server processes seg faults, the server gets
terminated by
| signal 11, restarts in recovery for up to 30 seconds, after which time it
| accepts connections as if nothing ever happened. Unfortunately all the
open
| cursors and connections are lost, so the client apps are left in a bad
state.
|
| Seg faults have also occurred with PG 8.4. ... I migrated the database
to a
| server running PG9.1 with the hopes that the problem would disappear,
but it
| has not. So now I'm starting to debug.
|
| # uname -a
| Linux [hostname] 2.6.32-358.2.1.el6.x86_64 #1 SMP Tue Mar 12 14:18:09
CDT 2013
| x86_64 x86_64 x86_64 GNU/Linux
| # cat /etc/redhat-release
| Scientific Linux release 6.3 (Carbon)
|
| # psql -U jberry
| psql (9.1.9)
| Type "help" for help.
|
| jberry=# select version();
|                                                    version
|
-------------------------------------------------------------------------------
|  PostgreSQL 9.1.9 on x86_64-unknown-linux-gnu, compiled by gcc (GCC)
4.4.7
|  20120313 (Red Hat 4.4.7-3), 64-bit
| (1 row)

I've had another postmaster segfault on my production server. It appears
to be the same failure as the last one nearly a month ago, but I wanted
to post the gdb bt details in case it helps shed light on the issue.
Please let me know if anyone would like to drill into the dumped core
with greater detail. Both the OS and PG versions remain unchanged.

Kind Regards,
-Joshua


On Fri, Apr 12, 2013 at 6:12 AM, Andres Freund <andres@xxxxxxxxxxxxxxx
<mailto:andres@xxxxxxxxxxxxxxx>> wrote:

    On 2013-04-10 19:06:12 -0400, Tom Lane wrote:
     > I wrote:
     > > (Wanders away wondering just how much the regression tests exercise
     > > holdable cursors.)
     >
     > And the answer is they're not testing this code path at all,
    because if
     > you do
     >       DECLARE c CURSOR WITH HOLD FOR ...
     >       FETCH ALL FROM c;
     > then the second query executes with a portal (and resource owner)
     > created to execute the FETCH command, not directly on the held
    portal.
     >
     > After a little bit of thought I'm not sure it's even possible to
     > reproduce this problem with libpq, because it doesn't expose any
    way to
     > issue a bare protocol Execute command against a pre-existing portal.
     > (I had thought psqlOBC went through libpq, but maybe it's playing
    some
     > games here.)
     >
     > Anyway, I'm thinking the appropriate fix might be like this
     >
     > -             CurrentResourceOwner = portal->resowner;
     > +             if (portal->resowner)
     > +                     CurrentResourceOwner = portal->resowner;
     >
     > in several places in pquery.c; that is, keep using
     > TopTransactionResourceOwner if the portal doesn't have its own.
     >
     > A more general but probably much more invasive solution would be
    to fake
     > up an intermediate portal when pulling data from a held portal, to
     > more closely approximate the explicit-FETCH case.

    We could also allocate a new resowner for the duration of that
    transaction. That would get reassigned to the transactions resowner in
    PreCommit_Portals (after a slight change there).
    That actually seems simple enough?

I made some changes to multi thread handling of psqlodbc driver.
It's also better to fix the crash at backend side.

I made 2 patches.
The 1st one temporarily changes CurrentResourceOwner to CurTransactionResourceOwner during catalog cache handling.
The 2nd one allocates a new resource owner for held portals.
Both fix the crash in my test case.

regards,
Hiroshi Inoue
diff --git a/src/backend/access/common/printtup.c b/src/backend/access/common/printtup.c
index e87e675..258a4c2 100644
--- a/src/backend/access/common/printtup.c
+++ b/src/backend/access/common/printtup.c
@@ -238,6 +238,7 @@ printtup_prepare_info(DR_printtup *myState, TupleDesc typeinfo, int numAttrs)
 {
 	int16	   *formats = myState->portal->formats;
 	int			i;
+	bool	resownerIsNull	= false;
 
 	/* get rid of any old data */
 	if (myState->myinfo)
@@ -252,6 +253,15 @@ printtup_prepare_info(DR_printtup *myState, TupleDesc typeinfo, int numAttrs)
 	myState->myinfo = (PrinttupAttrInfo *)
 		palloc0(numAttrs * sizeof(PrinttupAttrInfo));
 
+	if (NULL == CurrentResourceOwner)
+	{
+		/*
+		 *	This occurs when processing execute messages
+		 *	for holdable cursors after commit.
+		 */
+		resownerIsNull = true;
+		CurrentResourceOwner = CurTransactionResourceOwner;
+	}
 	for (i = 0; i < numAttrs; i++)
 	{
 		PrinttupAttrInfo *thisState = myState->myinfo + i;
@@ -273,10 +283,16 @@ printtup_prepare_info(DR_printtup *myState, TupleDesc typeinfo, int numAttrs)
 			fmgr_info(thisState->typsend, &thisState->finfo);
 		}
 		else
+		{
+			if (resownerIsNull)
+				CurrentResourceOwner = NULL;
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 					 errmsg("unsupported format code: %d", format)));
+		}
 	}
+	if (resownerIsNull)
+		CurrentResourceOwner = NULL;
 }
 
 /* ----------------
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index f8989f7..ccc1c45 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -759,6 +759,11 @@ PortalRun(Portal portal, long count, bool isTopLevel,
 	saveResourceOwner = CurrentResourceOwner;
 	savePortalContext = PortalContext;
 	saveMemoryContext = CurrentMemoryContext;
+	/*
+	 *	Allocate a new ResourceOwner for the portal if not set.
+	 *	It can occur for held portals(committed holdable cursors).
+	 */
+	PortalAllocResourceOwner(portal);
 	PG_TRY();
 	{
 		ActivePortal = portal;
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 54bf16e..5013300 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -183,6 +183,37 @@ PortalListGetPrimaryStmt(List *stmts)
 	return NULL;
 }
 
+static void DelinkResownerFromPortal(ResourceOwner owner, void *arg)
+{
+	Portal	portal = (Portal) arg;
+
+	if (!portal)	return;
+	if (portal->resowner == NULL)
+		elog(DEBUG1, "resowner is already delinked from Portal(%p)", portal);
+	else if (owner != portal->resowner)
+		elog(DEBUG1, "the target ResourceOwner is different from the portal's resowner", portal);
+	else
+	{
+		elog(DEBUG1, "delinking resowner(%p) from Portal(%p)", portal->resowner, portal);
+		portal->resowner = NULL;
+	}
+	UnregisterResownerOnDeleteProc(owner, DelinkResownerFromPortal, arg);
+}
+
+/*
+ *	Allocate a new ResourceOwner for the portal if not set.
+ *	Also register an on delete procedure for the ResourceOwner.	
+ */
+void PortalAllocResourceOwner(Portal portal)
+{
+	if (portal->resowner)
+		return;
+	/* create a ResourceOwner for the portal */
+	portal->resowner = ResourceOwnerCreate(CurTransactionResourceOwner, "Portal");
+	/* register an on delete procedure for the ResourceOwner. */
+	RegisterResownerOnDeleteProc(portal->resowner, DelinkResownerFromPortal, portal);
+}
+
 /*
  * CreatePortal
  *		Returns a new portal given a name.
@@ -224,9 +255,7 @@ CreatePortal(const char *name, bool allowDup, bool dupSilent)
 										 ALLOCSET_SMALL_INITSIZE,
 										 ALLOCSET_SMALL_MAXSIZE);
 
-	/* create a resource owner for the portal */
-	portal->resowner = ResourceOwnerCreate(CurTransactionResourceOwner,
-										   "Portal");
+	PortalAllocResourceOwner(portal);
 
 	/* initialize portal fields that don't start off zero */
 	portal->status = PORTAL_NEW;
@@ -506,6 +535,8 @@ PortalDrop(Portal portal, bool isTopCommit)
 	/* drop cached plan reference, if any */
 	PortalReleaseCachedPlan(portal);
 
+	/* unregister the on delete procedure because resowner is set to NULL later in this routine */
+	UnregisterResownerOnDeleteProc(portal->resowner, NULL, portal);
 	/*
 	 * Release any resources still attached to the portal.	There are several
 	 * cases being covered here:
diff --git a/src/backend/utils/resowner/resowner.c b/src/backend/utils/resowner/resowner.c
index e7ec393..63930dd 100644
--- a/src/backend/utils/resowner/resowner.c
+++ b/src/backend/utils/resowner/resowner.c
@@ -98,6 +98,8 @@ typedef struct ResourceOwnerData
 	int			nfiles;			/* number of owned temporary files */
 	File	   *files;			/* dynamically allocated array */
 	int			maxfiles;		/* currently allocated array size */
+	ResourceOwnerDeleteProc	on_delete;
+	void			*on_delete_arg;
 }	ResourceOwnerData;
 
 
@@ -439,6 +441,10 @@ ResourceOwnerDelete(ResourceOwner owner)
 	if (owner->files)
 		pfree(owner->files);
 
+	/* call on delete procedure if exists */
+	if (owner->on_delete != NULL)
+		(*(owner->on_delete)) (owner, owner->on_delete_arg);
+
 	pfree(owner);
 }
 
@@ -537,6 +543,33 @@ UnregisterResourceReleaseCallback(ResourceReleaseCallback callback, void *arg)
 	}
 }
 
+/*
+ * Register or deregister an on delete procedure
+ *
+ */
+void
+RegisterResownerOnDeleteProc(ResourceOwner owner, ResourceOwnerDeleteProc proc, void *arg)
+{
+	if (!owner)
+		return;
+	owner->on_delete = proc;
+	owner->on_delete_arg = arg;
+}
+
+void
+UnregisterResownerOnDeleteProc(ResourceOwner owner, ResourceOwnerDeleteProc proc, void *arg)
+{
+	if (!owner)
+		return;
+	if (arg != owner->on_delete_arg)
+		return;
+	if (NULL != proc &&
+	    proc != owner->on_delete)
+		return;
+	owner->on_delete = NULL;
+	owner->on_delete_arg = NULL;
+}
+
 
 /*
  * Make sure there is room for at least one more entry in a ResourceOwner's
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index 4ae7e46..41492c6 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -205,6 +205,7 @@ extern void AtSubAbort_Portals(SubTransactionId mySubid,
 extern void AtSubCleanup_Portals(SubTransactionId mySubid);
 extern Portal CreatePortal(const char *name, bool allowDup, bool dupSilent);
 extern Portal CreateNewPortal(void);
+extern void PortalAllocResourceOwner(Portal portal);
 extern void PinPortal(Portal portal);
 extern void UnpinPortal(Portal portal);
 extern void MarkPortalDone(Portal portal);
diff --git a/src/include/utils/resowner.h b/src/include/utils/resowner.h
index da76649..9a06d88 100644
--- a/src/include/utils/resowner.h
+++ b/src/include/utils/resowner.h
@@ -58,6 +58,11 @@ typedef void (*ResourceReleaseCallback) (ResourceReleasePhase phase,
 													 bool isTopLevel,
 													 void *arg);
 
+/*
+ *	ResourceOwner delete procedure
+ */
+typedef void (*ResourceOwnerDeleteProc) (ResourceOwner owner, void *arg);
+
 
 /*
  * Functions in resowner.c
@@ -78,5 +83,7 @@ extern void RegisterResourceReleaseCallback(ResourceReleaseCallback callback,
 								void *arg);
 extern void UnregisterResourceReleaseCallback(ResourceReleaseCallback callback,
 								  void *arg);
+extern void RegisterResownerOnDeleteProc(ResourceOwner owner, ResourceOwnerDeleteProc proc, void *arg);
+extern void UnregisterResownerOnDeleteProc(ResourceOwner owner, ResourceOwnerDeleteProc proc, void *arg);
 
 #endif   /* RESOWNER_H */
-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux