Re: Logical decoding CPU-bound w/ large number of tables

Tom Lane <tgl@xxxxxxxxxxxxx> · Wed, 10 May 2017 17:02:11 -0400

Mathieu Fenniak <mathieu.fenniak@xxxxxxxxxxxx> writes:
> Andres, it seems like the problem is independent of having large data
> manipulations mixed with schema changes.  The below test case demonstrates
> it with just schema changes.

> Tom, I've tested your patch, and it seems to have a positive impact for
> sure.  I've documented a test case to reproduce this issue (below), and
> your avoid-search-in-catcache-invalidate.patch reduces the test case time
> from 63 seconds per run to 27 seconds per run.

Thanks for the test case.  Your machine is apparently a bit faster than
mine, or else Assert overhead is really hurting in this case, because
on a debug-enabled build of HEAD I see the replication time as about
115 seconds, with the principal culprits according to "perf" being

+   53.11%    52.96%        253596  postmaster       postgres                                      [.] CatalogCacheIdInvalidate
+   31.15%    31.06%        148779  postmaster       postgres                                      [.] hash_seq_search
+    5.70%     5.69%         27223  postmaster       postgres                                      [.] CallSyscacheCallbacks
+    3.36%     3.35%         16057  postmaster       postgres                                      [.] LocalExecuteInvalidationMessage
+    1.32%     1.32%          6306  postmaster       postgres                                      [.] ReorderBufferCommit

I concur that the previously-posted patch to avoid searching in
CatalogCacheIdInvalidate makes for about a 2X speedup; for me,
that brings the test case down to 55 seconds, and now the top
hits are

+   59.28%    59.05%        150349  postmaster       postgres                                      [.] hash_seq_search
+    8.40%     8.36%         21250  postmaster       postgres                                      [.] CallSyscacheCallbacks
+    6.37%     6.34%         16115  postmaster       postgres                                      [.] LocalExecuteInvalidationMessage
+    5.69%     5.67%         14418  postmaster       postgres                                      [.] CatCacheInvalidate
+    3.14%     3.12%          7942  postmaster       postgres                                      [.] SysCacheInvalidate
+    1.72%     1.71%          4354  postmaster       postgres                                      [.] ReorderBufferCommit
+    1.37%     1.33%          3512  postmaster       postgres                                      [.] hash_search_with_hash_value
+    1.15%     1.15%          2913  postmaster       postgres                                      [.] InvalidateCatalogSnapshot

I looked at the hash_seq_search time a bit more, and realized that
actually there's a pretty easy fix for that, which is to reduce
the initial size of RelfilenodeMapHash from 1024 entries to 64.
You could quibble about where to set that exactly, but 1024 is
just way too many --- in your example, there are never more than
5 entries in the hash, despite the presence of 10000 tables in
the database.  I also checked it while running the regression tests,
and while a few transactions get up to several hundred entries,
they're mostly less than 50.  So rather than build a second index
structure for that hashtable, I propose we just do what's in the first
attached patch.

With that, I was down to about 21s run time, and now perf says

+   23.17%    23.08%         21254  postmaster       postgres                                        [.] CallSyscacheCallbacks
+   16.97%    16.91%         15579  postmaster       postgres                                        [.] LocalExecuteInvalidationMessage
+   16.10%    16.03%         14766  postmaster       postgres                                        [.] CatCacheInvalidate
+   12.45%    12.42%         11474  postmaster       postgres                                        [.] hash_seq_search
+    8.66%     8.64%          7959  postmaster       postgres                                        [.] SysCacheInvalidate
+    4.72%     4.70%          4331  postmaster       postgres                                        [.] ReorderBufferCommit
+    3.16%     3.14%          2894  postmaster       postgres                                        [.] InvalidateCatalogSnapshot
+    2.56%     2.50%          2344  postmaster       postgres                                        [.] hash_search_with_hash_value
+    1.27%     1.27%          1169  postmaster       postgres                                        [.] RelfilenodeMapInvalidateCallbac

Looking at CallSyscacheCallbacks, it's got exactly the same
disease of linearly scanning a list to find relevant entries,
when we could easily pre-index the list.  The second patch
attached gets me to 13s, with profile

+   22.78%    22.70%         14568  postmaster       postgres                                     [.] CatCacheInvalidate
+   17.75%    17.69%         11355  postmaster       postgres                                     [.] LocalExecuteInvalidationMessage
+   17.11%    17.04%         10997  postmaster       postgres                                     [.] hash_seq_search
+    7.25%     7.22%          4634  postmaster       postgres                                     [.] SysCacheInvalidate
+    5.05%     5.03%          3229  postmaster       postgres                                     [.] CallSyscacheCallbacks
+    4.09%     4.08%          2615  postmaster       postgres                                     [.] ReorderBufferCommit
+    3.56%     3.55%          2276  postmaster       postgres                                     [.] InvalidateCatalogSnapshot
+    3.51%     3.40%          2240  postmaster       postgres                                     [.] hash_search_with_hash_value
+    1.76%     1.75%          1122  postmaster       postgres                                     [.] RelfilenodeMapInvalidateCallback

We're at a point of diminishing returns here; I think any further
improvement would require reducing the number of invalidation calls,
as Andres was speculating about upthread.  Still, this shows that
it doesn't take very much work to get a 10X improvement in the
overhead associated with inval activities.  We've never seen this
overhead stick out quite this much before, and maybe logical replication
will always be an atypical workload, but I think this may be worth
committing even if Andres does managed to cut the number of calls.

It would be interesting to see how much these patches help for your real
use-case, as opposed to this toy example.  Assuming that the results are
positive, I would advocate for back-patching these changes as far as 9.4
where logical decoding came in.

BTW, I also noticed that we're getting scarily close to exceeding
MAX_SYSCACHE_CALLBACKS.  There are 30-some calls to
CacheRegisterSyscacheCallback in our code, and while I think not all of
them can be reached in a single process, we demonstrably get as high as 21
registered callbacks in some regression test runs.  That's not leaving a
lot of daylight for add-on modules.  The second patch attached includes
increasing MAX_SYSCACHE_CALLBACKS from 32 to 64.  I think we'd be well
advised to apply and back-patch that, even if we don't use the rest of
the patch.

			regards, tom lane

diff --git a/src/backend/utils/cache/relfilenodemap.c b/src/backend/utils/cache/relfilenodemap.c
index c790309..612f0f3 100644
--- a/src/backend/utils/cache/relfilenodemap.c
+++ b/src/backend/utils/cache/relfilenodemap.c
@@ -123,7 +123,7 @@ InitializeRelfilenodeMap(void)
 	 * error.
 	 */
 	RelfilenodeMapHash =
-		hash_create("RelfilenodeMap cache", 1024, &ctl,
+		hash_create("RelfilenodeMap cache", 64, &ctl,
 					HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
 
 	/* Watch for invalidation events. */
diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c
index 347fdde..6714a30 100644
*** a/src/backend/utils/cache/inval.c
--- b/src/backend/utils/cache/inval.c
*************** static int	maxSharedInvalidMessagesArray
*** 178,195 ****
  /*
   * Dynamically-registered callback functions.  Current implementation
   * assumes there won't be very many of these at once; could improve if needed.
   */
  
! #define MAX_SYSCACHE_CALLBACKS 32
  #define MAX_RELCACHE_CALLBACKS 10
  
  static struct SYSCACHECALLBACK
  {
  	int16		id;				/* cache number */
  	SyscacheCallbackFunction function;
  	Datum		arg;
  }	syscache_callback_list[MAX_SYSCACHE_CALLBACKS];
  
  static int	syscache_callback_count = 0;
  
  static struct RELCACHECALLBACK
--- 178,202 ----
  /*
   * Dynamically-registered callback functions.  Current implementation
   * assumes there won't be very many of these at once; could improve if needed.
+  *
+  * To avoid searching in CallSyscacheCallbacks, all callbacks for a given
+  * syscache are linked into a list pointed to by syscache_callback_links[id].
+  * The link values are syscache_callback_list[] index plus 1, or 0 for none.
   */
  
! #define MAX_SYSCACHE_CALLBACKS 64
  #define MAX_RELCACHE_CALLBACKS 10
  
  static struct SYSCACHECALLBACK
  {
  	int16		id;				/* cache number */
+ 	int16		link;			/* next callback index+1 for same cache */
  	SyscacheCallbackFunction function;
  	Datum		arg;
  }	syscache_callback_list[MAX_SYSCACHE_CALLBACKS];
  
+ static int16 syscache_callback_links[SysCacheSize];
+ 
  static int	syscache_callback_count = 0;
  
  static struct RELCACHECALLBACK
*************** CacheRegisterSyscacheCallback(int cachei
*** 1386,1392 ****
--- 1393,1417 ----
  	if (syscache_callback_count >= MAX_SYSCACHE_CALLBACKS)
  		elog(FATAL, "out of syscache_callback_list slots");
  
+ 	Assert(cacheid >= 0 && cacheid < SysCacheSize);
+ 
+ 	if (syscache_callback_links[cacheid] == 0)
+ 	{
+ 		/* first callback for this cache */
+ 		syscache_callback_links[cacheid] = syscache_callback_count + 1;
+ 	}
+ 	else
+ 	{
+ 		/* add to end of chain, so that older callbacks are called first */
+ 		int			i = syscache_callback_links[cacheid] - 1;
+ 
+ 		while (syscache_callback_list[i].link > 0)
+ 			i = syscache_callback_list[i].link - 1;
+ 		syscache_callback_list[i].link = syscache_callback_count + 1;
+ 	}
+ 
  	syscache_callback_list[syscache_callback_count].id = cacheid;
+ 	syscache_callback_list[syscache_callback_count].link = 0;
  	syscache_callback_list[syscache_callback_count].function = func;
  	syscache_callback_list[syscache_callback_count].arg = arg;
  
*************** CallSyscacheCallbacks(int cacheid, uint3
*** 1426,1436 ****
  {
  	int			i;
  
! 	for (i = 0; i < syscache_callback_count; i++)
  	{
  		struct SYSCACHECALLBACK *ccitem = syscache_callback_list + i;
  
! 		if (ccitem->id == cacheid)
! 			(*ccitem->function) (ccitem->arg, cacheid, hashvalue);
  	}
  }
--- 1451,1464 ----
  {
  	int			i;
  
! 	Assert(cacheid >= 0 && cacheid < SysCacheSize);
! 	i = syscache_callback_links[cacheid] - 1;
! 	while (i >= 0)
  	{
  		struct SYSCACHECALLBACK *ccitem = syscache_callback_list + i;
  
! 		Assert(ccitem->id == cacheid);
! 		(*ccitem->function) (ccitem->arg, cacheid, hashvalue);
! 		i = ccitem->link - 1;
  	}
  }
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index 066ce72..f0a16e3 100644
*** a/src/backend/utils/cache/syscache.c
--- b/src/backend/utils/cache/syscache.c
*************** static const struct cachedesc cacheinfo[
*** 971,978 ****
  	}
  };
  
- #define SysCacheSize	((int) lengthof(cacheinfo))
- 
  static CatCache *SysCache[SysCacheSize];
  
  static bool CacheInitialized = false;
--- 971,976 ----
*************** InitCatalogCache(void)
*** 1003,1008 ****
--- 1001,1009 ----
  	int			i,
  				j;
  
+ 	StaticAssertStmt(SysCacheSize == (int) lengthof(cacheinfo),
+ 					 "SysCacheSize does not match syscache.c's array");
+ 
  	Assert(!CacheInitialized);
  
  	SysCacheRelationOidSize = SysCacheSupportingRelOidSize = 0;
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index 73991dd..e20284d 100644
*** a/src/include/utils/syscache.h
--- b/src/include/utils/syscache.h
*************** enum SysCacheIdentifier
*** 108,113 ****
--- 108,115 ----
  	TYPEOID,
  	USERMAPPINGOID,
  	USERMAPPINGUSERSERVER
+ 
+ #define SysCacheSize (USERMAPPINGUSERSERVER + 1)
  };
  
  extern void InitCatalogCache(void);
-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general