Re: Logical decoding CPU-bound w/ large number of tables

Andres Freund <andres@xxxxxxxxxxx> · Fri, 5 May 2017 18:20:55 -0700

Hi,

On 2017-05-05 14:24:07 -0600, Mathieu Fenniak wrote:
> The stalls occur unpredictably on my production system, but generally seem
> to be correlated with schema operations.  My source database has about
> 100,000 tables; it's a one-schema-per-tenant multi-tenant SaaS system.

I'm unfortunately not entirely surprised you're seeing some issues in
that case.  We're invalidating internal caches a bit too aggressively,
and that invalidation is triggered by schema changes.

> I've performed a CPU sampling with the OSX `sample` tool based upon
> reproduction approach #1:
> https://gist.github.com/mfenniak/366d7ed19b2d804f41180572dc1600d8  It
> appears that most of the time is spent in the
> RelfilenodeMapInvalidateCallback and CatalogCacheIdInvalidate cache
> invalidation callbacks, both of which appear to be invalidating caches
> based upon the cache value.

I think optimizing those has some value (and I see Tom is looking at
that aspect), but the bigger win would probably be to do fewer lookups.
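To make the cost concrete: a callback that invalidates "based upon the cache value" has to walk the whole hash table, because the thing being invalidated (say, a relation OID) isn't the hash key. The following is a simplified Python model, loosely shaped like a relfilenode-to-relation-OID map. It is not PostgreSQL's actual C code, and the class names and numbering are made up for illustration. It counts how many entries get examined when every invalidation does a full scan, compared with a hypothetical reverse index from value to keys.

```python
from collections import defaultdict


class ScanInvalidatedCache:
    """Cache keyed by relfilenode, invalidated by relid (the value).

    Because relid is not the key, every invalidation scans all entries.
    """

    def __init__(self):
        self.entries = {}          # relfilenode -> relid
        self.entries_examined = 0  # work counter for comparison

    def put(self, relfilenode, relid):
        self.entries[relfilenode] = relid

    def invalidate(self, relid):
        # Walk the whole table, collecting entries whose value matches.
        doomed = []
        for key, value in self.entries.items():
            self.entries_examined += 1
            if value == relid:
                doomed.append(key)
        for key in doomed:
            del self.entries[key]


class IndexedCache:
    """Same cache plus a hypothetical reverse index relid -> relfilenodes."""

    def __init__(self):
        self.entries = {}
        self.by_relid = defaultdict(set)
        self.entries_examined = 0

    def put(self, relfilenode, relid):
        old = self.entries.get(relfilenode)
        if old is not None:
            self.by_relid[old].discard(relfilenode)
        self.entries[relfilenode] = relid
        self.by_relid[relid].add(relfilenode)

    def invalidate(self, relid):
        # Only touch the entries that actually map to this relid.
        for key in self.by_relid.pop(relid, set()):
            self.entries_examined += 1
            del self.entries[key]


def simulate(cache_cls, n_tables, n_invalidations):
    """Fill a cache with n_tables entries, then invalidate the first few relids."""
    cache = cache_cls()
    for i in range(n_tables):
        cache.put(relfilenode=100_000 + i, relid=i)
    for relid in range(n_invalidations):
        cache.invalidate(relid)
    return cache.entries_examined
```

With 100,000 tables and 100 invalidations, `simulate(ScanInvalidatedCache, 100_000, 100)` examines 9,995,050 entries versus 100 for `simulate(IndexedCache, 100_000, 100)`. An index like this only makes each invalidation cheaper; it does nothing to reduce how much invalidation and lookup work gets triggered in the first place.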

> Has anyone else run into this kind of performance problem?  Any thoughts on
> how it might be resolved?  I don't mind putting in the work if someone
> could describe what is happening here, and have a discussion with me about
> what kind of changes might be necessary to improve the performance.

If you could provide an easily runnable sql script that reproduces the
issue, I'll have a look.  I think I have a rough idea what to do.
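One possible shape for such a script, sketched as Python that emits SQL. It assumes a one-schema-per-tenant layout with one table per schema, `wal_level = logical`, and the `test_decoding` output plugin being available. The schema, table, and slot names (`tenant_N.t`, `repro_slot`) are hypothetical, and it is untested whether this actually reproduces the stall.

```python
def build_repro_sql(n_tenants: int, ddl_rounds: int, slot: str = "repro_slot") -> str:
    """Emit a SQL script approximating a one-schema-per-tenant workload."""
    lines = [
        # Needs wal_level = logical and the test_decoding plugin.
        f"SELECT pg_create_logical_replication_slot('{slot}', 'test_decoding');",
    ]
    # One schema and one table per tenant.
    for i in range(n_tenants):
        lines.append(f"CREATE SCHEMA tenant_{i};")
        lines.append(f"CREATE TABLE tenant_{i}.t (id int PRIMARY KEY, v text);")
    # Write to every table once, so decoding presumably has to resolve each relation.
    for i in range(n_tenants):
        lines.append(f"INSERT INTO tenant_{i}.t (id, v) VALUES (0, 'seed');")
    # Interleave schema changes (which cause cache invalidations) with writes.
    for r in range(ddl_rounds):
        tenant = r % n_tenants
        lines.append(f"ALTER TABLE tenant_{tenant}.t ADD COLUMN c{r} int;")
        lines.append(f"INSERT INTO tenant_{tenant}.t (id, v) VALUES ({r + 1}, 'x');")
    # Decode everything accumulated in the slot; this is the statement to time.
    lines.append(f"SELECT count(*) FROM pg_logical_slot_get_changes('{slot}', NULL, NULL);")
    lines.append(f"SELECT pg_drop_replication_slot('{slot}');")
    return "\n".join(lines) + "\n"
```

Writing `build_repro_sql(100_000, 1_000)` to a file and running it with `psql -f` (autocommit, so each statement is its own transaction) should approximate the 100,000-table case; the `count(*)` over `pg_logical_slot_get_changes` is where decoding CPU time would show up.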

Greetings,

Andres Freund

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)