Re: SubtransControlLock and performance problems

Lars Aksel Opsahl <Lars.Opsahl@xxxxxxxx> · Wed, 19 Feb 2020 10:49:14 +0000

Hi

>From: 
Laurenz Albe <laurenz.albe@cybertec.at>
>Sent: Tuesday, February 18, 2020 6:27 PM
>ATo: 
Pavel Stehule <pavel.stehule@gmail.com>; Tom Lane <tgl@sss.pgh.pa.us>
>Cc:
Lars 
Aksel Opsahl <Lars.Opsahl@nibio.no>;
pgsql-performance@lists.postgresql.org <pgsql-performance@lists.postgresql.org>
>Subject: Re: SubtransControlLock and performance problems
>
>Did you have several concurrent sessions accessing the rows that others created?

Hi

Thanks every body, I have done more testing here..

- I was not able fix this problem by increasing this values

src/include/access/subtrans.h, define NUM_SUBTRANS_BUFFERS 8196

src/include/storage/proc.h , PGPROC_MAX_CACHED_SUBXIDS 128

If tried to increase PGPROC_MAX_CACHED_SUBXIDS more than 128 Postgres core dumped. I tried to increase shared memory and other settings but I was not able to get it statble.

With the values above I did see same performance problems and we ended with a lot of subtransControlLock.

So I started to change the code based on your feedbacks.

- What seems to work very good in combination with a catch exception and retry pattern is to insert the data in to separate table for each job. (I the current testcase we reduced the number of subtransControlLock from many hundreds to almost none.)

Then I later can pick up these results from different the tables with another job that inserts data in to common data structure and in this job I don’t have any catch retry pattern. Then I was able to handle 534 of 592 jobs/cells with out any subtransControlLock
 at all.

But 58 jobs did not finish so for these I had to use a catch retry pattern and then then I got the subtransControlLock problems, but thats for a limited sets of the data.

Between each job I also close open the connections I dblink.

In this test I used dataset with data set 619230 surface with total of 25909671 and it did finish in 24:42.363, with NUM_SUBTRANS_BUFFERS 8196 and PGPROC_MAX_CACHED_SUBXIDS 128. When I changed this back to the original values the same test took 23:54.973.

For me it’s seems like in Postgres it’s better to have functions that returns an error state together with the result and not throws an exceptions, because exceptions leads performance degeneration when working with big datasets.

Thanks

Lars