Re: huge SubtransSLRU and SubtransBuffer wait_event

Lars Aksel Opsahl <Lars.Opsahl@xxxxxxxx> · Fri, 2 Feb 2024 09:26:49 +0000

>From: James Pang (chaolpan) <chaolpan@xxxxxxxxx>Sent: Friday, February 2, 2024 7:47 AMTo:
Laurenz Albe <laurenz.albe@xxxxxxxxxxx>; pgsql-performance@xxxxxxxxxxxxxxxxxxxx <pgsql-performance@xxxxxxxxxxxxxxxxxxxx>Subject: RE: huge SubtransSLRU and SubtransBuffer wait_event
>
>  Our case is   1) we use PL/PGSQL procedure1-->procedure2 (update table
xxxx;commit);   2) application JDBC client call procedure1 (it's a long running job, sometimes it could last > 1hours).   During this time window,  other
Postgresql JDBC clients (100-200) coming in in same time , then quickly see MultiXactoffset and SubtransSLRU increased very quickly.

Hi

We had the same problem here https://gitlab.com/nibioopensource/resolve-overlap-and-gap . Here we can have more than 50 threads pushing
 millions of rows into common tables and one single final Postgis Topology structure as a final step. We also need to run try catch. The code is wrapped into functions and procedures and called from
psql .

Just to test we tried compile with a higher number of subtrans locks and that just made this problem appear just a little bit later.

For us the solution was to save temporary results in array like this

https://gitlab.com/nibioopensource/resolve-overlap-and-gap/-/commit/679bea2b4b1ba4c9e84923b65c62c32c3aed6c21#a22cbe80eb0e36ea21e4f8036e0a4109b2ff2379_611_617

. The clue is to do as much work as possible without involving any common data structures for instance like using arrays to hold temp results
 and not use a shared final table before it's really needed.

Then later at a final step we insert all prepared data into a final common data structure and where we also try to avoid try catch when
 possible. Then system can then run with verry high CPU load for 99% of the work and just at then verry end we start to involve the common database structure.

Another thing to avoid locks is let each thread work on it's down data as much possible, this means breaking up the input and sort what's
 unique data for this tread and postpone the common data to a later stage. When for instance working with Postgis Topology we actually split data to be sure that not two threads works on the same area and then at later state another thread push shared data/area
 in to the final data structure.

This steps seems to have solved this problem for us which started out here https://postgrespro.com/list/thread-id/2478202

Lars