We've experimented with increasing shared_buffers, to no effect.
The 2048 we actually need for our test setup. The way this works is that we have a single preforked backend for all tests, where each backend worker maintains persistent connections *per test* (in database-per-test), so with, say, 50 backend processes, they might each maintain ~10-50 connections to Postgres. That's the reason for looking into schema-per-test: it would require only one persistent connection per worker, with possibly only a search_path change needed to handle a request (and hence cheap initialization of schema.sql becomes the new impediment).
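To sketch the schema-per-test idea described above (the schema name `test001` is a placeholder, and the flow assumes one persistent connection per worker):

```sql
-- Illustrative schema-per-test lifecycle on a single persistent connection.
-- Create an isolated namespace for one test:
CREATE SCHEMA test001;

-- Point the session at that schema, so that loading schema.sql and the
-- test's own queries resolve unqualified names into test001:
SET search_path TO test001;

-- \i schema.sql   -- load the ~500 table/index definitions into test001

-- After the test, tear everything down in one statement:
DROP SCHEMA test001 CASCADE;
```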
But based on your answer, we will do some more research into pooling these requests through pgbouncer with pool_mode=transaction. That should multiplex all these backend->bouncer connections over far fewer connections to Postgres itself.
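For reference, a minimal pgbouncer setup along these lines might look like the following (paths, names, and pool sizes are placeholders, not our actual configuration):

```ini
; pgbouncer.ini - illustrative transaction-pooling sketch
[databases]
; all test workers connect to "testdb" on the bouncer; pgbouncer fans
; these in to a small shared pool of real server connections
testdb = host=/var/run/postgresql dbname=testdb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = trust
; release the server connection back to the pool at COMMIT/ROLLBACK, so
; many client connections share few server connections
pool_mode = transaction
max_client_conn = 2048
default_pool_size = 50
```

One caveat to verify: in transaction pooling, session state such as a plain `SET search_path` is not preserved across transactions, so a schema-per-test setup behind the bouncer would likely need `SET LOCAL search_path` inside each transaction instead.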
Thanks!
On Mon, Feb 26, 2024 at 2:14 PM Pavel Stehule <pavel.stehule@xxxxxxxxx> wrote:
On Mon, Feb 26, 2024 at 8:08 AM, Emiel Mols <emiel@xxxxxxxx> wrote:

Thanks, as indicated we're using that right now. The 30% spinlock overhead unfortunately persists.

try to increase shared_buffers

128MB can be too low

max_connections = 2048 - that is unrealistically high

- Fsync was already disabled, too. Complete postgresql.conf used in testing:

listen_addresses = ''
max_connections = 2048
unix_socket_directories = '..'
shared_buffers = 128MB
log_line_prefix = ''
synchronous_commit = 'off'
wal_level = 'minimal'

- linux perf report comparing schema-per-test vs database-per-test: https://ibb.co/CW5w2MW

- Emiel

On Mon, Feb 26, 2024 at 1:36 PM Pavel Stehule <pavel.stehule@xxxxxxxxx> wrote:

Hi

On Mon, Feb 26, 2024 at 7:28 AM, Emiel Mols <emiel@xxxxxxxx> wrote:

Hello,

To improve our unit and end-to-end testing performance, we are looking to optimize initialization of around 500-1000 database *schemas* from a schema.sql file.

Background: in Postgres you cannot change databases on an existing connection, and using schemas would allow us to minimize the number of active connections needed from our test runners, as we can reuse connections by updating search_path. In our current database-per-test setup we see around ~30% (!!) total CPU overhead in native_queued_spin_lock_slowpath (as profiled with linux perf), presumably because of the high connection count. We run ~200 tests in parallel to saturate a 128-core machine.

In the schema-per-test setup, however, it becomes harder to cheaply create the schema. Before, we could `create database test001 with template testbase` to set up the database for a test, and this was reasonably fast. Re-inserting a schema with ~500 table/index definitions across 500 test schemas is prohibitively expensive (around 1000 ms per schema insertion means we're wasting 500 CPU-seconds, and there seems to be quite some lock contention too). Linux perf shows that this setup reduces the native_queued_spin_lock_slowpath overhead to around 10%, but total test time is still slower due to all the schema initialization. Also, it feels quite elegant that functions and types can be reused between tests.

Does anyone have some insights or great ideas :)? Also pretty curious about the fundamental reason why high/changing connection counts to Postgres result in this much (spin)lock contention (or perhaps we're doing something wrong in either our configuration or measurements?).

An alternative we haven't explored yet is to see if we can use pgbouncer or another connection pooler to mitigate the 30% issue (set limits so there are only ever X connections to Postgres in total, and perhaps at most Y per test/database). This does add another piece of infrastructure/complexity, so it's not really preferred.

For testing

a) use templates - CREATE DATABASE test TEMPLATE some;

b) disable fsync (only for testing!!!)

Regards

Pavel

Best,
Emiel