Re: Fastest way to clone schema ~1000x

On Mon, Feb 26, 2024 at 8:08 AM Emiel Mols <emiel@xxxxxxxx> wrote:
Thanks, as indicated we're using that right now. The 30% spinlock overhead unfortunately persists.

try to increase shared_buffers

128MB can be too low

max_connections = 2048 is unrealistically high
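
For example, something along these lines in postgresql.conf (the values are only illustrative and depend on available RAM and workload, not tested recommendations):

shared_buffers = 4GB        # example value; the 128MB default is far too small for this workload
max_connections = 300       # or lower, ideally combined with a connection pooler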


- Fsync was already disabled, too. Complete postgresql.conf used in testing:
listen_addresses = ''
max_connections = 2048
unix_socket_directories = '..'
shared_buffers = 128MB
log_line_prefix = ''
synchronous_commit = 'off'
wal_level = 'minimal'

- linux perf report comparing schema-per-test vs database-per-test: https://ibb.co/CW5w2MW

- Emiel


On Mon, Feb 26, 2024 at 1:36 PM Pavel Stehule <pavel.stehule@xxxxxxxxx> wrote:
Hi

On Mon, Feb 26, 2024 at 7:28 AM Emiel Mols <emiel@xxxxxxxx> wrote:
Hello,

To improve our unit and end-to-end testing performance, we are looking to optimize initialization of around 500-1000 database *schemas* from a schema.sql file.

Background: in postgres, you cannot change databases on existing connections, and using schemas would allow us to minimize the number of active connections needed from our test runners, as we can reuse connections by updating search_path. In our current database-per-test setup we see around 30% (!!) total CPU overhead in native_queued_spin_lock_slowpath (as profiled with linux perf), presumably because of the high connection count. We run ~200 tests in parallel to saturate a 128-core machine.
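
As a minimal sketch of the idea (schema names are hypothetical), a single long-lived connection would be reused like this:

-- one connection per test runner, switched between pre-created schemas
SET search_path TO test001;
-- ... run test 1 ...
SET search_path TO test002;
-- ... run test 2 on the same connection ...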

In the schema-per-test setup, however, it becomes harder to cheaply create the schema. Before, we could `create database test001 with template testbase` to set up the database for a test, and this was reasonably fast. Re-inserting a schema with ~500 table/index definitions across 500 test schemas is prohibitively expensive (around 1000ms per schema insertion means we're wasting 500 cpu-seconds, and there seems to be quite some lock contention too). Linux perf shows that in this setup we're reducing the native_queued_spin_lock_slowpath overhead to around 10%, but total test time is still slower because of all the schema initialization. Also, it feels quite elegant that functions and types can be reused between tests.
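
For reference, the per-schema initialization sketched below is what gets so expensive (assuming schema.sql contains no hard-coded schema names); it is run via psql once per test schema, ~500-1000 times:

CREATE SCHEMA test001;
SET search_path TO test001;
\i schema.sql   -- ~500 table/index definitions, roughly 1000 ms per schema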

Does anyone have some insights or great ideas :)? Also pretty curious about the fundamental reason why having high/changing connection counts to postgres results in this much (spin)lock contention (or perhaps we're doing something wrong in either our configuration or measurements?).

An alternative we haven't explored yet is to see if we can use pgbouncer or another connection pooler to mitigate the 30% issue (set limits so there are only ever X connections to postgres in total, and perhaps max Y per test/database). This does add another piece of infrastructure/complexity, so it's not really preferred.
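
If we went that way, a minimal pgbouncer.ini sketch might look like this (names and limits are made up):

[databases]
* = host=/var/run/postgresql         ; fallback entry: any database name is forwarded as-is

[pgbouncer]
pool_mode = transaction              ; server connections are shared between clients
max_client_conn = 2048               ; test runners connect here
default_pool_size = 20
max_db_connections = 200             ; hard cap on real connections to postgres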

For testing

a) use templates - CREATE DATABASE test TEMPLATE some;

b) disable fsync (only for testing!!!)
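
Concretely, a minimal sketch of both suggestions (testbase being the template database already mentioned above):

-- (a) clone a prepared template database per test
CREATE DATABASE test001 TEMPLATE testbase;

-- (b) in postgresql.conf, for test environments only
-- fsync = off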

Regards

Pavel


Best,

Emiel
