PGOPTIONS="-c synchronous_commit=off" pgbench -T 3600 -P 10 ....I am currently running all my benchmarks with synchronous_commit=off and will get back with my findings.
It seems that PGOPTIONS="-c synchronous_commit=off" has a significant impact. However, I still can not understand why the TPS for the optimised case is LOWER than the default for higher concurrency levels!
+--------+---------------------+------------------------+
| client | Mostly defaults [1] | Optimised settings [2] |
+--------+---------------------+------------------------+
| 1 | 80-86 | 169-180 |
+--------+---------------------+------------------------+
| 6 | 350-376 | 1265-1397 |
+--------+---------------------+------------------------+
| 12 | 603-619 | 1746-2352 |
+--------+---------------------+------------------------+
| 24 | 947-1015 | 1869-2518 |
+--------+---------------------+------------------------+
| 48 | 1435-1512 | 1912-2818 |
+--------+---------------------+------------------------+
| 96 | 1769-1811 | 1546-1753 |
+--------+---------------------+------------------------+
| 192 | 1857-1992 | 1332-1508 |
+--------+---------------------+------------------------+
| 384 | 1667-1793 | 1356-1450 |
+--------+---------------------+------------------------+
[1] "Mostly default" settings are whatever ships with Ubuntu 18.04 + PG 11. A snippet of the relevant setts are given below:
max_connection=400
work_mem=4MB
maintenance_work_mem=64MB
shared_buffers=128MB
temp_buffers=8MB
effective_cache_size=4GB
wal_buffers=-1
wal_sync_method=fsync
max_wal_size=1GB
autovacuum=off # Auto-vacuuming was disabled
[2] An optimised version of settings was obtained from https://pgtune.leopard.in.ua/#/ and along with that the benchmarks were run with PGOPTIONS="-c synchronous_commit=off"
max_connections = 400
shared_buffers = 8GB
effective_cache_size = 24GB
maintenance_work_mem = 2GB
checkpoint_completion_target = 0.7
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 3495kB
min_wal_size = 1GB
max_wal_size = 2GB
max_worker_processes = 12
max_parallel_workers_per_gather = 6
max_parallel_workers = 12
autovacuum=off # Auto-vacuuming was disabled