Re: Postgresql TPS Bottleneck

Hi, thanks for your answer.

We have a Grafana instance monitoring all those metrics, and no one I have asked so far has been able to identify an obvious bottleneck.
However, I have run further experiments to see whether we are missing something.

While running the benchmark with our application, I ran tools on the
DB node to saturate its resources: cpuburn, iperf, and fio.
cpuburn caused a small but expected drop in Postgres TPS. iperf and
fio, however, had no impact at all (except for iperf when pushing more
than our 10 Gbps link capacity - clearly). Even with the disks at 100%
utilization, Postgres stayed at about 65k TPS.
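A fio run of that sort could look like the following (the parameters
and target path here are illustrative assumptions, not the original
command; the sketch prints the command line for review rather than
executing it):

```shell
# Illustrative fio command to saturate the data disk with random 8 kB
# writes (matching Postgres' page size). Path and sizes are assumptions.
FIO_TARGET=/var/lib/postgresql/fio-testfile
cmd="fio --name=saturate --filename=$FIO_TARGET --size=4G \
--rw=randwrite --bs=8k --ioengine=libaio --iodepth=64 --direct=1 \
--time_based --runtime=60"
echo "$cmd"   # run this on the DB node while the benchmark is active
```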

The next thing I did was to start two independent Postgres instances on
the same server and run an independent client application against each
of them. Our application then achieved almost double the TPS of a
single-instance run (from 13k to 23k), while each Postgres instance
served about 45k TPS, which did not increase further (?).
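For anyone wanting to reproduce the two-instance experiment, a second,
independent cluster can be set up along these lines (data directory and
port are illustrative assumptions; the commands are printed rather than
executed):

```shell
# Sketch: initialize and start a second Postgres cluster on its own
# port so two independent instances run on the same machine.
PGDATA2=/var/lib/postgresql/13/cluster2   # assumed path
PORT2=5433                                # assumed port
echo "initdb -D $PGDATA2"
echo "pg_ctl -D $PGDATA2 -o '-p $PORT2' start"
```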

I think it is also interesting that our DB server reaches its TPS peak
at about 80 clients (more clients make the TPS drop again), while most
benchmarks I find online peak at about 400-600 clients.

Does anyone have an idea what the problem might be?
Maybe I am missing a kernel or Postgres configuration parameter?

From: Tomas Vondra <tomas.vondra@xxxxxxxxxxxxxxxx>
Sent: Thursday, March 31, 2022 9:16 PM
To: wakandavision@xxxxxxxxxxx <wakandavision@xxxxxxxxxxx>; pgsql-performance@xxxxxxxxxxxxxxxxxxxx <pgsql-performance@xxxxxxxxxxxxxxxxxxxx>
Subject: Re: Postgresql TPS Bottleneck
 


On 3/31/22 13:50, wakandavision@xxxxxxxxxxx wrote:
> Hi everyone,
>
> I am a bachelor's student writing my thesis about the scaling and
> performance of an application. The application uses PostgreSQL as its
> database, but we currently can't scale any further, as Postgres seems
> to be hitting a limit.
>
> With the application, as well as with pgbench, we get at most 70k TPS
> out of Postgres, yet the server's resources are not fully utilized
> (more below).
>
> I've tried many different configurations, but none of them had any
> major performance impact (except fsync = off and synchronous_commit =
> off).
>
> This is the (custom) configuration I am using:
>
> shared_buffers=65551953kB
> effective_cache_size=147491895kB
> huge_pages=on
> min_wal_size=20GB
> max_wal_size=200GB
> wal_buffers=1GB
> max_wal_senders=0
> archive_mode=off
> wal_level=minimal
> work_mem=2GB
> maintenance_work_mem=4GB
> checkpoint_completion_target=0.9
> checkpoint_timeout = 30min
> random_page_cost=1.1
> bgwriter_flush_after = 2MB
> effective_io_concurrency = 200
> # Disabled just for performance experiments
> fsync = off
> synchronous_commit = off
> full_page_writes = on
> max_worker_processes=64
> max_parallel_workers=64
> max_parallel_workers_per_gather=10
> max_parallel_maintenance_workers=12
>
> The system is as follows:
>
> * 64 Cores (Intel Xeon Gold 6130 (Skylake, 2.10GHz, 2 CPUs/node, 16
> cores/CPU))
> * 192 GiB RAM (12 * 16GiB DIMM DDR4 Synchronous Registered (Buffered)
> 2666 MHz (0.4 ns))
> * 2 * SSD SATA Samsung MZ7KM240HMHQ0D3 (one is used for the WAL and the
> other for the data)
> * 10 Gbps network link
> * OS: Debian 11
> * Postgres 13 from apt
>
> (I've also written a stackoverflow post about it -
> https://stackoverflow.com/questions/71631348/postgresql-bottleneck-neither-cpu-network-nor-i-o )
>
>
> Below is just an example of the pgbench I ran:
>
> pgbench -i -s 50 -U postgres -h <DB_HOST> -d <DB_NAME>
> pgbench -c 64 -j 32 -t 100000 -h <DB_HOST> -U postgres <DB_NAME>
>

I'd bet you need to use "pgbench -N" because the regular transaction
updates the "branch" table, and you only have 50 branches. Which
probably means a lot of conflicts and locking. The other thing you might
try is "-M prepared" which saves time on query planning.
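Spelled out against the pgbench invocation from the original post, that
would be something like the following (host and database names are
placeholders as before, and I'm using -T to fix the duration instead of
-t; the command is printed for review):

```shell
# -N skips the updates on pgbench_branches/pgbench_tellers, avoiding
# row-lock contention on the 50 branch rows; -M prepared reuses
# prepared statements instead of re-planning every query.
DB_HOST='<DB_HOST>'   # placeholder, as in the original command
DB_NAME='<DB_NAME>'   # placeholder
cmd="pgbench -N -M prepared -c 64 -j 32 -T 60 -h $DB_HOST -U postgres $DB_NAME"
echo "$cmd"
```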

FWIW I really doubt "fsync=off" will give you any meaningful results.

Maybe try assessing the hardware capability first, using tools like fio
to measure IOPS with different workloads.
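For the WAL disk in particular, the interesting number is
synchronous-write IOPS, since with fsync enabled every commit waits for
one. A fio job for that could look like this (all parameters and the
target path are illustrative; the command is printed for review):

```shell
# Small sequential 8 kB writes with an fdatasync after each one,
# roughly approximating WAL flush behavior. Path is a placeholder.
cmd="fio --name=walsync --filename=/wal-disk/fio-testfile --size=1G \
--rw=write --bs=8k --ioengine=sync --fdatasync=1 \
--time_based --runtime=60"
echo "$cmd"
```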

Then try pgbench with a single client, and finally increase the number
of clients and see how it behaves and compare it to what you expect.
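That client sweep is easy to script; something like the following
prints one pgbench command per client count, making the TPS-vs-clients
curve straightforward to collect (the counts are arbitrary choices,
and host/database names are placeholders):

```shell
# Sketch: sweep client counts to locate the TPS peak. The commands
# are printed, not executed; run them on the client machine.
for c in 1 8 16 32 64 80 128 256; do
  echo "pgbench -N -M prepared -c $c -j 16 -T 60 -h <DB_HOST> -U postgres <DB_NAME>"
done
```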

In any case, every system has a bottleneck, and you're clearly hitting
one, otherwise the numbers would be higher. Usually it's either
CPU-bound, in which case "perf top" might tell us more, or IO-bound, in
which case try e.g. "iostat -x -k 1".

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
