Re: Postgresql TPS Bottleneck

Hi, thanks for your answer.

We have a Grafana instance monitoring all those metrics, and no one I have asked so far has been able to identify an obvious bottleneck.
However, I have run further experiments to see whether we are missing something.

While running the benchmark with our application, I ran tools on the
DB node to saturate its resources: cpuburn, iperf, and fio.
cpuburn caused a small but expected drop in Postgres TPS. iperf and
fio, however, had no impact at all (except for iperf when pushing more
than our 10 Gbps link capacity - clearly). Even with the disks at 100%
utilization, Postgres stayed at about 65k TPS.
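A fio run of that sort could look like the following (the parameters
and target path here are illustrative assumptions, not the original
command; the sketch prints the command line for review rather than
executing it):

```shell
# Illustrative fio command to saturate the data disk with random 8 kB
# writes (matching Postgres' page size). Path and sizes are assumptions.
FIO_TARGET=/var/lib/postgresql/fio-testfile
cmd="fio --name=saturate --filename=$FIO_TARGET --size=4G \
--rw=randwrite --bs=8k --ioengine=libaio --iodepth=64 --direct=1 \
--time_based --runtime=60"
echo "$cmd"   # run this on the DB node while the benchmark is active
```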

The next thing I did was to start two independent Postgres instances on
the same server and run an independent client application against each
of them. Our application then achieved almost double the TPS of a
single-instance run (from 13k to 23k), while each Postgres instance
served about 45k TPS, which did not increase further (?).
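For anyone wanting to reproduce the two-instance experiment, a second,
independent cluster can be set up along these lines (data directory and
port are illustrative assumptions; the commands are printed rather than
executed):

```shell
# Sketch: initialize and start a second Postgres cluster on its own
# port so two independent instances run on the same machine.
PGDATA2=/var/lib/postgresql/13/cluster2   # assumed path
PORT2=5433                                # assumed port
echo "initdb -D $PGDATA2"
echo "pg_ctl -D $PGDATA2 -o '-p $PORT2' start"
```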

I think it is also interesting that our DB server reaches its TPS peak
at about 80 clients (more clients make the TPS drop again), while most
benchmarks I find online peak at about 400-600 clients.

Does anyone have an idea what the problem might be?
Maybe I am missing a kernel or Postgres configuration parameter?

From: Tomas Vondra <tomas.vondra@xxxxxxxxxxxxxxxx>
Sent: Thursday, March 31, 2022 9:16 PM
To: wakandavision@xxxxxxxxxxx <wakandavision@xxxxxxxxxxx>; pgsql-performance@xxxxxxxxxxxxxxxxxxxx <pgsql-performance@xxxxxxxxxxxxxxxxxxxx>
Subject: Re: Postgresql TPS Bottleneck
 


On 3/31/22 13:50, wakandavision@xxxxxxxxxxx wrote:
> Hi everyone,
>
> I am a bachelor's student writing my thesis about the scaling and
> performance of an application. The application uses PostgreSQL as its
> database, but we currently can't scale any further, as Postgres seems
> to be hitting a limit.
>
> With the application, as well as with pgbench, we get at most 70k TPS
> out of Postgres, yet the server's resources are not fully utilized
> (more below).
>
> I've tried many different configurations, but none of them had any
> major performance impact (except fsync = off and synchronous_commit =
> off).
>
> This is the (custom) configuration I am using:
>
> shared_buffers=65551953kB
> effective_cache_size=147491895kB
> huge_pages=on
> min_wal_size=20GB
> max_wal_size=200GB
> wal_buffers=1GB
> max_wal_senders=0
> archive_mode=off
> wal_level=minimal
> work_mem=2GB
> maintenance_work_mem=4GB
> checkpoint_completion_target=0.9
> checkpoint_timeout = 30min
> random_page_cost=1.1
> bgwriter_flush_after = 2MB
> effective_io_concurrency = 200
> # Disabled just for performance experiments
> fsync = off
> synchronous_commit = off
> full_page_writes = on
> max_worker_processes=64
> max_parallel_workers=64
> max_parallel_workers_per_gather=10
> max_parallel_maintenance_workers=12
>
> The system is as follows:
>
> * 64 Cores (Intel Xeon Gold 6130 (Skylake, 2.10GHz, 2 CPUs/node, 16
> cores/CPU))
> * 192 GiB RAM (12 * 16GiB DIMM DDR4 Synchronous Registered (Buffered)
> 2666 MHz (0.4 ns))
> * 2 * SSD SATA Samsung MZ7KM240HMHQ0D3 (one is used for the WAL and the
> other for the data)
> * 10 Gbps network link
> * OS: Debian 11
> * Postgres 13 from apt
>
> (I've also written a stackoverflow post about it -
> https://stackoverflow.com/questions/71631348/postgresql-bottleneck-neither-cpu-network-nor-i-o )
>
>
> Below is just an example of the pgbench I ran:
>
> pgbench -i -s 50 -U postgres -h <DB_HOST> -d <DB_NAME>
> pgbench -c 64 -j 32 -t 100000 -h <DB_HOST> -U postgres <DB_NAME>
>

I'd bet you need to use "pgbench -N" because the regular transaction
updates the "branch" table, and you only have 50 branches. Which
probably means a lot of conflicts and locking. The other thing you might
try is "-M prepared" which saves time on query planning.
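Spelled out against the pgbench invocation from the original post, that
would be something like the following (host and database names are
placeholders as before, and I'm using -T to fix the duration instead of
-t; the command is printed for review):

```shell
# -N skips the updates on pgbench_branches/pgbench_tellers, avoiding
# row-lock contention on the 50 branch rows; -M prepared reuses
# prepared statements instead of re-planning every query.
DB_HOST='<DB_HOST>'   # placeholder, as in the original command
DB_NAME='<DB_NAME>'   # placeholder
cmd="pgbench -N -M prepared -c 64 -j 32 -T 60 -h $DB_HOST -U postgres $DB_NAME"
echo "$cmd"
```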

FWIW I really doubt "fsync=off" will give you any meaningful results.

Maybe try assessing the hardware capability first, using tools like fio
to measure IOPS with different workloads.
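For the WAL disk in particular, the interesting number is
synchronous-write IOPS, since with fsync enabled every commit waits for
one. A fio job for that could look like this (all parameters and the
target path are illustrative; the command is printed for review):

```shell
# Small sequential 8 kB writes with an fdatasync after each one,
# roughly approximating WAL flush behavior. Path is a placeholder.
cmd="fio --name=walsync --filename=/wal-disk/fio-testfile --size=1G \
--rw=write --bs=8k --ioengine=sync --fdatasync=1 \
--time_based --runtime=60"
echo "$cmd"
```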

Then try pgbench with a single client, and finally increase the number
of clients and see how it behaves and compare it to what you expect.
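That client sweep is easy to script; something like the following
prints one pgbench command per client count, making the TPS-vs-clients
curve straightforward to collect (the counts are arbitrary choices,
and host/database names are placeholders):

```shell
# Sketch: sweep client counts to locate the TPS peak. The commands
# are printed, not executed; run them on the client machine.
for c in 1 8 16 32 64 80 128 256; do
  echo "pgbench -N -M prepared -c $c -j 16 -T 60 -h <DB_HOST> -U postgres <DB_NAME>"
done
```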

In any case, every system has a bottleneck, and you're clearly hitting
one, otherwise the numbers would be higher. Usually it's either
CPU-bound, in which case "perf top" might tell us more, or IO-bound, in
which case try e.g. "iostat -x -k 1".

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
