On 9/17/24 12:34, veem v wrote:
On Tue, 17 Sept 2024 at 21:24, Adrian Klaver <adrian.klaver@xxxxxxxxxxx> wrote:
Which means on the Flink end you need to:
1) Use Flink async I/O.
2) Find a client that supports async, or fake it by using multiple
synchronous clients (sketched below).
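
As a rough sketch of what 2) could look like, with a pool of synchronous
JDBC workers standing in for a real async client (the class, table, and
connection details are all hypothetical):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.Collections;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

public class EventInsertFunction extends RichAsyncFunction<String, Boolean> {

    private transient ExecutorService pool;

    @Override
    public void open(Configuration parameters) {
        // Several synchronous workers stand in for a truly async client.
        pool = Executors.newFixedThreadPool(8);
    }

    @Override
    public void asyncInvoke(String value, ResultFuture<Boolean> resultFuture) {
        CompletableFuture
            .supplyAsync(() -> {
                // A real job would reuse pooled connections; a fresh
                // connection per record just keeps the sketch short.
                try (Connection conn = DriverManager.getConnection(
                         "jdbc:postgresql://localhost/mydb", "user", "pass");
                     PreparedStatement ps = conn.prepareStatement(
                         "INSERT INTO events(payload) VALUES (?)")) {
                    ps.setString(1, value);
                    ps.executeUpdate();
                    return true;
                } catch (Exception e) {
                    return false;
                }
            }, pool)
            // Hand the result back to Flink without blocking the operator thread.
            .thenAccept(ok -> resultFuture.complete(Collections.singleton(ok)));
    }

    @Override
    public void close() {
        pool.shutdown();
    }
}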
On Postgres end there is this:
https://www.postgresql.org/docs/current/wal-async-commit.html
That will return a success signal to the client more quickly if
synchronous_commit is set to off. Though since the point of Flink async
I/O is to not wait for the response before moving on, I am not sure how
much synchronous_commit = off would help.
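
If you do want to experiment with it, the setting can be flipped per
session, so nothing cluster-wide has to change. A minimal sketch over
JDBC, again with hypothetical connection details and table:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class AsyncCommitDemo {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "pass");
             Statement st = conn.createStatement()) {
            // Session-level only: commits on this connection return before
            // the WAL is flushed to disk. Other sessions are unaffected.
            st.execute("SET synchronous_commit = off");
            st.executeUpdate("INSERT INTO events(payload) VALUES ('x')");
        }
    }
}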
Got it. So their suggestion was to set async I/O at the Flink level but
not at the DB level, so that the application will not wait for the
commit response from the database. But in that case, won't it overload
the DB with more and more requests? The database will keep doing the
commits (with synchronous_commit = on) and waiting for the response
from its storage confirming the WAL has been flushed to disk, while the
application, not waiting for those responses, keeps flooding the
database with more and more incoming insert requests.
My point is that this is a multi-layer cake with layers:
1) Flink async I/O.
2) Database client async/sync.
3) Postgres sync status.
That is a lot of moving parts, and determining whether it is suitable is
going to require rigorous testing over a representative data load.
See more below.
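
On the flooding concern above: the Flink async operator takes a capacity
argument that caps the number of in-flight requests, so a database that
falls behind backpressures the job rather than being swamped. A hedged
wiring sketch, reusing the hypothetical EventInsertFunction from earlier:

import java.util.concurrent.TimeUnit;

import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class AsyncJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> events = env.fromElements("a", "b", "c");

        // The last argument caps in-flight requests at 100; when the
        // database falls behind, Flink backpressures instead of flooding it.
        DataStream<Boolean> results = AsyncDataStream.unorderedWait(
            events,
            new EventInsertFunction(),   // the async function sketched above
            1000, TimeUnit.MILLISECONDS, // per-request timeout
            100);                        // max concurrent requests

        results.print();
        env.execute("async-insert-sketch");
    }
}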
Additionally, as I mentioned before, we see in "pg_stat_database" that
the "xact_commit" column almost matches the sum of the "tup_inserted",
"tup_updated", and "tup_deleted" columns. We also verified in
pg_stat_statements that the "calls" column is the same as the "rows"
column for the INSERT queries, so we are inserting exactly the same
number of rows as the number of DB calls. Doesn't that suggest that we
are doing row-by-row operations/DMLs?
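
(The two checks described above, sketched over JDBC; connection details
are hypothetical and the second query requires the pg_stat_statements
extension to be installed:)

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class StatsCheck {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "pass");
             Statement st = conn.createStatement()) {
            // Check 1: commits vs total DML rows for the current database.
            try (ResultSet rs = st.executeQuery(
                    "SELECT xact_commit, " +
                    "       tup_inserted + tup_updated + tup_deleted AS tup_dml " +
                    "FROM pg_stat_database " +
                    "WHERE datname = current_database()")) {
                while (rs.next()) {
                    System.out.printf("commits=%d dml_rows=%d%n",
                        rs.getLong("xact_commit"), rs.getLong("tup_dml"));
                }
            }
            // Check 2: calls vs rows per INSERT statement.
            try (ResultSet rs = st.executeQuery(
                    "SELECT calls, rows, query FROM pg_stat_statements " +
                    "WHERE query ILIKE 'insert%'")) {
                while (rs.next()) {
                    System.out.printf("calls=%d rows=%d %s%n",
                        rs.getLong("calls"), rs.getLong("rows"),
                        rs.getString("query"));
                }
            }
        }
    }
}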
Also, after seeing the above and asking the application team to do batch
commits, we are still seeing similar figures from pg_stat_database and
pg_stat_statements. So does that mean we are looking at the wrong stats,
or that the application code change has not been done correctly? And we
see that even when no inserts are running from the application side,
"xact_commit" keeps increasing along with "tup_fetched". Why so?
Finally, we see here in Postgres that even if we just write a DML
statement, it gets committed by default, unless we explicitly put it in
a "begin ... end" block. Can that be the difference between how a
"commit" gets handled in Postgres vs other databases?
It does if autocommit is set in the client; that is common to other
databases also:
https://dev.mysql.com/doc/refman/8.4/en/commit.html
https://docs.oracle.com/en/database/oracle/developer-tools-for-vscode/getting-started/disabling-and-enabling-auto-commit.html
https://learn.microsoft.com/en-us/sql/t-sql/statements/set-implicit-transactions-transact-sql?view=sql-server-ver16
You probably need to take a closer look at the client/driver you are
using and the code that is interacting with it.
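
For instance, with the PostgreSQL JDBC driver autocommit is on by
default and is toggled per connection. A minimal sketch, with
hypothetical connection details and table:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class AutocommitDemo {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "pass")) {
            // JDBC connections start with autocommit on: each statement
            // is its own transaction, just like a bare INSERT in psql.
            conn.setAutoCommit(false); // now statements group until commit()
            try (Statement st = conn.createStatement()) {
                st.executeUpdate("INSERT INTO events(payload) VALUES ('x')");
                st.executeUpdate("INSERT INTO events(payload) VALUES ('y')");
            }
            conn.commit(); // both inserts become visible in one transaction
        }
    }
}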
In fact I would say you need to review the entire data transfer process
to see if there are performance gains that can be obtained without
adding an entirely new async component.
--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx