Search Postgresql Archives

Re: IO related waits

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 9/17/24 12:34, veem v wrote:

On Tue, 17 Sept 2024 at 21:24, Adrian Klaver <adrian.klaver@xxxxxxxxxxx <mailto:adrian.klaver@xxxxxxxxxxx>> wrote:


    Which means you need to on Flink end:

    1) Use Flink async I/O .

    2) Find a client that supports async or fake it by using multiple
    synchronous clients.

    On Postgres end there is this:

    https://www.postgresql.org/docs/current/wal-async-commit.html
    <https://www.postgresql.org/docs/current/wal-async-commit.html>

    That will return a success signal to the client quicker if
    synchronous_commit is set to off. Though the point of the Flink async
    I/O is not to wait for the response before moving on, so I am not sure
    how much synchronous_commit = off would help.


 Got it. So it means their suggestion was to set the asynch_io at flink level but not DB level, so that the application will not wait for the commit response from the database. But in that case , won't it overload the DB with more and more requests if database will keep doing the commit ( with synchronous_commit=ON)  and waiting for getting the response back from its storage for the WAL's to be flushed to the disk, while the application will not wait for its response back(for those inserts) and keep flooding the database with more and more incoming Insert requests?

My point is this is a multi-layer cake with layers:

1) Flink asycnc io

2) Database client async/sync

3) Postgres sync status.

That is a lot of moving parts and determining whether it is suitable is going to require rigorous testing over a representative data load.


See more below.


Additionally as I mentioned before, we see that from "pg_stat_database" from the column "xact_commit" , it's almost matching with the sum of "tup_inserted", "tup_updated", "tup_deleted" column. And also we verified in pg_stats_statements the  "calls" column is same as the "rows" column for the INSERT queries, so it means also we are inserting exactly same number of rows as the number of DB calls, so doesn't it suggest that we are doing row by row operations/dmls.

Also after seeing above and asking application team to do the batch commit ,we are still seeing the similar figures from pg_stat_database and pg_stat_statements, so does it mean that we are looking into wrong stats? or the application code change has not been done accurately? and we see even when no inserts are running from the application side, we do see "xact_commit" keep increasing along with "tup_fetched" , why so?

Finally we see in postgres here, even if we just write a DML statement it does commit that by default, until we explicitly put it in a "begin... end" block. Can that be the difference between how a "commit" gets handled in postgres vs other databases?

It does if autocommit is set in the client, that is common to other databases also:

https://dev.mysql.com/doc/refman/8.4/en/commit.html

https://docs.oracle.com/en/database/oracle/developer-tools-for-vscode/getting-started/disabling-and-enabling-auto-commit.html

https://learn.microsoft.com/en-us/sql/t-sql/statements/set-implicit-transactions-transact-sql?view=sql-server-ver16

You probably need to take a closer look at the client/driver you are using and the code that interacting with it.

In fact I would say you need to review the entire data transfer process to see if there are performance gains that can be obtained without adding an entirely new async component.




--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx






[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux