Re: IO related waits

Adrian Klaver <adrian.klaver@xxxxxxxxxxx> · Tue, 17 Sep 2024 08:54:49 -0700

On 9/16/24 20:55, veem v wrote:

On Tue, 17 Sept 2024 at 03:41, Adrian Klaver <adrian.klaver@xxxxxxxxxxx> wrote:

    Are you referring to this?:

    https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/dev/datastream/operators/asyncio/

    If not then you will need to be more specific.

Yes, I was referring to this one. So what are the caveats of this 
approach, considering that the transactions are meant to be ACID 
compliant, as they are financial transactions? Additionally, I was not 
aware of the parameter "synchronous_commit" on the DB side, which will 
mimic the synchronous commit.

Would both of these mimic the same asynchronous behaviour and achieve 
the same thing? That is, would the client data load throughput increase 
because the DB will not wait for the data to be written to the WAL 
before giving a confirmation back to the client, and the client will not 
wait for the DB to confirm whether the data has been persisted? Also, 
since the flushing of the WAL to disk has to happen in the backend 
anyway (it is just delayed now), can this method cause contention on the 
database storage side if the data is ingested from the client faster 
than it can be written to disk, and can it in some way impact data 
consistency for read queries?

This is not something that I am that familiar with. I suspect, though, 
that this is more complicated than you think. From the link above:

" Prerequisites

As illustrated in the section above, implementing proper asynchronous 
I/O to a database (or key/value store) requires a client to that 
database that supports asynchronous requests. Many popular databases 
offer such a client.

In the absence of such a client, one can try and turn a synchronous 
client into a limited concurrent client by creating multiple clients and 
handling the synchronous calls with a thread pool. However, this 
approach is usually less efficient than a proper asynchronous client.
"

Which means that on the Flink end you need to:

1) Use Flink async I/O.

2) Find a client that supports async or fake it by using multiple 
synchronous clients.
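
To illustrate the "fake it" option in 2), here is a minimal sketch in plain Python (not Flink code) of wrapping a blocking call in a thread pool so that several requests are in flight at once. The function blocking_lookup is a hypothetical stand-in for a synchronous database client call, and the sleep only simulates latency; a real setup would most likely give each worker its own connection.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def blocking_lookup(key):
    """Hypothetical stand-in for a synchronous database client call."""
    time.sleep(0.05)  # simulate network and database latency
    return key, f"value-for-{key}"


def lookup_all(keys, max_workers=8):
    """Run the blocking calls concurrently and collect results as they finish."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(blocking_lookup, k) for k in keys]
        for fut in as_completed(futures):
            key, value = fut.result()  # re-raises any exception from the worker
            results[key] = value
    return results


if __name__ == "__main__":
    start = time.perf_counter()
    out = lookup_all(range(16))
    elapsed = time.perf_counter() - start
    print(f"{len(out)} lookups in {elapsed:.2f}s")
```

Each call still blocks one thread for its full duration, so concurrency is capped by the pool size, which is one reason the Flink docs describe this as usually less efficient than a proper asynchronous client.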

On the Postgres end there is this:

https://www.postgresql.org/docs/current/wal-async-commit.html

That will return a success signal to the client quicker if 
synchronous_commit is set to off. Though the point of the Flink async 
I/O is not to wait for the response before moving on, so I am not sure 
how much synchronous_commit = off would help.

--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx