Hi Thomas,
Thank you for your reply.
In your answer to question 8 you wrote, "I'd investigate whether data is being cached unexpectedly, perhaps indicating that committed transactions could be lost in a system crash event." So I would like to know: if we configure the WAL disk with a read+write disk cache, will that cause any performance issue and produce output like the attached?
I would also like to know whether PostgreSQL has any best-practice guidance on the disk latency required for the WAL and data disks.
On Fri, 10 Dec 2021 at 10:56, Thomas Munro <thomas.munro@xxxxxxxxx> wrote:
On Fri, Dec 10, 2021 at 3:20 PM PGSQL DBA <pgsqldba.1987@xxxxxxxxx> wrote:
> 1) How to interpret the output of pg_test_fsync?
The main interesting area is probably the top section that compares
the different wal_sync_method settings. For example, it's useful to
verify the claim that fdatasync() is faster than fsync() (because it
only flushes data, not meta-data like file modified time). It may
also be useful for measuring the effects of different caching settings
on your OS and storage. Unfortunately open_datasync is a bit
misleading; we don't actually use O_DIRECT with open_datasync anymore,
unless you set wal_level=minimal, which almost nobody ever does.
> 2) What is the meaning of ops/sec & usecs/op?
Number of times it managed to flush data to disk per second
sequentially, and the same information expressed as microseconds per
flush.
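
To illustrate how the two columns relate, here is a small sketch. The function names are made up for this example and are not part of pg_test_fsync; the only assumption is that both columns describe the same measurement, so one is 1,000,000 divided by the other.

```python
def usecs_per_op(ops_per_sec: float) -> float:
    """Convert a flush rate (ops/sec) into average microseconds per flush."""
    return 1_000_000 / ops_per_sec


def ops_per_sec(usecs: float) -> float:
    """Convert average microseconds per flush back into a flush rate."""
    return 1_000_000 / usecs
```

For example, a method that reports 2000 ops/sec spends 500 usecs on each flush on average.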
> 3) How does this utility work internally?
It just does a loop over some system calls, or to be more precise,
https://github.com/postgres/postgres/blob/master/src/bin/pg_test_fsync/pg_test_fsync.c
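
As a rough illustration of that kind of loop, here is a minimal Python sketch. It is not a port of the C source linked above: the function name, the one-second default duration, and the choice to overwrite a single block at offset 0 are assumptions made for this example. It uses os.fdatasync where available and falls back to os.fsync elsewhere.

```python
import os
import time

BLOCK_SIZE = 8192  # 8kB, matching the block size named in the tool's output

# os.fdatasync is missing on some platforms (e.g. macOS); fall back to fsync there.
flush = getattr(os, "fdatasync", os.fsync)


def timed_flush_loop(path, seconds=1.0):
    """Write one 8kB block and flush it, repeatedly, from a single thread.

    Returns (ops_per_sec, usecs_per_op) for the measured interval.
    """
    buf = b"\0" * BLOCK_SIZE
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        ops = 0
        start = time.perf_counter()
        while time.perf_counter() - start < seconds:
            os.pwrite(fd, buf, 0)  # overwrite the same block at offset 0
            flush(fd)              # block until the OS reports the data as flushed
            ops += 1
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return ops / elapsed, elapsed / ops * 1_000_000
```

Point the path at a scratch file on the filesystem you want to measure; the result depends on the storage behind that file, not on where the script lives.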
> 4) What is the I/O pattern of this utility? Serial/sequential I/O, or multiple threads with parallel I/O?
Sequential, no threads.
> 5) Can we change the test to work like fio, with multiple threads and parallel I/O?
Nope. This is a simple tool. Fio is much more general and useful.
> 6) How does a commit happen in the background while this utility is running?
Nothing happens in the background, it uses synchronous system calls
from one thread.
> 7) How can we use this tool to measure the I/O issue?
It's a micro-benchmark that gives you a rough baseline for what to
expect from a single PostgreSQL session committing to the WAL.
> 8) In which area or section in the output do we need to focus while troubleshooting I/O issues?
If PostgreSQL couldn't commit small sequential transactions about that
fast I'd be interested in finding out why, and if fdatasync is
performing faster than published/device IOPS suggest should be
possible then I'd investigate whether data is being cached
unexpectedly, perhaps indicating that committed transactions could be
lost in a system crash event.
> 9) What is the meaning of "Non-sync'ed 8kB writes"?
Calling the pwrite() system call, which writes into your operating
system's page cache but (usually) doesn't wait for any I/O. Should be
somewhere north of 1 million/sec.
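
For comparison with the flushed case, a write-only loop can be sketched like this. The function name and defaults are assumptions for illustration. Because of interpreter overhead, a Python loop will likely report a lower rate than the C tool does, so treat the number as relative rather than as a match for the figure above.

```python
import os
import time

BLOCK_SIZE = 8192  # 8kB per write


def timed_unsynced_writes(path, seconds=1.0):
    """Repeatedly pwrite() one 8kB block with no flush; return writes per second."""
    buf = b"\0" * BLOCK_SIZE
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        ops = 0
        start = time.perf_counter()
        while time.perf_counter() - start < seconds:
            os.pwrite(fd, buf, 0)  # normally returns once the data is in the page cache
            ops += 1
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return ops / elapsed
```

The gap between this rate and the flushed rate on the same file gives a feel for how much of the cost is the flush itself rather than the write call.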
Attachment:
pg_test_fsync.png
Description: PNG image