On Wed, Jul 12, 2023 at 1:11 AM Dimitrios Apostolou <jimis@xxxxxxx> wrote:
> Note that I suspect my setup being related, (btrfs compression behaving
> suboptimally) since the raw device can give me up to 1GB/s rate. It is however
> evident that reading in bigger chunks would mitigate such setup inefficiencies.
> On a system that reads are already optimal and the read rate remains the same,
> then bigger block size would probably reduce the sys time postgresql consumes
> because of the fewer system calls.

I don't know about btrfs, but maybe it can be tuned to prefetch sequential
reads better...

> So would it make sense for postgres to perform reads in bigger blocks? Is it
> easy-ish to implement (where would one look for that)? Or must the I/O unit be
> tied to postgres' page size?

It is hard to implement, but people are working on it. One of the problems is
that the 8KB blocks we want to read data into aren't necessarily contiguous in
memory, so you can't just issue bigger pread() calls without solving a lot of
other problems first. The project at https://wiki.postgresql.org/wiki/AIO aims
to deal with the "clustering" you seek, plus the "gathering" required for
non-contiguous buffers, by allowing multiple block-sized reads to be prepared
and collected on a pending list up to some size that triggers merging and
submission to the operating system at a sensible rate, so that we can build
something like a single large preadv() call. In the current prototype, if
io_method=worker then that becomes a literal preadv() call running in a
background "io worker" process, but depending on settings it could also be
OS-specific machinery (io_uring, ...) that starts an asynchronous I/O. If you
take that branch and run your test you should see 128KB-sized preadv() calls.
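
To illustrate just the "gathering" part (this is a standalone sketch, not
PostgreSQL code; the block count, sizes and file argument are made up for
illustration), here is roughly how a single preadv() call can read one
contiguous 128KB range of a file into 16 non-contiguous 8KB buffers:

    /*
     * Minimal sketch: one preadv() system call reads a contiguous 128KB
     * file range into 16 scattered 8KB buffers, standing in for
     * non-contiguous buffer-pool pages.  Not PostgreSQL code.
     */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/uio.h>
    #include <unistd.h>

    #define BLCKSZ  8192            /* pretend 8KB block size */
    #define NBLOCKS 16              /* 16 x 8KB = 128KB per I/O */

    int
    main(int argc, char **argv)
    {
        if (argc != 2)
        {
            fprintf(stderr, "usage: %s <file>\n", argv[0]);
            return 1;
        }

        int fd = open(argv[1], O_RDONLY);
        if (fd < 0)
        {
            perror("open");
            return 1;
        }

        /* Pretend these are buffer pages scattered around memory. */
        struct iovec iov[NBLOCKS];
        for (int i = 0; i < NBLOCKS; i++)
        {
            iov[i].iov_base = malloc(BLCKSZ);
            iov[i].iov_len = BLCKSZ;
        }

        /* One system call fills all 16 buffers from file offset 0. */
        ssize_t nread = preadv(fd, iov, NBLOCKS, 0);
        if (nread < 0)
            perror("preadv");
        else
            printf("read %zd bytes in one preadv() call\n", nread);

        close(fd);
        return 0;
    }

Running something like that under strace shows a single 128KB-sized preadv()
instead of 16 separate 8KB pread() calls, which is the same shape of I/O you
should see from the AIO branch.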