Re: Block / Page Size Optimization

Andres Freund <andres@xxxxxxxxxxx> · Mon, 8 Apr 2019 09:28:46 -0700

Hi,

On 2019-04-08 11:09:07 -0400, Gunther wrote:
> I can set an XFS file system with 8192 bytes block size, but then it does
> not mount on Linux, because the VM page size is the limit, 4096 again.
> 
> There seems to be no way to change that in (most, common) Linux variants. In
> FreeBSD there appears to be a way to change that.
> 
> But then, there is a hardware limit also, as far as the VM memory page
> allocation is concerned. Apparently most i386 / amd64 architectures the VM
> page sizes are 4k, 2M, and 1G. The latter, I believe, are called "hugepages"
> and I only ever see that discussed in the PostgreSQL manuals for Linux, not
> for FreeBSD.
> 
> People have asked: does it matter? And then there is all that chatter about
> "why don't you run a benchmark and report back to us" -- "OK, will do" --
> and then it's crickets.
> 
> But why is this such a secret?
> 
> On Amazon AWS there is the following very simple situation: IO is capped on
> IO operations per second (IOPS). Let's say, on a smallish volume, I get 300
> IOPS (once my burst balance is used up.)
> 
> Now my simple theoretical reasoning is this: one IO call transfers 1 block
> of 4k size. That means, with a cap of 300 IOPS, I get to send 1.17 MB per
> second. That would be the absolute limit. BUT, if I could double the
> transfer size to 8k, I should be able to move 2.34 MB per second. Shouldn't
> I?

The kernel collapses consecutive write requests. You can see the
average sizes of IO requests (the rareq-sz / wareq-sz columns) using
iostat -xm 1. When e.g. bulk loading into postgres I see:

Device            r/s     w/s     rMB/s     wMB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
sda              4.00  696.00      0.02    471.05     0.00    80.00   0.00  10.31    8.50    7.13   4.64     4.00   693.03   0.98  68.50

so the average write request size was 693.03 KB. Thus I got ~470 MB/sec
despite there only being ~700 IOPS. That's with 4KB page sizes, 4KB FS
blocks, and 8KB postgres block size.
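
To make the arithmetic explicit, here is a rough sketch (not anything
iostat itself provides) that multiplies requests per second by the
average request size. The function name is made up for illustration,
and it assumes iostat reports request sizes in KB and that -m uses
1 MB = 1024 KB:

```python
# Throughput = requests per second * average request size.
# Assumption: iostat's rareq-sz / wareq-sz are in KB, and 1 MB = 1024 KB.

def throughput_mb_per_s(requests_per_s: float, avg_request_kb: float) -> float:
    """Hypothetical helper: convert an IO rate and request size to MB/s."""
    return requests_per_s * avg_request_kb / 1024.0

# Values from the iostat line above (w/s and wareq-sz for sda).
bulk_load = throughput_mb_per_s(696.00, 693.03)

# The quoted scenario: 300 IOPS where every request is exactly one block.
one_4k_block = throughput_mb_per_s(300, 4)
one_8k_block = throughput_mb_per_s(300, 8)

print(f"bulk load:        {bulk_load:.2f} MB/s")
print(f"300 IOPS x 4 KB:  {one_4k_block:.2f} MB/s")
print(f"300 IOPS x 8 KB:  {one_8k_block:.2f} MB/s")
```

The first result lands within rounding of the wMB/s column, and the
other two reproduce the 1.17 and 2.34 MB/s figures from the quoted
mail. Those two only hold if each request carries a single block,
which is not what the iostat output shows.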

There might still be some benefit to a different FS block size, but it's
not going to be directly related to IOPS.

Greetings,

Andres Freund