
Re: PostgreSQL reads each 8k block - no larger blocks are used - even on sequential scans

On Fri, 2 Oct 2009, Greg Smith wrote:

On Fri, 2 Oct 2009, Gerhard Wiesinger wrote:

Larger blocksizes also reduce IOPS (I/Os per second), which might be a critical threshold on storage systems (e.g. Fibre Channel systems).

True to some extent, but don't forget that IOPS is always relative to a block size in the first place. If you're getting 200 IOPS with 8K blocks, increasing your block size to 128K will not give you 200 IOPS at that larger size; the IOPS number at the larger block size is going to drop too. And you'll pay the penalty for that drop every time you access something that would have been only an 8K bit of I/O before.


Yes, there will be some (very small) drop in IOPS when the blocksize is larger, but today's disks have plenty of throughput headroom when IOPS*128k is compared to e.g. 100MB/s. I've done some Excel calculations which support this.
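For illustration (my own rough numbers, not the figures from that spreadsheet): 200 IOPS with 8k requests moves about 200 * 8k ~ 1.6 MB/s, while 200 IOPS with 128k requests would be about 200 * 128k ~ 25 MB/s. Even if the larger requests cost some IOPS, a disk that can do ~100 MB/s sequentially is nowhere near its bandwidth limit in either case.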

The trade-off is very application dependent. The position you're advocating, preferring larger blocks, only makes sense if your workload consists mainly of larger scans. Someone who is pulling scattered records from throughout a larger table will suffer with that same change, because they'll be reading a minimum of 128K even if all they really needed was a few bytes. That penalty ripples all the way from the disk I/O upwards through the buffer cache.


I wouldn't read 128k blocks all the time. I would do the following:
When e.g. B0, B127 and B256 have to be read, I would issue normal 8k random block I/O.

When B1, B2, B3, B4, B5, B7, B8, B9, B10 are needed, I would make 2 requests with the largest possible blocksize:
1.) B1-B5: 5*8k=40k
2.) B7-B10: 4*8k=32k

In this case, since B5 and B7 are only one block apart, we could also discuss reading B1-B10 = 10*8k = 80k in one request and simply discarding B6.

That would reduce the IOPS by a factor of 4-5 in that scenario, and therefore throughput would go up (a rough sketch of such read coalescing is below).
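
To make the idea concrete, here is a minimal, self-contained C sketch of such read coalescing. This is not PostgreSQL code; the test file name, the 1MB cap and the rule of bridging at most one unneeded block are assumptions for illustration. Runs of adjacent block numbers are merged into a single pread() call instead of one call per 8k block:

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define BLCKSZ      8192
    #define MAX_REQUEST (1024 * 1024)   /* assumed cap per request, e.g. 1MB */
    #define MAX_GAP     1               /* bridge at most one unneeded block */

    static void
    read_coalesced(int fd, const unsigned *blocks, int nblocks)
    {
        char *buf = malloc(MAX_REQUEST);
        int   i = 0;

        while (i < nblocks)
        {
            unsigned start = blocks[i];
            unsigned end = start;
            int      j = i + 1;

            /* extend the run while blocks stay close and the request fits */
            while (j < nblocks &&
                   blocks[j] - end <= 1 + MAX_GAP &&
                   (blocks[j] - start + 1) * BLCKSZ <= MAX_REQUEST)
            {
                end = blocks[j];
                j++;
            }

            /* one pread() for the whole run instead of one per 8k block */
            size_t  len = (size_t) (end - start + 1) * BLCKSZ;
            ssize_t n = pread(fd, buf, len, (off_t) start * BLCKSZ);

            if (n < 0)
                perror("pread");
            else
                printf("blocks %u-%u: one %zu byte request\n", start, end, len);

            i = j;
        }
        free(buf);
    }

    int
    main(void)
    {
        /*
         * The example above: B1-B5 and B7-B10.  With MAX_GAP=1 this becomes
         * a single 80k request for B1-B10, discarding B6.
         */
        unsigned blocks[] = {1, 2, 3, 4, 5, 7, 8, 9, 10};
        int      fd = open("testfile", O_RDONLY);   /* hypothetical test file */

        if (fd < 0)
        {
            perror("open");
            return 1;
        }
        read_coalesced(fd, blocks, (int) (sizeof(blocks) / sizeof(blocks[0])));
        close(fd);
        return 0;
    }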

It's easy to generate a synthetic benchmark workload that models some real-world applications and see performance plunge with a larger block size. There certainly are others where a larger block would work better. Testing either way is complicated by the way RAID devices usually have their own stripe sizes to consider on top of the database block size.


Yes, there are block-device read-ahead buffers and also RAID stripe caches, but neither seems to help in the tested BITMAP HEAP SCAN scenario, nor in practical PostgreSQL performance measurements.

But the modelled pgiosim workload isn't a synthetic benchmark; it is the same access pattern as a real-world BITMAP HEAP SCAN in PostgreSQL, where some of the blocks to be read are consecutive, at least logically in the filesystem (and with some probability also physically on disk), but currently each 8k block is read with its own request even when 2 or more blocks could be fetched in one request.

BTW: I would also cap such requests at some upper limit (e.g. 1MB).

Ciao,
Gerhard

