
Re: PostgreSQL reads each 8k block - no larger blocks are used - even on sequential scans

On Fri, 2 Oct 2009, Simon Riggs wrote:


> On Sun, 2009-09-27 at 18:05 +0200, Gerhard Wiesinger wrote:

>> So I saw that even on sequential reads (and also on bitmap heap scan
>> access) PostgreSQL uses only 8k blocks. I think that's a major I/O
>> bottleneck.
>>
>> A commercial database vendor solved the problem by reading multiple
>> contiguous 8k blocks in a single request, up to a maximum threshold.
>> Output per 5 seconds on an equivalent "sequential scan":

> Is systemtap counting actual I/Os, or just requests to access 8192-byte
> blocks that may already be in the OS cache? Postgres doesn't read more
> than one block at a time into its buffer pool, so those numbers of
> requests look about right.


As far as I know these are VFS reads, so some reads might be served from
the cache. But since I did the following before each benchmark, all
requests should come from disk:

echo 3 > /proc/sys/vm/drop_caches; service postgresql restart
do benchmark

I did the same for all benchmarks because I don't want to measure the
cache performance of the OS or of the DB.

Therefore all requests (except for blocks read twice or more, which
shouldn't happen and would also be fine as cache hits) are served from
disk and not from the cache.
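
For the record, the numbers come from counting reads at the VFS layer,
roughly like this (a minimal sketch; the filter on the "postgres" process
name and the 5 second interval are assumptions for my setup):

  stap -e '
    global nreads, nbytes
    # count every successful VFS read issued by PostgreSQL backends
    probe vfs.read.return {
      if (execname() == "postgres" && $return > 0) {
        nreads++
        nbytes += $return
      }
    }
    # print and reset the counters every 5 seconds
    probe timer.s(5) {
      printf("VFS reads: %d, bytes: %d\n", nreads, nbytes)
      nreads = 0
      nbytes = 0
    }'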

> There is belief here that multi-block I/O was introduced prior to the OS
> doing this as a standard mechanism. Linux expands its readahead window
> in response to sequential scans, and so this seems like something we
> don't want to do in the database.

I even played with large readahead values on the block devices /dev/md*,
/dev/sd* and /dev/dm-*, as well as with the stripe_cache_size of
/dev/md*, but without any performance improvement in the benchmark
scenarios.

=> None of the readaheads/caches seem to help, at least in the BITMAP
HEAP SCAN scenarios, even on nearly the latest Linux kernels.
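
For reference, these are the kinds of settings I mean (a sketch; device
names and values are examples, and blockdev readahead is given in
512-byte sectors):

  blockdev --getra /dev/md0                         # show current readahead
  blockdev --setra 16384 /dev/md0                   # set readahead to 8 MB
  echo 8192 > /sys/block/md0/md/stripe_cache_size   # RAID5/6 stripe cache (pages)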

But I think such block issues (reading in blocks as large as possible)
have to be optimized at the application level (in our case the DB level),
because:
1.) We can't assume that the OS and even the storage work well in such
scenarios.
2.) We can't assume that the OS/storage is intelligent enough to merge
two requests when 2 "random" blocks happen to lie sequentially on disk,
and that the number of IOPS is therefore reduced (the quick dd comparison
below shows how much the request size matters).
3.) I think such logic should be very easy to integrate, and it has even
been done already with some patches.
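
As a quick way to see that request size alone changes the number of IOPS
needed for the same amount of data (a sketch; the device name and sizes
are examples, it reads 1 GB each time, so only run it against a device
you can safely read from):

  # same 1 GB, first in 8k requests, then in 128k requests
  echo 3 > /proc/sys/vm/drop_caches
  dd if=/dev/md0 of=/dev/null bs=8k count=131072
  echo 3 > /proc/sys/vm/drop_caches
  dd if=/dev/md0 of=/dev/null bs=128k count=8192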


> It's possible this is wrong. Is the table being scanned fairly sizable,
> and was it allocated contiguously? i.e. was it a large table loaded via
> COPY?
>
> I also wonder if more L2 cache effects exist.


What do you mean by "the table being scanned is fairly sizable"? I don't
get it.

The table was filled with a lot of INSERTs, but all at one point in time.
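
To check how contiguously the table is actually laid out on disk, one
could look at the underlying relation file (a sketch; the table name,
the OIDs and the data directory path are examples for my setup):

  # find the file that backs the table
  psql -c "SELECT relfilenode FROM pg_class WHERE relname = 'mytable';"
  # the file lives under $PGDATA/base/<database oid>/<relfilenode>, e.g.:
  filefrag /var/lib/pgsql/data/base/16384/16385
  # "1 extent found" would mean the file is fully contiguous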

Ciao,
Gerhard

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
