Re: Question about the IO on full table scan operation

"Kevin Grittner" <Kevin.Grittner@xxxxxxxxxxxx> · Wed, 06 Oct 2010 15:27:55 -0500

hsienwen chu <chu.hsien.wen@xxxxxxxxx> wrote:

> the situation is that: I have a table, and the size is about 10M,
> I run the database on Redhat Linux platform. for OS, a IO can hand
> 1M data, the file system block is 4k, database block is 8k,
> 
> my question is that: when do the full table scan on the table, how
> many IO will be done? 10M/1M=10 ? or 10M/8k=1280? or 10M/4k=2560?

Well, it will be looking at one 8kB block at a time, but the number
of I/O operations to the actual disk is hard to estimate and may
vary from run to run, especially if another table scan is going on
at the same time.

Some pages may be found in shared buffers, and therefore generate no
disk access.  PostgreSQL reads go through the OS cache, so some
pages may be found there, and generate no disk access.  Most serious
database servers have a RAID controller with a cache, which may also
avoid disk access.  You might even read from a write-through cache
on a drive without actually going to the disk.

On top of that, many operating systems will notice a pattern of
sequential reads through a file and do larger block read-ahead
access for performance.  And PostgreSQL has its own tricks -- if you
start a sequential table scan while another process is already doing
one, your scan will start where the other one is, to avoid
double-reads.

About all I can say with any certainty is that it *won't* be doing
10M/4k=2560 reads.  And even there I won't be too surprised if
someone corrects me with an explanation of then *that* might happen.
 ;-)

-Kevin

-- 
Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin