Re: Air-traffic benchmark

----- "Lefteris" <lsidir@xxxxxxxxx> escreveu:
> > Did you ever try increasing shared_buffers to what was suggested (around
> > 4 GB) and see what happens (I didn't see it in your posts)?
> 
> No I did not do that yet, mainly because I need the admin of the machine
> to change the shmmax of the kernel and also because I have no multiple
> queries running. Does seq scan use shared_buffers?

Having multiple queries running is *not* the only reason you need lots of shared_buffers.
Think of shared_buffers as a page cache: data in PostgreSQL is organized in pages, and shared_buffers is where those pages are kept in memory.
If a single step of one query execution brings a page into the buffer cache, that alone is enough to speed up a later step and can even change the execution plan, since access to data already in memory is (usually) much faster than going to disk.
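
If you want to see how much of a run is actually coming from that cache, something like this works (just a sketch; 'ontime' is my guess at your table name):

    -- in postgresql.conf (needs a restart, and a bigger kernel shmmax):
    -- shared_buffers = 4GB

    -- after a run, compare disk reads against cache hits for the table:
    SELECT heap_blks_read AS from_disk,
           heap_blks_hit  AS from_cache
    FROM pg_statio_user_tables
    WHERE relname = 'ontime';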

> > help performance very much on multiple executions of the same query.

This is also true.
This kind of test should, and will, give different results in subsequent executions.
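
You can watch this from psql itself (again just a sketch; the query and the names are my guesses at your benchmark):

    \timing
    -- first run: most pages come from disk
    SELECT "DayOfWeek", count(*) AS c
      FROM ontime
     WHERE "Year" BETWEEN 2000 AND 2009
     GROUP BY "DayOfWeek"
     ORDER BY c DESC;
    -- run it again: much of the data is now in shared_buffers / the OS cache,
    -- so the time reported by psql should drop noticeably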

> > From the description of the data ("...from years 1988 to 2009...") it
> > looks like the query for "between 2000 and 2009" pulls out about half of
> > the data. If an index could be used instead of seqscan, it could be
> > perhaps only 50% faster, which is still not very comparable to others.

The use of an index instead of the seqscan has to be tested. I don't agree with the 50% estimate: simple integers stored in a B-tree have a very good chance of being retrieved in the required order, and the discarded rows will be discarded quickly too, so the actual gain has to be measured.

I bet that an index scan will be a lot faster, but it's just a bet :)
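
For the record, the kind of test I mean is something like this (names are assumptions again, adjust them to your schema):

    -- build the index and refresh the planner statistics
    CREATE INDEX ontime_year_idx ON ontime ("Year");
    ANALYZE ontime;

    -- see which plan the planner picks and how long it takes
    EXPLAIN ANALYZE
    SELECT count(*) FROM ontime WHERE "Year" BETWEEN 2000 AND 2009;

    -- to force the comparison, disable seqscan for this session only
    SET enable_seqscan = off;
    EXPLAIN ANALYZE
    SELECT count(*) FROM ontime WHERE "Year" BETWEEN 2000 AND 2009;
    SET enable_seqscan = on;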

> > The table is very wide, which is probably why the tested databases can
> > deal with it faster than PG. You could try and narrow the table down
> > (for instance: remove the Div* fields) to make the data more
> > "relational-like". In real life, speedups in these circumstances would
> > probably be gained by normalizing the data to make the basic table
> > smaller and easier to use with indexing.

Ugh. I don't think so. That's why indexes were invented. PostgreSQL is smart enough to "jump" over the columns it doesn't need, using byte offsets within each row.
A better option for this table is to partition it into year (or year/month) chunks.

45GB is not such a huge table compared to others I have seen. I have systems where each partition is around 10 or 20GB and the data is very fast to access, even with aggregation queries.
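
A rough sketch of what I mean by year chunks, the constraint-exclusion way (names are my guesses once more):

    -- one child table per year, inheriting from the (empty) parent
    CREATE TABLE ontime_2008 (CHECK ("Year" = 2008)) INHERITS (ontime);
    CREATE TABLE ontime_2009 (CHECK ("Year" = 2009)) INHERITS (ontime);
    -- ... one per year, plus a trigger (or rules) to route the INSERTs

    -- with this on, the planner skips the partitions outside the WHERE clause
    SET constraint_exclusion = on;
    SELECT count(*) FROM ontime WHERE "Year" BETWEEN 2000 AND 2009;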

Flavio Henrique A. Gurgel
tel. 55-11-2125.4765
fax. 55-11-2125.4777
www.4linux.com.br
FREE SOFTWARE SOLUTIONS

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
