Re: Subselect query enhancement

Richard Huxton <dev@xxxxxxxxxxxx> · Thu, 01 Feb 2007 19:02:40 +0000

Michael Artz wrote:
 > I have primary table that holds ip information
> and two other tables that hold event data for the specific IP in with
> a one-to-many mapping between them, ie:
[snip]
> There is quite a bit of commonality between the network_events and
> host_events schemas, but they do not currently share an ancestor.
> ip_info has about 13 million rows, the network_events table has about
> 30 million rows, and the host_events table has about 7 million rows.
> There are indexes on all the rows.

What indexes though. Do you have (name,ip) on the two event tables?

All the columns are indexed individually.  The tables are completely
static, as I reload the whole DB with new data every day.

The point of a (name,ip) index would be to let you read off ip numbers 
in order easily.

How selective is "name" - are there many different values or just a few?
If lots, it might be worth increasing the statistics gathered on that
column (ALTER COLUMN ... SET STATISTICS).
http://www.postgresql.org/docs/8.2/static/sql-altertable.html

I guess that is the heart of my question.  "name" is not very
selective (there are only 20 or so choices) however other columns are
fairly selective for certain cases, such as 'port'.  When querying on
and unusual port, the query is very fast, and the single UNIONed
subselect returns quickly.  When 'port' is not very selective (like
port = '80', which is roughly 1/2 of the rows in the DB), the dual
subselect query wins, hands-down.

And I have altered the statistics via the config file:
 default_statistics_target = 100
Perhaps this should be even higher for certain columns?

You're probably better off leaving it at 10 and upping it for the vital 
columns. 25 for names should be a good choice.

You could try partial indexes for those cases where you have 
particularly common values of name/port:

CREATE INDEX idx1 ON host_events (ip) WHERE port=80;

--
  Richard Huxton
  Archonet Ltd