Re: Failure to use indexes (fwd)

Edmund Dengler <edmundd@xxxxxxxxxxxx> · Tue, 2 Aug 2005 16:45:00 -0400 (EDT)

Greetings all!

Given the quiet, I assume that there is no experience with index issues on
inherited tables? Just seeing if anybody may have any ideas or suggested
work arounds (I seem to have found one by constructing a query that does
all the joins between inherited tables explicitely - this causes the
indexes to be used - still trying to make sure it is a legitimate method).

Regards!
Ed

---------- Forwarded message ----------
Date: Fri, 29 Jul 2005 14:23:29 -0400 (EDT)
From: Edmund Dengler <edmundd@xxxxxxxxxxxx>
To: Postgresql-General <pgsql-general@xxxxxxxxxxxxxx>
Subject: Re:  Failure to use indexes

Greetings!

I managed to get an even simpler example:

select *
from eventlog.record_main
where luid in (
    select luid from eventlog_partition._20050706__raw_record
    order by luid limit 1
  )
;

If I use <=> rather than <in>, postgresql uses index scanning. As soon as
I use <in> (ie, a set to compare against), I get sequential scanning,
event though the set size is only a single element.

Regards!
Ed

On Fri, 29 Jul 2005, Edmund Dengler wrote:

> Greetings!
>
> I am using <inherits> to partition several tables. When I perform a query
> on another table, and then try to join against an inherited table set, the
> optimizer does not use any indexes to perform the join.
>
> This is 8.0.1. Is this fixed in 8.0.2 or 8.0.3?
>
> The query:
>
> explain
> select *
> from (
>     select *  from eventlog.record_classification as record_classification
>     where
>       time_written >= '2005-07-06 00:00:00+00'::timestamptz
>       and time_written < '2005-07-06 00:00:00+00'::timestamptz
>     order by time_written, luid
>     offset  0
>     limit  500
>   ) as classification
>   join eventlog.record_main as main using (luid, time_written)
> ;
>
> The explanation:
>
>                                                                                                 QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Hash Join  (cost=51.15..20191003.89 rows=208027 width=178)
>    Hash Cond: (("outer".luid = "inner".luid) AND ("outer".time_written = "inner".time_written))
>    ->  Append  (cost=0.00..14641488.64 rows=554738383 width=96)
>          ->  Seq Scan on record_main main  (cost=0.00..0.00 rows=1 width=96)
>          ->  Seq Scan on _20050723__record_main main  (cost=0.00..94078.62 rows=3564462 width=96)
>          ->  Seq Scan on _20050724__record_main main  (cost=0.00..110075.12 rows=4170512 width=96)
>          ->  Seq Scan on _20050725__record_main main  (cost=0.00..122836.02 rows=4654002 width=96)
>          ->  Seq Scan on _20050726__record_main main  (cost=0.00..142347.71 rows=5393271 width=96)
>          ->  Seq Scan on _20050727__record_main main  (cost=0.00..130858.80 rows=4957980 width=96)
>          ....
>          (and so on, currently 123 such inheritd tables)
>          ....
>    ->  Hash  (cost=51.07..51.07 rows=15 width=98)
>          ->  Subquery Scan classification  (cost=50.89..51.07 rows=15 width=98)
>                ->  Limit  (cost=50.89..50.92 rows=15 width=98)
>                      ->  Sort  (cost=50.89..50.92 rows=15 width=98)
>                            Sort Key: record_classification.time_written, record_classification.luid
>                            ->  Result  (cost=0.00..50.59 rows=15 width=98)
>                                  ->  Append  (cost=0.00..50.59 rows=15 width=98)
>                                        ->  Seq Scan on record_classification  (cost=0.00..0.00 rows=1 width=98)
>                                              Filter: ((time_written >= '2005-07-06 00:00:00+00'::timestamp with time zone) AND (time_written < '2005-07-06 00:00:00+00'::timestamp with time zone))
>                                        ->  Index Scan using _20050705__record_classification_time_written_idx on _20050705__record_classification record_classification  (cost=0.00..3.46 rows=1 width=54)
>                                              Index Cond: ((time_written >= '2005-07-06 00:00:00+00'::timestamp with time zone) AND (time_written < '2005-07-06 00:00:00+00'::timestamp with time zone))
>                                        ->  Index Scan using _20050701__record_classification_time_written_idx on _20050701__record_classification record_classification  (cost=0.00..3.59 rows=1 width=54)
>                                              Index Cond: ((time_written >= '2005-07-06 00:00:00+00'::timestamp with time zone) AND (time_written < '2005-07-06 00:00:00+00'::timestamp with time zone))
>                                        ->  Index Scan using _20050702__record_classification_time_written_idx on _20050702__record_classification record_classification  (cost=0.00..3.69 rows=1 width=54)
>                                              Index Cond: ((time_written >= '2005-07-06 00:00:00+00'::timestamp with time zone) AND (time_written < '2005-07-06 00:00:00+00'::timestamp with time zone))
>                                        ->  Index Scan using _20050703__record_classification_time_written_idx on _20050703__record_classification record_classification  (cost=0.00..3.70 rows=1 width=54)
>                                              Index Cond: ((time_written >= '2005-07-06 00:00:00+00'::timestamp with time zone) AND (time_written < '2005-07-06 00:00:00+00'::timestamp with time zone))
>                                        ....
>                                        (and so on)
>                                        ....
>                                        ->  Index Scan using _20050714__record_classification_time_written_idx on _20050714__record_classification record_classification  (cost=0.00..3.69 rows=1 width=53)
>                                              Index Cond: ((time_written >= '2005-07-06 00:00:00+00'::timestamp with time zone) AND (time_written < '2005-07-06 00:00:00+00'::timestamp with time zone))
> (164 rows)
>
> Sample tables:
>
> eventlog=# \d eventlog_partition._20050723__record_main
>       Table "eventlog_partition._20050723__record_main"
>         Column        |           Type           | Modifiers
> ----------------------+--------------------------+-----------
>  luid                 | bigint                   | not null
>  host_luid            | integer                  | not null
>  log_luid             | integer                  | not null
>  time_logged          | timestamp with time zone | not null
>  record_number        | bigint                   | not null
>  time_generated_epoch | bigint                   | not null
>  time_generated       | timestamp with time zone | not null
>  time_written_epoch   | bigint                   | not null
>  time_written         | timestamp with time zone | not null
>  event_id             | bigint                   | not null
>  event_type           | integer                  | not null
>  event_category       | integer                  | not null
>  source_luid          | integer                  | not null
>  computer_luid        | integer                  | not null
>  sid_luid             | integer                  |
>  message_luid         | integer                  |
> Indexes:
>     "_20050723__record_main_message_idx" UNIQUE, btree (message_luid, luid)
>     "_20050723__record_main_sid_idx" UNIQUE, btree (sid_luid, luid)
>     "_20050723__record_main_time_generated_idx" UNIQUE, btree (time_generated, luid)
>     "_20050723__record_main_time_logged_idx" UNIQUE, btree (time_logged, luid)
>     "_20050723__record_main_time_written_idx" UNIQUE, btree (time_written, luid)
>     "_20050723__record_main_pkey" btree (luid)
> Inherits: record_main
>
>
> eventlog=# \d eventlog.record_classification
>         Table "eventlog.record_classification"
>      Column     |           Type           | Modifiers
> ----------------+--------------------------+-----------
>  luid           | bigint                   | not null
>  class_luid     | integer                  | not null
>  time_written   | timestamp with time zone | not null
>  account_luid   | integer                  |
>  group_luid     | integer                  |
>  caller_luid    | integer                  |
>  source_machine | character varying(30)    |
>  source_ip      | character varying(30)    |
> Indexes:
>     "record_classification_pkey" PRIMARY KEY, btree (luid)
>     "record_classification_account_idx" btree (account_luid, time_written)
>     "record_classification_caller_idx" btree (caller_luid, time_written)
>     "record_classification_class_idx" btree (class_luid, time_written)
>     "record_classification_group_idx" btree (group_luid, time_written)
>     "record_classification_ip_idx" btree (source_ip, time_written)
>     "record_classification_machine_idx" btree (source_machine, time_written)
> Foreign-key constraints:
>     "record_classification_class_luid_fkey" FOREIGN KEY (class_luid) REFERENCES eventlog.classification(luid)
>     "record_classification_account_luid_fkey" FOREIGN KEY (account_luid) REFERENCES eventlog.account(luid)
>     "record_classification_group_luid_fkey" FOREIGN KEY (group_luid) REFERENCES eventlog.account(luid)
>     "record_classification_caller_luid_fkey" FOREIGN KEY (caller_luid) REFERENCES eventlog.account(luid)
>
>
> <luid> represents a unique key. I have tried the query using just <luid>
> as the join condition, same result.
>
> The system can use the <_XXXXXXXX__record_main_pkey> index on each
> <_XXXXXXXX__record_main> table to do an index scan on <luid>, but instead
> it chooses to do a sequential scan.  This is true whether
> <_XXXXXXXX__record_main_pkey> is specified as unique or not.
>
> I have "vacuum analyze" the entire database before running the queries. I
> have set the <default_statistics_target> in postgresql.conf to 100 to
> obtain more accurate statistics.
>
> If I specify a specific sub-table (ie,
> <eventlog_partition._XXXXXXXX__record_main>), in the join, the optimizer
> uses the indexes to speed search.
>
> Any ideas?
>
> Regards!
> Ed
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@xxxxxxxxxxxxxx so that your
>        message can get through to the mailing list cleanly
>

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to majordomo@xxxxxxxxxxxxxx so that your
       message can get through to the mailing list cleanly

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster