Re: same plan, add 1 condition, 1900x slower

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 2005-11-10 at 12:23 -0500, Tom Lane wrote:
> Apparently, you are using a platform and/or locale in which strcoll() is
> spectacularly, god-awfully slow --- on the order of 10 msec per comparison.

The version with the condition is definitely doing more I/O.  The
version without the condition doesn't read at all.  I strace'd an
explain analyze for each separately, and this is what I ended up with
(the first is with the condition, the second is without):

bash-2.05b$ cut '-d(' -f1 subsourcestrace | sort | uniq -c
   7127 gettimeofday
  75213 _llseek
      1 Process 30227 attached - interrupt to quit
      1 Process 30227 detached
 148671 read
      2 recv
      4 semop
      4 send
bash-2.05b$ cut '-d(' -f1 subsourcestrace-nocond | sort | uniq -c
   9103 gettimeofday
      7 _llseek
      1 Process 30227 attached - interrupt to quit
      1 Process 30227 detached
      2 recv
      4 send

For the moment, all of the rows in the view I'm selecting from satisfy
the condition, so the output of both queries is the same.  The relevant
rows of the underlying tables are probably pretty contiguous (all of the
rows satisfying the condition and the join were inserted at the same
time).  Could it just be the result of a weird physical distribution of
data in the table/index files?  For the fast query, the actual number of
rows is a lot less than the planner expects.

> This is a bit hard to believe but I can't make sense of those numbers
> any other way.  What is the platform exactly, and what database locale
> and encoding are you using?

It's RHEL 3 on x86:
[root@rehoboam root]# uname -a
Linux rehoboam 2.4.21-32.0.1.ELsmp #1 SMP Tue May 17 17:52:23 EDT 2005
i686 i686 i386 GNU/Linux

The glibc version is 2.3.2.

statgen=# select current_setting('lc_collate');
 current_setting
-----------------
 en_US.UTF-8

Not sure what's relevant, but here's some more info:
The machine has 4.5GiB of RAM and a 5-disk Raid 5.  It's a dual xeon
3.2ghz.

           relname           | relpages |  reltuples
-----------------------------+----------+-------------
 external_id_map             |   126883 | 1.55625e+07
 external_id_map_primary_key |    64607 | 1.55625e+07
 subject                     |       31 |        1186
 subject_pkey                |       19 |        1186

I've attached the output of "select name, setting from pg_settings".

And, in case my original message isn't handy, the explain analyze output
and table/view info is below.

Thanks for taking a look,
Mitch

statgen=> explain analyze select * from subject_source;

QUERY PLAN 
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=0.00..330.72 rows=1186 width=46) (actual
time=0.051..8.890 rows=1186 loops=1)
   Merge Cond: ("outer".id = "inner".target_id)
   ->  Index Scan using subject_pkey on subject norm  (cost=0.00..63.36
rows=1186 width=28) (actual time=0.022..1.441 rows=1186 loops=1)
   ->  Index Scan using external_id_map_primary_key on external_id_map
eim  (cost=0.00..2485226.70 rows=15562513 width=26) (actual
time=0.016..2.532 rows=2175 loops=1)
 Total runtime: 9.592 ms
(5 rows)

statgen=> explain analyze select * from subject_source where
source='SCH';

QUERY PLAN 
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=0.00..1147.33 rows=1 width=46) (actual
time=0.054..20258.161 rows=1186 loops=1)
   Merge Cond: ("outer".id = "inner".target_id)
   ->  Index Scan using subject_pkey on subject norm  (cost=0.00..63.36
rows=1186 width=28) (actual time=0.022..1.478 rows=1186 loops=1)
   ->  Index Scan using external_id_map_primary_key on external_id_map
eim  (cost=0.00..2524132.99 rows=2335 width=26) (actual
time=0.022..20252.326 rows=1186 loops=1)
         Filter: (source = 'SCH'::bpchar)
 Total runtime: 20258.922 ms
(6 rows)

statgen=> \d subject_source
         View "public.subject_source"
  Column   |         Type          | Modifiers
-----------+-----------------------+-----------
 id        | bigint                |
 sex       | integer               |
 parent1   | bigint                |
 parent2   | bigint                |
 source    | character(3)          |
 source_id | character varying(32) |
View definition:
 SELECT norm.id, norm.sex, norm.parent1, norm.parent2, eim.source,
eim.source_id
   FROM subject norm
   JOIN util.external_id_map eim ON norm.id = eim.target_id;

statgen=> \d subject
    Table "public.subject"
 Column  |  Type   | Modifiers
---------+---------+-----------
 id      | bigint  | not null
 sex     | integer |
 parent1 | bigint  |
 parent2 | bigint  |
Indexes:
    "subject_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
    "subject_parent1" FOREIGN KEY (parent1) REFERENCES subject(id)
DEFERRABLE INITIALLY DEFERRED
    "subject_parent2" FOREIGN KEY (parent2) REFERENCES subject(id)
DEFERRABLE INITIALLY DEFERRED
    "subject_id_map" FOREIGN KEY (id) REFERENCES
util.external_id_map(target_id) DEFERRABLE INITIALLY DEFERRED

statgen=> \d util.external_id_map
         Table "util.external_id_map"
  Column   |         Type          | Modifiers
-----------+-----------------------+-----------
 source_id | character varying(32) | not null
 source    | character(3)          | not null
 target_id | bigint                | not null
Indexes:
    "external_id_map_primary_key" PRIMARY KEY, btree (target_id)
    "external_id_map_source_source_id_unique" UNIQUE, btree (source,
source_id)
    "external_id_map_source" btree (source)
    "external_id_map_source_target_id" btree (source, target_id)
Foreign-key constraints:
    "external_id_map_source" FOREIGN KEY (source) REFERENCES
util.source(id)


              name              |    setting     
--------------------------------+----------------
 add_missing_from               | on
 archive_command                | unset
 australian_timezones           | off
 authentication_timeout         | 60
 bgwriter_delay                 | 200
 bgwriter_maxpages              | 100
 bgwriter_percent               | 1
 block_size                     | 8192
 check_function_bodies          | on
 checkpoint_segments            | 32
 checkpoint_timeout             | 300
 checkpoint_warning             | 60
 client_encoding                | SQL_ASCII
 client_min_messages            | notice
 commit_delay                   | 0
 commit_siblings                | 5
 cpu_index_tuple_cost           | 0.001
 cpu_operator_cost              | 0.0025
 cpu_tuple_cost                 | 0.01
 custom_variable_classes        | unset
 DateStyle                      | ISO, MDY
 db_user_namespace              | off
 deadlock_timeout               | 1000
 debug_pretty_print             | off
 debug_print_parse              | off
 debug_print_plan               | off
 debug_print_rewritten          | off
 debug_shared_buffers           | 0
 default_statistics_target      | 10
 default_tablespace             | unset
 default_transaction_isolation  | read committed
 default_transaction_read_only  | off
 default_with_oids              | off
 effective_cache_size           | 65536
 enable_hashagg                 | on
 enable_hashjoin                | on
 enable_indexscan               | on
 enable_mergejoin               | on
 enable_nestloop                | on
 enable_seqscan                 | on
 enable_sort                    | on
 enable_tidscan                 | on
 explain_pretty_print           | on
 extra_float_digits             | 0
 from_collapse_limit            | 8
 fsync                          | on
 geqo                           | on
 geqo_effort                    | 5
 geqo_generations               | 0
 geqo_pool_size                 | 0
 geqo_selection_bias            | 2
 geqo_threshold                 | 12
 integer_datetimes              | off
 join_collapse_limit            | 8
 lc_collate                     | en_US.UTF-8
 lc_ctype                       | en_US.UTF-8
 lc_messages                    | en_US.UTF-8
 lc_monetary                    | en_US.UTF-8
 lc_numeric                     | en_US.UTF-8
 lc_time                        | en_US.UTF-8
 listen_addresses               | *
 log_connections                | off
 log_destination                | stderr
 log_disconnections             | off
 log_duration                   | off
 log_error_verbosity            | default
 log_executor_stats             | off
 log_hostname                   | off
 log_line_prefix                | unset
 log_min_duration_statement     | -1
 log_min_error_statement        | panic
 log_min_messages               | notice
 log_parser_stats               | off
 log_planner_stats              | off
 log_rotation_age               | 1440
 log_rotation_size              | 0
 log_statement                  | none
 log_statement_stats            | off
 log_truncate_on_rotation       | on
 maintenance_work_mem           | 65536
 max_connections                | 100
 max_files_per_process          | 1000
 max_fsm_pages                  | 20000
 max_fsm_relations              | 1000
 max_function_args              | 32
 max_identifier_length          | 63
 max_index_keys                 | 32
 max_locks_per_transaction      | 64
 max_stack_depth                | 2048
 password_encryption            | on
 port                           | 5432
 pre_auth_delay                 | 0
 random_page_cost               | 4
 redirect_stderr                | on
 regex_flavor                   | advanced
 rendezvous_name                | unset
 search_path                    | $user,public
 server_encoding                | SQL_ASCII
 server_version                 | 8.0.3
 shared_buffers                 | 32768
 silent_mode                    | off
 sql_inheritance                | on
 ssl                            | off
 statement_timeout              | 0
 stats_block_level              | off
 stats_command_string           | on
 stats_reset_on_server_start    | on
 stats_row_level                | off
 stats_start_collector          | on
 superuser_reserved_connections | 2
 syslog_facility                | LOCAL0
 syslog_ident                   | postgres
 TimeZone                       | PST8PDT
 trace_notify                   | off
 transaction_isolation          | read committed
 transaction_read_only          | off
 transform_null_equals          | off
 unix_socket_group              | unset
 unix_socket_permissions        | 511
 vacuum_cost_delay              | 0
 vacuum_cost_limit              | 200
 vacuum_cost_page_dirty         | 20
 vacuum_cost_page_hit           | 1
 vacuum_cost_page_miss          | 10
 wal_buffers                    | 8
 wal_sync_method                | fdatasync
 work_mem                       | 65536
 zero_damaged_pages             | off
(128 rows)

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux