Performance hit on large row counts

David Scott <davids@xxxxxxxxxxxxxx> · Mon, 26 Dec 2005 13:03:34 -0800

We are trying to ascertain if we are up against the limits of what 
postgres can accomplish without having the tables clustered.  We would 
prefer not to have to cluster our tables because according to the 
documentation this is a one time operation and not maintained.  Is there 
any other performance tweaks that can be done to avoid clustering?  Is 
there any way to force the cluster maintenance (even with a performance 
hit on load)? 
We are aware that there is a minimum time that is required to resolve 
the index values against the table to ascertain that they are live rows, 
and we believe we are circumventing that time to some extent by taking 
advantage of the rows being in physical order with the cluster.  So does 
this lead us to the conclusion that the differences in the query times 
is how long is takes us to check on disk whether or not these rows are live?

Thanks for any help, thoughts, tips or suggestions.

All of these commands are after a vacuum full analyze and the config 
file is attached.  Different values were used for the queries so no 
caching would confuse our stats.  The box is running gentoo with 
postgres 8.1.0, has raid 0, 9 gigs of ram, 2 hyperthreaded procs, x86_64.

/Three tables with row counts:/
   lookup1.count =  3,306,930
   lookup2.count = 4,189,734
   stuff.count = 3,423,994

/The first attempt (after index adjustments, no hits to cached results)/

explain analyze select col2, count(*) as cnt from stuff where col1 = 
56984 group by col2

HashAggregate  (cost=14605.68..14605.88 rows=16 width=4) (actual 
time=6980.752..6985.893 rows=6389 loops=1)
  ->  Bitmap Heap Scan on stuff  (cost=60.97..14571.44 rows=6848 
width=4) (actual time=371.215..6965.742 rows=6389 loops=1)
        Recheck Cond: (col1 = 56984)
        ->  Bitmap Index Scan on stuff_pair_idx  (cost=0.00..60.97 
rows=6848 width=0) (actual time=361.237..361.237 rows=6389 loops=1)
              Index Cond: (col1 = 56984)
Total runtime: 6988.105 ms

/After clustering:/

explain analyze select col2, count(*) as cnt from stuff where col1 = 
3540634 group by col2;
                                                           QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
HashAggregate  (cost=1399.62..1399.63 rows=1 width=4) (actual 
time=11.376..15.282 rows=5587 loops=1)
  ->  Bitmap Heap Scan on stuff  (cost=4.36..1397.68 rows=389 width=4) 
(actual time=1.029..4.538 rows=5587 loops=1)
        Recheck Cond: (col1 = 3540634)
        ->  Bitmap Index Scan on stuff_col1_idx  (cost=0.00..4.36 
rows=389 width=0) (actual time=1.003..1.003 rows=5587 loops=1)
              Index Cond: (col1 = 3540634)
Total runtime: 17.113 ms

/Using this in the next layer of querying:/

explain analyze  SELECT col1,col2, value AS val,  
coalesce(coalesce(lookup1.col3, lookup2.col3),0) AS dollars FROM (select 
col1, col2, value from stuff where col1 = 95350) stuff LEFT JOIN  
lookup1 ON (stuff.col2 = lookup1.pkey) LEFT JOIN lookup2 ON (stuff.col2 
= lookup2.pkey);

Nested Loop Left Join  (cost=0.00..10325.15 rows=857 width=20) (actual 
time=84.223..9306.228 rows=2296 loops=1)
  ->  Nested Loop Left Join  (cost=0.00..5183.25 rows=857 width=16) 
(actual time=56.623..1710.655 rows=2296 loops=1)
        ->  Index Scan using stuff_col1_idx on stuff  (cost=0.00..21.57 
rows=857 width=12) (actual time=40.531..57.160 rows=2296 loops=1)
              Index Cond: (col1 = 4528383)
        ->  Index Scan using lookup2_pkey on lookup2  (cost=0.00..6.01 
rows=1 width=8) (actual time=0.717..0.717 rows=0 loops=2296)
              Index Cond: ("outer".col2 = lookup2.pkey)
  ->  Index Scan using lookup1_pkey on lookup1  (cost=0.00..5.99 rows=1 
width=8) (actual time=3.304..3.305 rows=1 loops=2296)
        Index Cond: ("outer".col2 = lookup1.pkey)
Total runtime: 9307.569 ms

/After clustering the two left join tables (lookup1 and lookup2):/

explain analyze  SELECT col1,col2, value AS val,  
coalesce(coalesce(lookup1.col3, lookup2.col3),0) AS dollars FROM (select 
col1, col2, value from stuff where col1 = 95350) stuff LEFT JOIN  
lookup1 ON (stuff.col2 = lookup1.pkey) LEFT JOIN lookup2 ON (stuff.col2 
= lookup2.pkey);

Nested Loop Left Join  (cost=0.00..10325.15 rows=857 width=20) (actual 
time=24.444..84.114 rows=1727 loops=1)
  ->  Nested Loop Left Join  (cost=0.00..5163.47 rows=857 width=16) 
(actual time=24.392..62.787 rows=1727 loops=1)
        ->  Index Scan using stuff_col1_idx on stuff  (cost=0.00..21.57 
rows=857 width=12) (actual time=24.332..27.455 rows=1727 loops=1)
              Index Cond: (col1 = 95350)
        ->  Index Scan using lookup1_pkey on lookup1  (cost=0.00..5.99 
rows=1 width=8) (actual time=0.018..0.018 rows=1 loops=1727)
              Index Cond: ("outer".col2 = lookup1.pkey)
  ->  Index Scan using lookup2_pkey on lookup2  (cost=0.00..6.01 rows=1 
width=8) (actual time=0.010..0.010 rows=0 loops=1727)
        Index Cond: ("outer".col2 = lookup2.pkey)
Total runtime: 84.860 ms

# -----------------------------
# PostgreSQL configuration file
# -----------------------------
#
# This file consists of lines of the form:
#
#   name = value
#
# (The '=' is optional.) White space may be used. Comments are introduced
# with '#' anywhere on a line. The complete list of option names and
# allowed values can be found in the PostgreSQL documentation. The
# commented-out settings shown in this file represent the default values.
#
# Please note that re-commenting a setting is NOT sufficient to revert it
# to the default value, unless you restart the postmaster.
#
# Any option can also be given as a command line switch to the
# postmaster, e.g. 'postmaster -c log_connections=on'. Some options
# can be changed at run-time with the 'SET' SQL command.
#
# This file is read on postmaster startup and when the postmaster
# receives a SIGHUP. If you edit the file on a running system, you have 
# to SIGHUP the postmaster for the changes to take effect, or use 
# "pg_ctl reload". Some settings, such as listen_address, require
# a postmaster shutdown and restart to take effect.

#---------------------------------------------------------------------------
# FILE LOCATIONS
#---------------------------------------------------------------------------

# The default values of these variables are driven from the -D command line
# switch or PGDATA environment variable, represented here as ConfigDir.

#data_directory = 'ConfigDir'		# use data in another directory
#hba_file = 'ConfigDir/pg_hba.conf'	# host-based authentication file
#ident_file = 'ConfigDir/pg_ident.conf	# IDENT configuration file

# If external_pid_file is not explicitly set, no extra pid file is written.
#external_pid_file = '(none)'		# write an extra pid file

#---------------------------------------------------------------------------
# CONNECTIONS AND AUTHENTICATION
#---------------------------------------------------------------------------

# - Connection Settings -

listen_addresses = '*'		# what IP address(es) to listen on; 
					# comma-separated list of addresses;
					# defaults to 'localhost', '*' = all
#port = 5432
max_connections = 100
# note: increasing max_connections costs ~400 bytes of shared memory per 
# connection slot, plus lock space (see max_locks_per_transaction).  You
# might also need to raise shared_buffers to support more connections.
superuser_reserved_connections = 3
#unix_socket_directory = ''
#unix_socket_group = ''
#unix_socket_permissions = 0777		# octal
#bonjour_name = ''			# defaults to the computer name

# - Security & Authentication -

#authentication_timeout = 60		# 1-600, in seconds
#ssl = off
#password_encryption = on
#db_user_namespace = off

# Kerberos
#krb_server_keyfile = ''
#krb_srvname = 'postgres'
#krb_server_hostname = ''		# empty string matches any keytab entry
#krb_caseins_users = off

# - TCP Keepalives -
# see 'man 7 tcp' for details

#tcp_keepalives_idle = 0		# TCP_KEEPIDLE, in seconds;
					# 0 selects the system default
#tcp_keepalives_interval = 0		# TCP_KEEPINTVL, in seconds;
					# 0 selects the system default
#tcp_keepalives_count = 0		# TCP_KEEPCNT;
					# 0 selects the system default

#---------------------------------------------------------------------------
# RESOURCE USAGE (except WAL)
#---------------------------------------------------------------------------

# - Memory -

shared_buffers = 16384			# min 16 or max_connections*2, 8KB each
temp_buffers = 10000			# min 100, 8KB each
#max_prepared_transactions = 5		# can be 0 or more
# note: increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
work_mem = 2097152			# min 64, size in KB
maintenance_work_mem = 4194304		# min 1024, size in KB
#max_stack_depth = 2048			# min 100, size in KB

# - Free Space Map -

max_fsm_pages = 200000			# min max_fsm_relations*16, 6 bytes each
#max_fsm_relations = 1000		# min 100, ~70 bytes each

# - Kernel Resource Usage -

#max_files_per_process = 1000		# min 25
#preload_libraries = ''

# - Cost-Based Vacuum Delay -

#vacuum_cost_delay = 0			# 0-1000 milliseconds
#vacuum_cost_page_hit = 1		# 0-10000 credits
#vacuum_cost_page_miss = 10		# 0-10000 credits
#vacuum_cost_page_dirty = 20		# 0-10000 credits
#vacuum_cost_limit = 200		# 0-10000 credits

# - Background writer -

#bgwriter_delay = 200			# 10-10000 milliseconds between rounds
#bgwriter_lru_percent = 1.0		# 0-100% of LRU buffers scanned/round
#bgwriter_lru_maxpages = 5		# 0-1000 buffers max written/round
#bgwriter_all_percent = 0.333		# 0-100% of all buffers scanned/round
#bgwriter_all_maxpages = 5		# 0-1000 buffers max written/round

#---------------------------------------------------------------------------
# WRITE AHEAD LOG
#---------------------------------------------------------------------------

# - Settings -

fsync = off				# turns forced synchronization on or off
#wal_sync_method = fsync		# the default is the first option 
					# supported by the operating system:
					#   open_datasync
					#   fdatasync
					#   fsync
					#   fsync_writethrough
					#   open_sync
#full_page_writes = on			# recover from partial page writes
wal_buffers = 42			# min 4, 8KB each
#commit_delay = 0			# range 0-100000, in microseconds
#commit_siblings = 5			# range 1-1000

# - Checkpoints -

checkpoint_segments = 42		# in logfile segments, min 1, 16MB each
#checkpoint_timeout = 300		# range 30-3600, in seconds
#checkpoint_warning = 30		# in seconds, 0 is off

# - Archiving -

#archive_command = ''			# command to use to archive a logfile 
					# segment

#---------------------------------------------------------------------------
# QUERY TUNING
#---------------------------------------------------------------------------

# - Planner Method Configuration -

#enable_bitmapscan = on
#enable_hashagg = on
#enable_hashjoin = on
#enable_indexscan = on
#enable_mergejoin = on
#enable_nestloop = on
#enable_seqscan = on
#enable_sort = on
#enable_tidscan = on

# - Planner Cost Constants -

#effective_cache_size = 1000		# typically 8KB each
#random_page_cost = 4			# units are one sequential page fetch 
					# cost
#cpu_tuple_cost = 0.01			# (same)
#cpu_index_tuple_cost = 0.001		# (same)
#cpu_operator_cost = 0.0025		# (same)

# - Genetic Query Optimizer -

#geqo = on
#geqo_threshold = 12
#geqo_effort = 5			# range 1-10
#geqo_pool_size = 0			# selects default based on effort
#geqo_generations = 0			# selects default based on effort
#geqo_selection_bias = 2.0		# range 1.5-2.0

# - Other Planner Options -

#default_statistics_target = 10		# range 1-1000
#constraint_exclusion = off
#from_collapse_limit = 8
#join_collapse_limit = 8		# 1 disables collapsing of explicit 
					# JOINs

#---------------------------------------------------------------------------
# ERROR REPORTING AND LOGGING
#---------------------------------------------------------------------------

# - Where to Log -

#log_destination = 'stderr'		# Valid values are combinations of 
					# stderr, syslog and eventlog, 
					# depending on platform.

# This is used when logging to stderr:
#redirect_stderr = off			# Enable capturing of stderr into log 
					# files

# These are only used if redirect_stderr is on:
#log_directory = 'pg_log'		# Directory where log files are written
					# Can be absolute or relative to PGDATA
#log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' # Log file name pattern.
					# Can include strftime() escapes
#log_truncate_on_rotation = off # If on, any existing log file of the same 
					# name as the new log file will be
					# truncated rather than appended to. But
					# such truncation only occurs on
					# time-driven rotation, not on restarts
					# or size-driven rotation. Default is
					# off, meaning append to existing files
					# in all cases.
#log_rotation_age = 1440		# Automatic rotation of logfiles will 
					# happen after so many minutes.  0 to 
					# disable.
#log_rotation_size = 10240		# Automatic rotation of logfiles will 
					# happen after so many kilobytes of log
					# output.  0 to disable.

# These are relevant when logging to syslog:
#syslog_facility = 'LOCAL0'
#syslog_ident = 'postgres'

# - When to Log -

#client_min_messages = notice		# Values, in order of decreasing detail:
					#   debug5
					#   debug4
					#   debug3
					#   debug2
					#   debug1
					#   log
					#   notice
					#   warning
					#   error

#log_min_messages = notice		# Values, in order of decreasing detail:
					#   debug5
					#   debug4
					#   debug3
					#   debug2
					#   debug1
					#   info
					#   notice
					#   warning
					#   error
					#   log
					#   fatal
					#   panic

#log_error_verbosity = default		# terse, default, or verbose messages

#log_min_error_statement = panic	# Values in order of increasing severity:
				 	#   debug5
					#   debug4
					#   debug3
					#   debug2
					#   debug1
				 	#   info
					#   notice
					#   warning
					#   error
					#   panic(off)

#log_min_duration_statement = -1	# -1 is disabled, 0 logs all statements
					# and their durations, in milliseconds.

#silent_mode = off			# DO NOT USE without syslog or 
					# redirect_stderr

# - What to Log -

#debug_print_parse = off
#debug_print_rewritten = off
#debug_print_plan = off
#debug_pretty_print = off
#log_connections = off
#log_disconnections = off
#log_duration = off
#log_line_prefix = ''			# Special values:
					#   %u = user name
					#   %d = database name
					#   %r = remote host and port
					#   %h = remote host
					#   %p = PID
					#   %t = timestamp (no milliseconds)
					#   %m = timestamp with milliseconds
					#   %i = command tag
					#   %c = session id
					#   %l = session line number
					#   %s = session start timestamp
					#   %x = transaction id
					#   %q = stop here in non-session 
					#        processes
					#   %% = '%'
					# e.g. '<%u%%%d> '
#log_statement = 'none'			# none, mod, ddl, all
#log_hostname = off

#---------------------------------------------------------------------------
# RUNTIME STATISTICS
#---------------------------------------------------------------------------

# - Statistics Monitoring -

#log_parser_stats = off
#log_planner_stats = off
#log_executor_stats = off
#log_statement_stats = off

# - Query/Index Statistics Collector -

stats_start_collector = on
#stats_command_string = off
#stats_block_level = off
stats_row_level = on
#stats_reset_on_server_start = off

#---------------------------------------------------------------------------
# AUTOVACUUM PARAMETERS
#---------------------------------------------------------------------------

autovacuum = on			# enable autovacuum subprocess?
#autovacuum_naptime = 60		# time between autovacuum runs, in secs
#autovacuum_vacuum_threshold = 1000	# min # of tuple updates before
					# vacuum
#autovacuum_analyze_threshold = 500	# min # of tuple updates before 
					# analyze
#autovacuum_vacuum_scale_factor = 0.4	# fraction of rel size before 
					# vacuum
#autovacuum_analyze_scale_factor = 0.2	# fraction of rel size before 
					# analyze
#autovacuum_vacuum_cost_delay = -1	# default vacuum cost delay for 
					# autovac, -1 means use 
					# vacuum_cost_delay
#autovacuum_vacuum_cost_limit = -1	# default vacuum cost limit for 
					# autovac, -1 means use
					# vacuum_cost_limit

#---------------------------------------------------------------------------
# CLIENT CONNECTION DEFAULTS
#---------------------------------------------------------------------------

# - Statement Behavior -

#search_path = '$user,public'		# schema names
#default_tablespace = ''		# a tablespace name, '' uses
					# the default
#check_function_bodies = on
#default_transaction_isolation = 'read committed'
#default_transaction_read_only = off
#statement_timeout = 0			# 0 is disabled, in milliseconds

# - Locale and Formatting -

#datestyle = 'iso, mdy'
#timezone = unknown			# actually, defaults to TZ 
					# environment setting
#australian_timezones = off
#extra_float_digits = 0			# min -15, max 2
#client_encoding = sql_ascii		# actually, defaults to database
					# encoding

# These settings are initialized by initdb -- they might be changed
lc_messages = 'C'			# locale for system error message 
					# strings
lc_monetary = 'C'			# locale for monetary formatting
lc_numeric = 'C'			# locale for number formatting
lc_time = 'C'				# locale for time formatting

# - Other Defaults -

#explain_pretty_print = on
#dynamic_library_path = '$libdir'

#---------------------------------------------------------------------------
# LOCK MANAGEMENT
#---------------------------------------------------------------------------

#deadlock_timeout = 1000		# in milliseconds
#max_locks_per_transaction = 64		# min 10
# note: each lock table slot uses ~220 bytes of shared memory, and there are
# max_locks_per_transaction * (max_connections + max_prepared_transactions)
# lock table slots.

#---------------------------------------------------------------------------
# VERSION/PLATFORM COMPATIBILITY
#---------------------------------------------------------------------------

# - Previous Postgres Versions -

#add_missing_from = off
#regex_flavor = advanced		# advanced, extended, or basic
#sql_inheritance = on
#default_with_oids = off
#escape_string_warning = off

# - Other Platforms & Clients -

#transform_null_equals = off

#---------------------------------------------------------------------------
# CUSTOMIZED OPTIONS
#---------------------------------------------------------------------------

#custom_variable_classes = ''		# list of custom variable class names