Hi all, following the simple but interesting air-traffic benchmark published at: http://www.mysqlperformanceblog.com/2009/10/02/analyzing-air-traffic-performance-with-infobright-and-monetdb/ I decided to run the benchmark over postgres to get some more experience and insights. Unfortunately, the query times I got from postgres were not the expected ones: the fast ones were in the order of 5 minutes while the slow ones in the order of 40 minutes. I came to the conclusion that some parameters are not set correctly. I give all the information I can think of in the sequel of this email, to hopefully get some useful suggestions to improve query times. Thank you very much for your time! All of the schema, data and query information can be found on the original blog post given above. The machine runs Fedora 10: uname -a: Linux 2.6.27.30-170.2.82.fc10.x86_64 #1 SMP Mon Aug 17 08:18:34 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux The hardware characteristics are: Platform Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz with 8GB RAM and ample disk space (2x 500 GB SATA disk @ 7200 RPM as SW-RAID-0) I used postgresql-8.4.2. Source download and compiled by hand, no special parameters where passed to the configure phase. -- DETAILS on loading, analyze and index creation on postgresql-8.4.2 -- loading time was 77m15.976s -- total rows: 119790558 -- total size of pgdata/base: 46G ANALYZE; -- 219698.365ms aka 3m39.698s CREATE INDEX year_month_idx ON ontime ("Year", "Month"); -- 517481.632ms aka 8m37.481s CREATE INDEX DepDelay_idx ON ontime ("DepDelay"); -- 1257051.668ms aka 20m57.051s Query 1: SELECT "DayOfWeek", count(*) AS c FROM ontime WHERE "Year" BETWEEN 2000 AND 2008 GROUP BY "DayOfWeek" ORDER BY c DESC; Reported query times are (in sec): MonetDB 7.9s InfoBright 12.13s LucidDB 54.8s For pg-8.4.2 I got with 3 consecutive runs on the server: 5m52.384s 5m55.885s 5m54.309s For more complex queries, postgres will become even more slower, that is for queries from Q2 to Q8 will need over 40 minutes, while on the rest ones around 5 minutes. By examining the explain analyze in these queries, the main slow down factor is the HashAggregate and mainly Sort operations. For query 1 the explain analyze returns: airtraffic=# EXPLAIN ANALYZE SELECT "DayOfWeek", count(*) AS c FROM ontime WHERE "Year" BETWEEN 2000 AND 2008 GROUP BY "DayOfWeek" ORDER BY c DESC; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------ Sort (cost=7407754.12..7407754.13 rows=4 width=2) (actual time=371188.821..371188.823 rows=7 loops=1) Sort Key: (count(*)) Sort Method: quicksort Memory: 25kB -> HashAggregate (cost=7407754.04..7407754.08 rows=4 width=2) (actual time=371163.316..371163.320 rows=7 loops=1) -> Seq Scan on ontime (cost=0.00..7143875.40 rows=52775727 width=2) (actual time=190938.959..346180.079 rows=52484047 loops=1) Filter: (("Year" >= 2000) AND ("Year" <= 2008)) Total runtime: 371201.156 ms (7 rows) I understand that the problem here is the memory used by the sort method. *But*, I already changed the work_mem parameter to 6gigs:) airtraffic=# show work_mem; work_mem ---------- 6GB (1 row) And when I saw that this did not work, I also changed the 8.3 style (?) parameter sort_mem: airtraffic=# show sort_mem; work_mem ---------- 6GB (1 row) which means that of course work_mem == sort_mem, as such, shouldn't be the case that the sort algorithm should have used much more memory? I also attach the output of 'show all;' so you can advice me in any other configuration settings that I might need to change to perform better. Thank you very much for your feedback and all of your help! lefteris sidirourgos P.S. sorry if you get this message twice but the first one does not appear to have reached the list ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ airtraffic=# show all; name | setting | description ---------------------------------+---------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------- add_missing_from | off | Automatically adds missing table references to FROM clauses. allow_system_table_mods | off | Allows modifications of the structure of system tables. archive_command | (disabled) | Sets the shell command that will be called to archive a WAL file. archive_mode | off | Allows archiving of WAL files using archive_command. archive_timeout | 0 | Forces a switch to the next xlog file if a new file has not been started within N seconds. array_nulls | on | Enable input of NULL elements in arrays. authentication_timeout | 1min | Sets the maximum allowed time to complete client authentication. autovacuum | on | Starts the autovacuum subprocess. autovacuum_analyze_scale_factor | 0.1 | Number of tuple inserts, updates or deletes prior to analyze as a fraction of reltuples. autovacuum_analyze_threshold | 50 | Minimum number of tuple inserts, updates or deletes prior to analyze. autovacuum_freeze_max_age | 200000000 | Age at which to autovacuum a table to prevent transaction ID wraparound. autovacuum_max_workers | 3 | Sets the maximum number of simultaneously running autovacuum worker processes. autovacuum_naptime | 1min | Time to sleep between autovacuum runs. autovacuum_vacuum_cost_delay | 20ms | Vacuum cost delay in milliseconds, for autovacuum. autovacuum_vacuum_cost_limit | -1 | Vacuum cost amount available before napping, for autovacuum. autovacuum_vacuum_scale_factor | 0.2 | Number of tuple updates or deletes prior to vacuum as a fraction of reltuples. autovacuum_vacuum_threshold | 50 | Minimum number of tuple updates or deletes prior to vacuum. backslash_quote | safe_encoding | Sets whether "\'" is allowed in string literals. bgwriter_delay | 200ms | Background writer sleep time between rounds. bgwriter_lru_maxpages | 100 | Background writer maximum number of LRU pages to flush per round. bgwriter_lru_multiplier | 2 | Multiple of the average buffer usage to free per round. block_size | 8192 | Shows the size of a disk block. bonjour_name | | Sets the Bonjour broadcast service name. check_function_bodies | on | Check function bodies during CREATE FUNCTION. checkpoint_completion_target | 0.5 | Time spent flushing dirty buffers during checkpoint, as fraction of checkpoint interval. checkpoint_segments | 30 | Sets the maximum distance in log segments between automatic WAL checkpoints. checkpoint_timeout | 5min | Sets the maximum time between automatic WAL checkpoints. checkpoint_warning | 30s | Enables warnings if checkpoint segments are filled more frequently than this. client_encoding | UTF8 | Sets the client's character set encoding. client_min_messages | notice | Sets the message levels that are sent to the client. commit_delay | 0 | Sets the delay in microseconds between transaction commit and flushing WAL to disk. commit_siblings | 5 | Sets the minimum concurrent open transactions before performing commit_delay. config_file | /export/scratch0/lsidir/postgres/pgdata/postgresql.conf | Sets the server's main configuration file. constraint_exclusion | partition | Enables the planner to use constraints to optimize queries. cpu_index_tuple_cost | 0.005 | Sets the planner's estimate of the cost of processing each index entry during an index scan. cpu_operator_cost | 0.0025 | Sets the planner's estimate of the cost of processing each operator or function call. cpu_tuple_cost | 0.01 | Sets the planner's estimate of the cost of processing each tuple (row). cursor_tuple_fraction | 0.1 | Sets the planner's estimate of the fraction of a cursor's rows that will be retrieved. custom_variable_classes | | Sets the list of known custom variable classes. data_directory | /export/scratch0/lsidir/postgres/pgdata | Sets the server's data directory. DateStyle | ISO, DMY | Sets the display format for date and time values. db_user_namespace | off | Enables per-database user names. deadlock_timeout | 1s | Sets the time to wait on a lock before checking for deadlock. debug_assertions | off | Turns on various assertion checks. debug_pretty_print | on | Indents parse and plan tree displays. debug_print_parse | off | Logs each query's parse tree. debug_print_plan | off | Logs each query's execution plan. debug_print_rewritten | off | Logs each query's rewritten parse tree. default_statistics_target | 100 | Sets the default statistics target. default_tablespace | | Sets the default tablespace to create tables and indexes in. default_text_search_config | pg_catalog.english | Sets default text search configuration. default_transaction_isolation | read committed | Sets the transaction isolation level of each new transaction. default_transaction_read_only | off | Sets the default read-only status of new transactions. default_with_oids | off | Create new tables with OIDs by default. dynamic_library_path | $libdir | Sets the path for dynamically loadable modules. effective_cache_size | 8MB | Sets the planner's assumption about the size of the disk cache. effective_io_concurrency | 1 | Number of simultaneous requests that can be handled efficiently by the disk subsystem. enable_bitmapscan | on | Enables the planner's use of bitmap-scan plans. enable_hashagg | on | Enables the planner's use of hashed aggregation plans. enable_hashjoin | on | Enables the planner's use of hash join plans. enable_indexscan | on | Enables the planner's use of index-scan plans. enable_mergejoin | on | Enables the planner's use of merge join plans. enable_nestloop | on | Enables the planner's use of nested-loop join plans. enable_seqscan | on | Enables the planner's use of sequential-scan plans. enable_sort | on | Enables the planner's use of explicit sort steps. enable_tidscan | on | Enables the planner's use of TID scan plans. escape_string_warning | on | Warn about backslash escapes in ordinary string literals. external_pid_file | | Writes the postmaster PID to the specified file. extra_float_digits | 0 | Sets the number of digits displayed for floating-point values. from_collapse_limit | 8 | Sets the FROM-list size beyond which subqueries are not collapsed. fsync | on | Forces synchronization of updates to disk. full_page_writes | on | Writes full pages to WAL when first modified after a checkpoint. geqo | on | Enables genetic query optimization. geqo_effort | 5 | GEQO: effort is used to set the default for other GEQO parameters. geqo_generations | 0 | GEQO: number of iterations of the algorithm. geqo_pool_size | 0 | GEQO: number of individuals in the population. geqo_selection_bias | 2 | GEQO: selective pressure within the population. geqo_threshold | 12 | Sets the threshold of FROM items beyond which GEQO is used. gin_fuzzy_search_limit | 0 | Sets the maximum allowed result for exact search by GIN. hba_file | /export/scratch0/lsidir/postgres/pgdata/pg_hba.conf | Sets the server's "hba" configuration file. ident_file | /export/scratch0/lsidir/postgres/pgdata/pg_ident.conf | Sets the server's "ident" configuration file. ignore_system_indexes | off | Disables reading from system indexes. integer_datetimes | on | Datetimes are integer based. IntervalStyle | postgres | Sets the display format for interval values. join_collapse_limit | 8 | Sets the FROM-list size beyond which JOIN constructs are not flattened. krb_caseins_users | off | Sets whether Kerberos and GSSAPI user names should be treated as case-insensitive. krb_server_keyfile | | Sets the location of the Kerberos server key file. krb_srvname | postgres | Sets the name of the Kerberos service. lc_collate | en_GB.UTF-8 | Shows the collation order locale. lc_ctype | en_GB.UTF-8 | Shows the character classification and case conversion locale. lc_messages | en_GB.UTF-8 | Sets the language in which messages are displayed. lc_monetary | en_GB.UTF-8 | Sets the locale for formatting monetary amounts. lc_numeric | en_GB.UTF-8 | Sets the locale for formatting numbers. lc_time | en_GB.UTF-8 | Sets the locale for formatting date and time values. listen_addresses | localhost | Sets the host name or IP address(es) to listen to. local_preload_libraries | | Lists shared libraries to preload into each backend. log_autovacuum_min_duration | -1 | Sets the minimum execution time above which autovacuum actions will be logged. log_checkpoints | off | Logs each checkpoint. log_connections | off | Logs each successful connection. log_destination | stderr | Sets the destination for server log output. log_directory | pg_log | Sets the destination directory for log files. log_disconnections | off | Logs end of a session, including duration. log_duration | off | Logs the duration of each completed SQL statement. log_error_verbosity | default | Sets the verbosity of logged messages. log_executor_stats | off | Writes executor performance statistics to the server log. log_filename | postgresql-%Y-%m-%d_%H%M%S.log | Sets the file name pattern for log files. log_hostname | off | Logs the host name in the connection logs. log_line_prefix | | Controls information prefixed to each log line. log_lock_waits | off | Logs long lock waits. log_min_duration_statement | -1 | Sets the minimum execution time above which statements will be logged. log_min_error_statement | error | Causes all statements generating error at or above this level to be logged. log_min_messages | warning | Sets the message levels that are logged. log_parser_stats | off | Writes parser performance statistics to the server log. log_planner_stats | off | Writes planner performance statistics to the server log. log_rotation_age | 1d | Automatic log file rotation will occur after N minutes. log_rotation_size | 10MB | Automatic log file rotation will occur after N kilobytes. log_statement | none | Sets the type of statements logged. log_statement_stats | off | Writes cumulative performance statistics to the server log. log_temp_files | -1 | Log the use of temporary files larger than this number of kilobytes. log_timezone | Europe/Amsterdam | Sets the time zone to use in log messages. log_truncate_on_rotation | off | Truncate existing log files of same name during log rotation. logging_collector | off | Start a subprocess to capture stderr output and/or csvlogs into log files. maintenance_work_mem | 2GB | Sets the maximum memory to be used for maintenance operations. max_connections | 100 | Sets the maximum number of concurrent connections. max_files_per_process | 1000 | Sets the maximum number of simultaneously open files for each server process. max_function_args | 100 | Shows the maximum number of function arguments. max_identifier_length | 63 | Shows the maximum identifier length. max_index_keys | 32 | Shows the maximum number of index keys. max_locks_per_transaction | 64 | Sets the maximum number of locks per transaction. max_prepared_transactions | 0 | Sets the maximum number of simultaneously prepared transactions. max_stack_depth | 2MB | Sets the maximum stack depth, in kilobytes. password_encryption | on | Encrypt passwords. port | 5432 | Sets the TCP port the server listens on. post_auth_delay | 0 | Waits N seconds on connection startup after authentication. pre_auth_delay | 0 | Waits N seconds on connection startup before authentication. random_page_cost | 4 | Sets the planner's estimate of the cost of a nonsequentially fetched disk page. regex_flavor | advanced | Sets the regular expression "flavor". search_path | "$user",public | Sets the schema search order for names that are not schema-qualified. segment_size | 1GB | Shows the number of pages per disk file. seq_page_cost | 1 | Sets the planner's estimate of the cost of a sequentially fetched disk page. server_encoding | UTF8 | Sets the server (database) character set encoding. server_version | 8.4.2 | Shows the server version. server_version_num | 80402 | Shows the server version as an integer. session_replication_role | origin | Sets the session's behavior for triggers and rewrite rules. shared_buffers | 32MB | Sets the number of shared memory buffers used by the server. shared_preload_libraries | | Lists shared libraries to preload into server. silent_mode | off | Runs the server silently. sql_inheritance | on | Causes subtables to be included by default in various commands. ssl | off | Enables SSL connections. standard_conforming_strings | off | Causes '...' strings to treat backslashes literally. statement_timeout | 0 | Sets the maximum allowed duration of any statement. stats_temp_directory | pg_stat_tmp | Writes temporary statistics files to the specified directory. superuser_reserved_connections | 3 | Sets the number of connection slots reserved for superusers. synchronize_seqscans | on | Enable synchronized sequential scans. synchronous_commit | on | Sets immediate fsync at commit. syslog_facility | local0 | Sets the syslog "facility" to be used when syslog enabled. syslog_ident | postgres | Sets the program name used to identify PostgreSQL messages in syslog. tcp_keepalives_count | 0 | Maximum number of TCP keepalive retransmits. tcp_keepalives_idle | 0 | Time between issuing TCP keepalives. tcp_keepalives_interval | 0 | Time between TCP keepalive retransmits. temp_buffers | 1024 | Sets the maximum number of temporary buffers used by each session. temp_tablespaces | | Sets the tablespace(s) to use for temporary tables and sort files. TimeZone | Europe/Amsterdam | Sets the time zone for displaying and interpreting time stamps. timezone_abbreviations | Default | Selects a file of time zone abbreviations. trace_notify | off | Generates debugging output for LISTEN and NOTIFY. trace_sort | off | Emit information about resource usage in sorting. track_activities | on | Collects information about executing commands. track_activity_query_size | 1024 | Sets the size reserved for pg_stat_activity.current_query, in bytes. track_counts | on | Collects statistics on database activity. track_functions | none | Collects function-level statistics on database activity. transaction_isolation | read committed | Sets the current transaction's isolation level. transaction_read_only | off | Sets the current transaction's read-only status. transform_null_equals | off | Treats "expr=NULL" as "expr IS NULL". unix_socket_directory | | Sets the directory where the Unix-domain socket will be created. unix_socket_group | | Sets the owning group of the Unix-domain socket. unix_socket_permissions | 511 | Sets the access permissions of the Unix-domain socket. update_process_title | on | Updates the process title to show the active SQL command. vacuum_cost_delay | 0 | Vacuum cost delay in milliseconds. vacuum_cost_limit | 200 | Vacuum cost amount available before napping. vacuum_cost_page_dirty | 20 | Vacuum cost for a page dirtied by vacuum. vacuum_cost_page_hit | 1 | Vacuum cost for a page found in the buffer cache. vacuum_cost_page_miss | 10 | Vacuum cost for a page not found in the buffer cache. vacuum_freeze_min_age | 50000000 | Minimum age at which VACUUM should freeze a table row. vacuum_freeze_table_age | 150000000 | Age at which VACUUM should scan whole table to freeze tuples. wal_block_size | 8192 | Shows the block size in the write ahead log. wal_buffers | 64kB | Sets the number of disk-page buffers in shared memory for WAL. wal_segment_size | 16MB | Shows the number of pages per write ahead log segment. wal_sync_method | fdatasync | Selects the method used for forcing WAL updates to disk. wal_writer_delay | 200ms | WAL writer sleep time between WAL flushes. work_mem | 6GB | Sets the maximum memory to be used for query workspaces. xmlbinary | base64 | Sets how binary values are to be encoded in XML. xmloption | content | Sets whether XML data in implicit parsing and serialization operations is to be considered as documents or content fragments. zero_damaged_pages | off | Continues processing past damaged page headers. (193 rows) -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance