I have run into this type of query problem as well. I solved it in my application by the following type of query. SELECT tlid FROM completechain AS o WHERE not exists ( SELECT 1 FROM completechain WHERE tlid=o.tlid and ogc_fid!=o.ogc_fid ); Assumes of course that you have an index on tlid. > -----Original Message----- > From: pgsql-performance-owner@xxxxxxxxxxxxxx > [mailto:pgsql-performance-owner@xxxxxxxxxxxxxx] On Behalf Of > Charlie Savage > Sent: Tuesday, November 08, 2005 2:05 AM > To: pgsql-performance@xxxxxxxxxxxxxx > Subject: Sort performance on large tables > > Hi everyone, > > I have a question about the performance of sort. > > Setup: Dell Dimension 3000, Suse 10, 1GB ram, PostgreSQL 8.1 > RC 1 with PostGIS, 1 built-in 80 GB IDE drive, 1 SATA Seagate > 400GB drive. The IDE drive has the OS and the WAL files, the > SATA drive the database. > From hdparm the max IO for the IDE drive is about 50Mb/s and > the SATA drive is about 65Mb/s. Thus a very low-end machine > - but it used just for development (i.e., it is not a > production machine) and the only thing it does is run a > PostgresSQL database. > > I have a staging table called completechain that holds US > tiger data (i.e., streets and addresses for the US). The > table is approximately 18GB. Its big because there is a lot > of data, but also because the table is not normalized (it > comes that way). > > I want to extract data out of the file, with the most > important values being stored in a column called tlid. The > tlid field is an integer, and the values are 98% unique. > There is a second column called ogc_fid which is unique (it > is a serial field). I need to extract out unique TLID's > (doesn't matter which duplicate I get rid of). To do this I > am running this query: > > SELECT tlid, min(ogc_fid) > FROM completechain > GROUP BY tlid; > > The results from explain analyze are: > > "GroupAggregate (cost=10400373.80..11361807.88 rows=48071704 > width=8) (actual time=7311682.715..8315746.835 rows=47599910 loops=1)" > " -> Sort (cost=10400373.80..10520553.06 rows=48071704 > width=8) (actual time=7311682.682..7972304.777 rows=48199165 loops=1)" > " Sort Key: tlid" > " -> Seq Scan on completechain (cost=0.00..2228584.04 > rows=48071704 width=8) (actual time=27.514..773245.046 > rows=48199165 loops=1)" > "Total runtime: 8486057.185 ms" > > Doing a similar query produces the same results: > > SELECT DISTINCT ON (tlid), tlid, ogc_fid FROM completechain; > > Note it takes over 10 times longer to do the sort than the > full sequential scan. > > Should I expect results like this? I realize that the > computer is quite low-end and is very IO bound for this > query, but I'm still surprised that the sort operation takes so long. > > Out of curiosity, I setup an Oracle database on the same > machine with the same data and ran the same query. Oracle > was over an order of magnitude faster. Looking at its query > plan, it avoided the sort by using "HASH GROUP BY." Does > such a construct exist in PostgreSQL (I see only hash joins)? > > Also as an experiment I forced oracle to do a sort by running > this query: > > SELECT tlid, min(ogc_fid) > FROM completechain > GROUP BY tlid > ORDER BY tlid; > > Even with this, it was more than a magnitude faster than Postgresql. > Which makes me think I have somehow misconfigured postgresql > (see the relevant parts of postgresql.conf below). > > Any idea/help appreciated. > > Thanks, > > Charlie > > > ------------------------------- > > #------------------------------------------------------------- > -------------- > # RESOURCE USAGE (except WAL) > #------------------------------------------------------------- > -------------- > > shared_buffers = 40000 # 40000 buffers * 8192 > bytes/buffer = 327,680,000 bytes > #shared_buffers = 1000 # min 16 or > max_connections*2, 8KB each > > temp_buffers = 5000 > #temp_buffers = 1000 # min 100, 8KB each > #max_prepared_transactions = 5 # can be 0 or more > # note: increasing max_prepared_transactions costs ~600 bytes > of shared memory # per transaction slot, plus lock space (see > max_locks_per_transaction). > > work_mem = 16384 # in Kb > #work_mem = 1024 # min 64, size in KB > > maintenance_work_mem = 262144 # in kb > #maintenance_work_mem = 16384 # min 1024, size in KB > #max_stack_depth = 2048 # min 100, size in KB > > # - Free Space Map - > > max_fsm_pages = 60000 > #max_fsm_pages = 20000 # min > max_fsm_relations*16, 6 bytes each > > #max_fsm_relations = 1000 # min 100, ~70 bytes each > > # - Kernel Resource Usage - > > #max_files_per_process = 1000 # min 25 > #preload_libraries = '' > > # - Cost-Based Vacuum Delay - > > #vacuum_cost_delay = 0 # 0-1000 milliseconds > #vacuum_cost_page_hit = 1 # 0-10000 credits > #vacuum_cost_page_miss = 10 # 0-10000 credits > #vacuum_cost_page_dirty = 20 # 0-10000 credits > #vacuum_cost_limit = 200 # 0-10000 credits > > # - Background writer - > > #bgwriter_delay = 200 # 10-10000 milliseconds > between rounds > #bgwriter_lru_percent = 1.0 # 0-100% of LRU buffers > scanned/round > #bgwriter_lru_maxpages = 5 # 0-1000 buffers max > written/round > #bgwriter_all_percent = 0.333 # 0-100% of all buffers > scanned/round > #bgwriter_all_maxpages = 5 # 0-1000 buffers max > written/round > > > #------------------------------------------------------------- > -------------- > # WRITE AHEAD LOG > #------------------------------------------------------------- > -------------- > > # - Settings - > > fsync = on # turns forced > synchronization on or off > #wal_sync_method = fsync # the default is the > first option > # supported by the > operating system: > # open_datasync > # fdatasync > # fsync > # fsync_writethrough > # open_sync > #full_page_writes = on # recover from > partial page writes > > wal_buffers = 128 > #wal_buffers = 8 # min 4, 8KB each > > #commit_delay = 0 # range 0-100000, in > microseconds > #commit_siblings = 5 # range 1-1000 > > # - Checkpoints - > > checkpoint_segments = 256 # 256 * 16Mb = > 4,294,967,296 bytes > checkpoint_timeout = 1200 # 1200 seconds (20 minutes) > checkpoint_warning = 30 # in seconds, 0 is off > > #checkpoint_segments = 3 # in logfile segments, > min 1, 16MB each > #checkpoint_timeout = 300 # range 30-3600, in seconds > #checkpoint_warning = 30 # in seconds, 0 is off > > # - Archiving - > > #archive_command = '' # command to use to > archive a logfile > # segment > > > #------------------------------------------------------------- > -------------- > # QUERY TUNING > #------------------------------------------------------------- > -------------- > > # - Planner Method Configuration - > > #enable_bitmapscan = on > #enable_hashagg = on > #enable_hashjoin = on > #enable_indexscan = on > #enable_mergejoin = on > #enable_nestloop = on > #enable_seqscan = on > #enable_sort = on > #enable_tidscan = on > > # - Planner Cost Constants - > > effective_cache_size = 80000 # 80000 * 8192 = > 655,360,000 bytes > #effective_cache_size = 1000 # typically 8KB each > > random_page_cost = 2.5 # units are one > sequential page fetch > #random_page_cost = 4 # units are one > sequential page fetch > # cost > #cpu_tuple_cost = 0.01 # (same) > #cpu_index_tuple_cost = 0.001 # (same) > #cpu_operator_cost = 0.0025 # (same) > > # - Genetic Query Optimizer - > > #geqo = on > #geqo_threshold = 12 > #geqo_effort = 5 # range 1-10 > #geqo_pool_size = 0 # selects default based > on effort > #geqo_generations = 0 # selects default based > on effort > #geqo_selection_bias = 2.0 # range 1.5-2.0 > > # - Other Planner Options - > > default_statistics_target = 100 # range 1-1000 > #default_statistics_target = 10 # range 1-1000 > #constraint_exclusion = off > #from_collapse_limit = 8 > #join_collapse_limit = 8 # 1 disables collapsing > of explicit > # JOINs > > > #------------------------------------------------------------- > -------------- > #------------------------------------------------------------- > -------------- > # RUNTIME STATISTICS > #------------------------------------------------------------- > -------------- > > # - Statistics Monitoring - > > #log_parser_stats = off > #log_planner_stats = off > #log_executor_stats = off > #log_statement_stats = off > > # - Query/Index Statistics Collector - > > stats_start_collector = on > stats_command_string = on > stats_block_level = on > stats_row_level = on > > #stats_start_collector = on > #stats_command_string = off > #stats_block_level = off > #stats_row_level = off > #stats_reset_on_server_start = off > > > #------------------------------------------------------------- > -------------- > # AUTOVACUUM PARAMETERS > #------------------------------------------------------------- > -------------- > > autovacuum = true > autovacuum_naptime = 600 > > #autovacuum = false # enable autovacuum subprocess? > #autovacuum_naptime = 60 # time between > autovacuum runs, in secs > #autovacuum_vacuum_threshold = 1000 # min # of tuple updates before > # vacuum > #autovacuum_analyze_threshold = 500 # min # of tuple updates before > # analyze > #autovacuum_vacuum_scale_factor = 0.4 # fraction of rel size before > # vacuum > #autovacuum_analyze_scale_factor = 0.2 # fraction of > rel size before > # analyze > #autovacuum_vacuum_cost_delay = -1 # default vacuum cost delay for > # autovac, -1 means use > # vacuum_cost_delay > #autovacuum_vacuum_cost_limit = -1 # default vacuum cost limit for > # autovac, -1 means use > # vacuum_cost_ > > > ---------------------- > > CREATE TABLE tiger.completechain > ( > ogc_fid int4 NOT NULL DEFAULT > nextval('completechain_ogc_fid_seq'::regclass), > module varchar(8) NOT NULL, > tlid int4 NOT NULL, > side1 int4, > source varchar(1) NOT NULL, > fedirp varchar(2), > fename varchar(30), > fetype varchar(4), > fedirs varchar(2), > cfcc varchar(3) NOT NULL, > fraddl varchar(11), > toaddl varchar(11), > fraddr varchar(11), > toaddr varchar(11), > friaddl varchar(1), > toiaddl varchar(1), > friaddr varchar(1), > toiaddr varchar(1), > zipl int4, > zipr int4, > aianhhfpl int4, > aianhhfpr int4, > aihhtlil varchar(1), > aihhtlir varchar(1), > census1 varchar(1), > census2 varchar(1), > statel int4, > stater int4, > countyl int4, > countyr int4, > cousubl int4, > cousubr int4, > submcdl int4, > submcdr int4, > placel int4, > placer int4, > tractl int4, > tractr int4, > blockl int4, > blockr int4, > wkb_geometry public.geometry NOT NULL, > CONSTRAINT enforce_dims_wkb_geometry CHECK > (ndims(wkb_geometry) = 2), > CONSTRAINT enforce_geotype_wkb_geometry CHECK > (geometrytype(wkb_geometry) = 'LINESTRING'::text OR > wkb_geometry IS NULL), > CONSTRAINT enforce_srid_wkb_geometry CHECK > (srid(wkb_geometry) = 4269) > ) > WITHOUT OIDS; > ALTER TABLE tiger.completechain OWNER TO postgres; > > > > > > > ---------------------------(end of > broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings > > ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings