Hi all,
here is my postgresql configuration:
"version";"PostgreSQL 9.0.3 on amd64-portbld-freebsd8.0, compiled by GCC cc (GCC) 4.2.1 20070719 [FreeBSD], 64-bit"
"bytea_output";"escape"
"checkpoint_segments";"64"
"client_encoding";"UNICODE"
"effective_cache_size";"6GB"
"fsync";"off"
"lc_collate";"C"
"lc_ctype";"C"
"listen_addresses";"*"
"log_destination";"syslog"
"max_connections";"20"
"max_stack_depth";"2MB"
"server_encoding";"UTF8"
"shared_buffers";"4GB"
"silent_mode";"on"
"synchronous_commit";"off"
"TimeZone";"Europe/Jersey"
"update_process_title";"off"
"work_mem";"24MB"
I have a partitioned table tcpsessiondata:
CREATE TABLE appqosdata.tcpsessiondata
(
detectorid smallint not null,
createdtime bigint not null,
sessionid bigint not null,
...
);
That table has many millions of rows in each partition and no data in the main one. Here is an example:
select count(*) from appqosdata.tcpsessiondata_1;
count
----------
49377910
(1 row)
Every partition has a "Primary Key (detectorid, createdtime)"
I run the following query on that table:
select
cast (SD.detectorid as numeric),
CAST( (createdtime / 61000000000::bigint) AS numeric) as timegroup,
sum(datafromsource)+sum(datafromdestination) as numbytes,
CAST ( sum(packetsfromsource)+sum(packetsfromdestination) AS numeric) as numpackets
from
appqosdata.tcpsessiondata SD
where
SD.detectorid >= 0 and SD.createdtime >= 1297266601368086000::bigint
and SD.createdtime < 1297270202368086000::bigint
group by SD.detectorid, timegroup
The table is partitioned by a "sessionid" which is not used in this particular query so I already expect all partitions to be touched. However I have a bad scan choice on at least a couple of partitions:
"HashAggregate (cost=5679026.42..5679028.76 rows=67 width=34) (actual time=160113.366..160113.366 rows=0 loops=1)"
" Output: (sd.detectorid)::numeric, (((sd.createdtime / 61000000000::bigint))::numeric), (sum(sd.datafromsource) + sum(sd.datafromdestination)), ((sum(sd.packetsfromsource) + sum(sd.packetsfromdestination)))::numeric, sd.detectorid"
" -> Result (cost=0.00..5679025.41 rows=67 width=34) (actual time=160113.360..160113.360 rows=0 loops=1)"
" Output: sd.detectorid, sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination, ((sd.createdtime / 61000000000::bigint))::numeric"
" -> Append (cost=0.00..5679025.08 rows=67 width=34) (actual time=160113.356..160113.356 rows=0 loops=1)"
" -> Seq Scan on appqosdata.tcpsessiondata sd (cost=0.00..23.65 rows=1 width=34) (actual time=0.002..0.002 rows=0 loops=1)"
" Output: sd.detectorid, sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination"
" Filter: ((sd.detectorid >= 0) AND (sd.createdtime >= 1297266601368086000::bigint) AND (sd.createdtime < 1297270202368086000::bigint))"
" -> Seq Scan on appqosdata.tcpsessiondata_1 sd (cost=0.00..1373197.46 rows=1 width=34) (actual time=46436.737..46436.737 rows=0 loops=1)"
" Output: sd.detectorid, sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination"
" Filter: ((sd.detectorid >= 0) AND (sd.createdtime >= 1297266601368086000::bigint) AND (sd.createdtime < 1297270202368086000::bigint))"
" -> Seq Scan on appqosdata.tcpsessiondata_2 sd (cost=0.00..2447484.00 rows=1 width=34) (actual time=108359.967..108359.967 rows=0 loops=1)"
" Output: sd.detectorid, sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination"
" Filter: ((sd.detectorid >= 0) AND (sd.createdtime >= 1297266601368086000::bigint) AND (sd.createdtime < 1297270202368086000::bigint))"
" -> Index Scan using tcpsessiondata_3_pkey on appqosdata.tcpsessiondata_3 sd (cost=0.00..11.51 rows=1 width=34) (actual time=0.016..0.016 rows=0 loops=1)"
" Output: sd.detectorid, sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination"
" Index Cond: ((sd.detectorid >= 0) AND (sd.createdtime >= 1297266601368086000::bigint) AND (sd.createdtime < 1297270202368086000::bigint))"
... (many more partitions here)....
" -> Index Scan using tcpsessiondata_61_pkey on appqosdata.tcpsessiondata_61 sd (cost=0.00..8162.42 rows=1 width=34) (actual time=25.446..25.446 rows=0 loops=1)"
" Output: sd.detectorid, sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination"
" Index Cond: ((sd.detectorid >= 0) AND (sd.createdtime >= 1297266601368086000::bigint) AND (sd.createdtime < 1297270202368086000::bigint))"
" -> Index Scan using tcpsessiondata_62_pkey on appqosdata.tcpsessiondata_62 sd (cost=0.00..11.51 rows=1 width=34) (actual time=0.008..0.008 rows=0 loops=1)"
" Output: sd.detectorid, sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination"
" Index Cond: ((sd.detectorid >= 0) AND (sd.createdtime >= 1297266601368086000::bigint) AND (sd.createdtime < 1297270202368086000::bigint))"
" -> Index Scan using tcpsessiondata_63_pkey on appqosdata.tcpsessiondata_63 sd (cost=0.00..11.51 rows=1 width=34) (actual time=0.006..0.006 rows=0 loops=1)"
" Output: sd.detectorid, sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination"
" Index Cond: ((sd.detectorid >= 0) AND (sd.createdtime >= 1297266601368086000::bigint) AND (sd.createdtime < 1297270202368086000::bigint))"
" -> Seq Scan on appqosdata.tcpsessiondata_64 sd (cost=0.00..13.00 rows=1 width=34) (actual time=0.102..0.102 rows=0 loops=1)"
" Output: sd.detectorid, sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination"
" Filter: ((sd.detectorid >= 0) AND (sd.createdtime >= 1297266601368086000::bigint) AND (sd.createdtime < 1297270202368086000::bigint))"
" -> Seq Scan on appqosdata.tcpsessiondata_65 sd (cost=0.00..117.64 rows=1 width=34) (actual time=0.854..0.854 rows=0 loops=1)"
" Output: sd.detectorid, sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination"
" Filter: ((sd.detectorid >= 0) AND (sd.createdtime >= 1297266601368086000::bigint) AND (sd.createdtime < 1297270202368086000::bigint))"
" -> Index Scan using tcpsessiondata_66_pkey on appqosdata.tcpsessiondata_66 sd (cost=0.00..11.51 rows=1 width=34) (actual time=0.007..0.007 rows=0 loops=1)"
" Output: sd.detectorid, sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination"
" Index Cond: ((sd.detectorid >= 0) AND (sd.createdtime >= 1297266601368086000::bigint) AND (sd.createdtime < 1297270202368086000::bigint))"
" -> Index Scan using tcpsessiondata_67_pkey on appqosdata.tcpsessiondata_67 sd (cost=0.00..11.51 rows=1 width=34) (actual time=0.005..0.005 rows=0 loops=1)"
" Output: sd.detectorid, sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination"
" Index Cond: ((sd.detectorid >= 0) AND (sd.createdtime >= 1297266601368086000::bigint) AND (sd.createdtime < 1297270202368086000::bigint))"
"Total runtime: 160114.339 ms"
The question is: why do we get a seq scan on appqosdata.tcpsessiondata_1 and appqosdata.tcpsessiondata_2 even if the planner estimates correctly 1 row out of millions could potentially be selected? As you can see ~90% of the time is spent on those 2 partitions even if they are not apparently different from any of the others.
I would appreciate any help with this issue.
Thank you,
Svetlin Manavski