-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi! I'm currently experimenting with PostgreSQL 8.2.4 and table partitioning in order to improve the performance of an application I'm working on. My application is about managing measurement values (lots of!) I have one table "t_mv" which stores all the measurement values. A single measurement value has a timestamp and belongs to a single time series, so table "t_mv" looks like this: CREATE TABLE t_mv ( zr integer NOT NULL, -- the time series id ts timestamp with time zone NOT NULL, -- the timestamp ... -- other attributes of a mv ) WITHOUT OIDS; ALTER TABLE t_mv ADD CONSTRAINT pk_mv_zr_ts PRIMARY KEY (zr, ts); Each time series defines several other attributes which are common to all measurement values of this time series (like sampling location, physical parameter, aggregation, cardinality, type, visibility, etc.) The application should be able to handle several thousand different time series and hundreds of millions of measurement values, so table t_mv can get quite large. I have tested installations with up to 70 millions rows in t_mv and PostgreSQL can handle that with a quite good performance even on non high-end machines (operating system is Linux, btw) But as I expect installations witch much more rows in t_mv, I tried to implement a "partitioned tables" concept using inheritance and CHECK constraints, just like it is described in the docs (e.g. chapter 5.9 in the current PostgreSQL 8.2.4 documentation) I split the t_mv table on the timestamp attribute to build child tables which hold all measurement values for a single month. That way I have several tables called "t_mv_YYYYMM" which all inherit from "t_mv". The number of child tables depends on the time period the application has to store the measurement values (which can be several years so I'm expecting up to 100 child tables or even more). For the application everything looks the same: inserts, updates and queries all are against the "t_mv" parent table, the application is not aware of the fact that this table is actually "split" into several child tables. This is working fine and for some standard queries it actually gives some performance improvement compared to the standard "everything in one big table" concept. The performance improvement increases with the number of rows in t_mv, for a small table (less than 10 million rows or so) IMHO it is not really worth the effort or even counter-productive. But I have some special queries where the performance with partitioned tables actually get much worse: those are queries where I'm working with "open" time intervals, i.e. where I want to get the previous and/or next timestamp from a given interval. A simple example: Get the timestamp of a measurement value for time series 3622 which is right before the measurement value with time stamp '2007-04-22 00:00:00': testdb_std=> select ts from mwdb.t_mv where zr=3622 and ts < '2007-04-22 00:00:00' order by ts desc limit 1; ts - ------------------------ 2007-04-21 23:00:00+02 (1 row) Im my application there are many queries like this. Such queries also come in several variations, including quite sophisticated joins with lots of other tables "above" the time series table. Note: as I'm working with (potentially) non-equidistant time series I can not just calculate the timestamps, I have to retrieve them from the database! In the standard case, the query plan for the example query looks like this: testdb_std=> explain analyze select ts from mwdb.t_mv where zr=3622 and ts < '2007-04-22 00:00:00' order by ts desc limit 1; QUERY PLAN - ----------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..1.70 rows=1 width=8) (actual time=0.233..0.235 rows=1 loops=1) -> Index Scan Backward using pk_mv_zr_ts on t_mv (cost=0.00..21068.91 rows=12399 width=8) (actual time=0.221..0.221 rows=1 loops=1) Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22 00:00:00+02'::timestamp with time zone)) Total runtime: 0.266 ms (4 rows) If I switch to partitioned tables, the query retrieves the same result (of course): testdb_std=> \c testdb_part You are now connected to database "testdb_part". testdb_part=> select ts from mwdb.t_mv where zr=3622 and ts < '2007-04-22 00:00:00' order by ts desc limit 1; ts - ------------------------ 2007-04-21 23:00:00+02 (1 row) But the query plan becomes: testdb_part=> explain analyze select ts from mwdb.t_mv where zr=3622 and ts < '2007-04-22 00:00:00' order by ts desc limit 1; QUERY PLAN - --------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=23985.83..23985.83 rows=1 width=8) (actual time=230.100..230.102 rows=1 loops=1) -> Sort (cost=23985.83..24019.84 rows=13605 width=8) (actual time=230.095..230.095 rows=1 loops=1) Sort Key: mwdb.t_mv.ts -> Result (cost=0.00..23051.72 rows=13605 width=8) (actual time=0.154..177.519 rows=15810 loops=1) -> Append (cost=0.00..23051.72 rows=13605 width=8) (actual time=0.149..114.186 rows=15810 loops=1) -> Index Scan using pk_mv_zr_ts on t_mv (cost=0.00..8.27 rows=1 width=8) (actual time=0.047..0.047 rows=0 loops=1) Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22 00:00:00+02'::timestamp with time zone)) -> Index Scan using pk_mv_200507 on t_mv_200507 t_mv (cost=0.00..2417.53 rows=1519 width=8) (actual time=0.095..2.419 rows=744 loops=1) Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22 00:00:00+02'::timestamp with time zone)) -> Index Scan using pk_mv_200508 on t_mv_200508 t_mv (cost=0.00..918.81 rows=539 width=8) (actual time=0.081..2.134 rows=744 loops=1) Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22 00:00:00+02'::timestamp with time zone)) -> Index Scan using pk_mv_200509 on t_mv_200509 t_mv (cost=0.00..941.88 rows=555 width=8) (actual time=0.061..2.051 rows=720 loops=1) Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22 00:00:00+02'::timestamp with time zone)) -> Index Scan using pk_mv_200510 on t_mv_200510 t_mv (cost=0.00..915.29 rows=538 width=8) (actual time=0.064..2.113 rows=715 loops=1) Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22 00:00:00+02'::timestamp with time zone)) -> Index Scan using pk_mv_200511 on t_mv_200511 t_mv (cost=0.00..925.93 rows=545 width=8) (actual time=0.048..2.986 rows=720 loops=1) Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22 00:00:00+02'::timestamp with time zone)) -> Index Scan using pk_mv_200512 on t_mv_200512 t_mv (cost=0.00..936.53 rows=550 width=8) (actual time=0.049..2.212 rows=744 loops=1) Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22 00:00:00+02'::timestamp with time zone)) -> Index Scan using pk_mv_200601 on t_mv_200601 t_mv (cost=0.00..981.42 rows=579 width=8) (actual time=0.065..3.029 rows=744 loops=1) Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22 00:00:00+02'::timestamp with time zone)) -> Index Scan using pk_mv_200602 on t_mv_200602 t_mv (cost=0.00..856.25 rows=502 width=8) (actual time=0.045..2.866 rows=672 loops=1) Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22 00:00:00+02'::timestamp with time zone)) -> Index Scan using pk_mv_200603 on t_mv_200603 t_mv (cost=0.00..977.84 rows=575 width=8) (actual time=0.052..3.044 rows=743 loops=1) Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22 00:00:00+02'::timestamp with time zone)) -> Index Scan using pk_mv_200604 on t_mv_200604 t_mv (cost=0.00..906.40 rows=531 width=8) (actual time=0.053..1.976 rows=720 loops=1) Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22 00:00:00+02'::timestamp with time zone)) -> Index Scan using pk_mv_200605 on t_mv_200605 t_mv (cost=0.00..938.28 rows=550 width=8) (actual time=0.050..2.357 rows=744 loops=1) Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22 00:00:00+02'::timestamp with time zone)) -> Index Scan using pk_mv_200606 on t_mv_200606 t_mv (cost=0.00..922.35 rows=541 width=8) (actual time=0.054..2.063 rows=720 loops=1) Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22 00:00:00+02'::timestamp with time zone)) -> Index Scan using pk_mv_200607 on t_mv_200607 t_mv (cost=0.00..2112.64 rows=1315 width=8) (actual time=0.047..2.226 rows=744 loops=1) Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22 00:00:00+02'::timestamp with time zone)) -> Index Scan using pk_mv_200608 on t_mv_200608 t_mv (cost=0.00..990.23 rows=582 width=8) (actual time=0.048..2.094 rows=744 loops=1) Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22 00:00:00+02'::timestamp with time zone)) -> Index Scan using pk_mv_200609 on t_mv_200609 t_mv (cost=0.00..902.84 rows=528 width=8) (actual time=0.039..2.252 rows=720 loops=1) Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22 00:00:00+02'::timestamp with time zone)) -> Index Scan using pk_mv_200610 on t_mv_200610 t_mv (cost=0.00..964.87 rows=567 width=8) (actual time=0.033..2.118 rows=745 loops=1) Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22 00:00:00+02'::timestamp with time zone)) -> Index Scan using pk_mv_200611 on t_mv_200611 t_mv (cost=0.00..947.17 rows=557 width=8) (actual time=0.060..2.160 rows=720 loops=1) Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22 00:00:00+02'::timestamp with time zone)) -> Index Scan using pk_mv_200612 on t_mv_200612 t_mv (cost=0.00..929.43 rows=545 width=8) (actual time=0.039..2.051 rows=744 loops=1) Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22 00:00:00+02'::timestamp with time zone)) -> Index Scan using pk_mv_200701 on t_mv_200701 t_mv (cost=0.00..940.05 rows=551 width=8) (actual time=0.036..2.217 rows=744 loops=1) Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22 00:00:00+02'::timestamp with time zone)) -> Index Scan using pk_mv_200702 on t_mv_200702 t_mv (cost=0.00..847.38 rows=496 width=8) (actual time=0.035..1.830 rows=672 loops=1) Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22 00:00:00+02'::timestamp with time zone)) -> Index Scan using pk_mv_200703 on t_mv_200703 t_mv (cost=0.00..956.00 rows=561 width=8) (actual time=0.062..2.326 rows=743 loops=1) Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22 00:00:00+02'::timestamp with time zone)) -> Index Scan using pk_mv_200704 on t_mv_200704 t_mv (cost=0.00..814.38 rows=378 width=8) (actual time=0.050..1.406 rows=504 loops=1) Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22 00:00:00+02'::timestamp with time zone)) Total runtime: 231.730 ms (52 rows) Oops! Compare the costs or the actual query time between those queries! (Note: I set "constraint_exclusion = on", of course!) As such queries are used all over the application, this nullifies any performance improvements for standard queries and in fact makes the overall application performance as "feeled" by the user _much_ worse. I also tried it with "min()" and "max()" aggregate functions instead of the "limit 1" query, but this does not change much: Standard "big" table: testdb_std=> select max(ts) from mwdb.t_mv where zr=3622 and ts < '2007-04-22 00:00:00' ; max - ------------------------ 2007-04-21 23:00:00+02 (1 row) testdb_std=> explain analyze select max(ts) from mwdb.t_mv where zr=3622 and ts < '2007-04-22 00:00:00' ; QUERY PLAN - ------------------------------------------------------------------------------------------------------------------------------------------------- Result (cost=1.70..1.71 rows=1 width=0) (actual time=0.071..0.073 rows=1 loops=1) InitPlan -> Limit (cost=0.00..1.70 rows=1 width=8) (actual time=0.060..0.062 rows=1 loops=1) -> Index Scan Backward using pk_mv_zr_ts on t_mv (cost=0.00..21068.91 rows=12399 width=8) (actual time=0.056..0.056 rows=1 loops=1) Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22 00:00:00+02'::timestamp with time zone)) Filter: ((ts)::timestamp with time zone IS NOT NULL) Total runtime: 0.221 ms (7 rows) "Partitioned table": testdb_part=> select max(ts) from mwdb.t_mv where zr=3622 and ts < '2007-04-22 00:00:00' ; max - ------------------------ 2007-04-21 23:00:00+02 (1 row) testdb_part=> explain analyze select max(ts) from mwdb.t_mv where zr=3622 and ts < '2007-04-22 00:00:00' ; QUERY PLAN - ---------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=23085.73..23085.74 rows=1 width=8) (actual time=390.094..390.096 rows=1 loops=1) -> Append (cost=0.00..23051.72 rows=13605 width=8) (actual time=0.241..290.934 rows=15810 loops=1) -> Index Scan using pk_mv_zr_ts on t_mv (cost=0.00..8.27 rows=1 width=8) (actual time=0.038..0.038 rows=0 loops=1) Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22 00:00:00+02'::timestamp with time zone)) -> Index Scan using pk_mv_200507 on t_mv_200507 t_mv (cost=0.00..2417.53 rows=1519 width=8) (actual time=0.197..12.598 rows=744 loops=1) Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22 00:00:00+02'::timestamp with time zone)) -> Index Scan using pk_mv_200508 on t_mv_200508 t_mv (cost=0.00..918.81 rows=539 width=8) (actual time=0.095..5.947 rows=744 loops=1) Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22 00:00:00+02'::timestamp with time zone)) -> Index Scan using pk_mv_200509 on t_mv_200509 t_mv (cost=0.00..941.88 rows=555 width=8) (actual time=0.118..2.247 rows=720 loops=1) Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22 00:00:00+02'::timestamp with time zone)) -> Index Scan using pk_mv_200510 on t_mv_200510 t_mv (cost=0.00..915.29 rows=538 width=8) (actual time=0.121..6.219 rows=715 loops=1) Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22 00:00:00+02'::timestamp with time zone)) -> Index Scan using pk_mv_200511 on t_mv_200511 t_mv (cost=0.00..925.93 rows=545 width=8) (actual time=2.287..9.991 rows=720 loops=1) Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22 00:00:00+02'::timestamp with time zone)) -> Index Scan using pk_mv_200512 on t_mv_200512 t_mv (cost=0.00..936.53 rows=550 width=8) (actual time=0.110..2.285 rows=744 loops=1) Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22 00:00:00+02'::timestamp with time zone)) -> Index Scan using pk_mv_200601 on t_mv_200601 t_mv (cost=0.00..981.42 rows=579 width=8) (actual time=0.209..4.682 rows=744 loops=1) Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22 00:00:00+02'::timestamp with time zone)) -> Index Scan using pk_mv_200602 on t_mv_200602 t_mv (cost=0.00..856.25 rows=502 width=8) (actual time=0.079..6.079 rows=672 loops=1) Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22 00:00:00+02'::timestamp with time zone)) -> Index Scan using pk_mv_200603 on t_mv_200603 t_mv (cost=0.00..977.84 rows=575 width=8) (actual time=0.091..4.793 rows=743 loops=1) Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22 00:00:00+02'::timestamp with time zone)) -> Index Scan using pk_mv_200604 on t_mv_200604 t_mv (cost=0.00..906.40 rows=531 width=8) (actual time=0.108..7.637 rows=720 loops=1) Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22 00:00:00+02'::timestamp with time zone)) -> Index Scan using pk_mv_200605 on t_mv_200605 t_mv (cost=0.00..938.28 rows=550 width=8) (actual time=0.116..4.772 rows=744 loops=1) Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22 00:00:00+02'::timestamp with time zone)) -> Index Scan using pk_mv_200606 on t_mv_200606 t_mv (cost=0.00..922.35 rows=541 width=8) (actual time=0.074..6.071 rows=720 loops=1) Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22 00:00:00+02'::timestamp with time zone)) -> Index Scan using pk_mv_200607 on t_mv_200607 t_mv (cost=0.00..2112.64 rows=1315 width=8) (actual time=0.082..4.807 rows=744 loops=1) Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22 00:00:00+02'::timestamp with time zone)) -> Index Scan using pk_mv_200608 on t_mv_200608 t_mv (cost=0.00..990.23 rows=582 width=8) (actual time=2.283..8.671 rows=744 loops=1) Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22 00:00:00+02'::timestamp with time zone)) -> Index Scan using pk_mv_200609 on t_mv_200609 t_mv (cost=0.00..902.84 rows=528 width=8) (actual time=0.107..6.067 rows=720 loops=1) Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22 00:00:00+02'::timestamp with time zone)) -> Index Scan using pk_mv_200610 on t_mv_200610 t_mv (cost=0.00..964.87 rows=567 width=8) (actual time=0.074..3.933 rows=745 loops=1) Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22 00:00:00+02'::timestamp with time zone)) -> Index Scan using pk_mv_200611 on t_mv_200611 t_mv (cost=0.00..947.17 rows=557 width=8) (actual time=0.091..6.291 rows=720 loops=1) Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22 00:00:00+02'::timestamp with time zone)) -> Index Scan using pk_mv_200612 on t_mv_200612 t_mv (cost=0.00..929.43 rows=545 width=8) (actual time=0.077..4.101 rows=744 loops=1) Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22 00:00:00+02'::timestamp with time zone)) -> Index Scan using pk_mv_200701 on t_mv_200701 t_mv (cost=0.00..940.05 rows=551 width=8) (actual time=0.077..2.558 rows=744 loops=1) Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22 00:00:00+02'::timestamp with time zone)) -> Index Scan using pk_mv_200702 on t_mv_200702 t_mv (cost=0.00..847.38 rows=496 width=8) (actual time=0.073..4.346 rows=672 loops=1) Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22 00:00:00+02'::timestamp with time zone)) -> Index Scan using pk_mv_200703 on t_mv_200703 t_mv (cost=0.00..956.00 rows=561 width=8) (actual time=2.532..7.206 rows=743 loops=1) Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22 00:00:00+02'::timestamp with time zone)) -> Index Scan using pk_mv_200704 on t_mv_200704 t_mv (cost=0.00..814.38 rows=378 width=8) (actual time=0.120..4.163 rows=504 loops=1) Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22 00:00:00+02'::timestamp with time zone)) Total runtime: 394.384 ms (49 rows) Now my question is: Does the query planner in the case of partitioned tables really have to scan all indexes in order to get the next timestamp smaller (or larger) than a given one? There are check conditions on all table partitions like this: For table t_mv_200704: CHECK (ts::timestamp with time zone >= '2007-04-01 00:00:00+02'::timestamp with time zone AND ts::timestamp with time zone < '2007-05-01 00:00:00+02'::timestamp with time zone) For table t_mv_200703: CHECK (ts::timestamp with time zone >= '2007-03-01 00:00:00+01'::timestamp with time zone AND ts::timestamp with time zone < '2007-04-01 00:00:00+02'::timestamp with time zone) and so on... So the tables are in a well defined, monotonic sort order regarding the timestamp. This means that if there is a max(ts) for ts < '2007-04-22 00:00:00' already in table t_mv_200704, it makes no sense to look further in other tables where the timestamps can only be smaller than the timestamp already found. Am I correct? Is there room for improvements of the query planner for queries like this or is this a special case which will never get handled anyway? Or would you suggest a completely different table structure or perhaps some other query? I'm open for any suggestion! - - andreas - -- Andreas Haumer | mailto:andreas@xxxxxxxxx *x Software + Systeme | http://www.xss.co.at/ Karmarschgasse 51/2/20 | Tel: +43-1-6060114-0 A-1100 Vienna, Austria | Fax: +43-1-6060114-71 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGNdZ2xJmyeGcXPhERAsbfAJ9nA+z50uXiV4SHntt1Y9IuZ/rzWwCff8ar xKSMfzwgjx9kQipeDoEnXWE= =57aJ -----END PGP SIGNATURE-----