Hi, I've recently split my log table into time-based partitions, which really improves insert speed and query times for certain queries. However, I can't help thinking the query optimizer is really suboptimal here. My partitions look like this:

CREATE TABLE log_cdf
(
  id serial NOT NULL,
  tstamp timestamp without time zone,
  peopleid integer,
  room character varying(50),
  event character varying(50),
  "type" character varying(50),
  contentid integer,
  CONSTRAINT pk_log_cdf PRIMARY KEY (id)
)
WITH (OIDS=TRUE);
ALTER TABLE log_cdf OWNER TO postgres;
GRANT ALL ON TABLE log_cdf TO postgres;
GRANT SELECT ON TABLE log_cdf TO public;

CREATE TABLE log_cdf_200810
  ( CHECK ( tstamp >= DATE '2008-10-01' AND tstamp < DATE '2008-11-01' ) )
  INHERITS (log_cdf);
CREATE TABLE log_cdf_200811
  ( CHECK ( tstamp >= DATE '2008-11-01' AND tstamp < DATE '2008-12-01' ) )
  INHERITS (log_cdf);
CREATE TABLE log_cdf_200812
  ( CHECK ( tstamp >= DATE '2008-12-01' AND tstamp < DATE '2009-01-01' ) )
  INHERITS (log_cdf);
CREATE TABLE log_cdf_200901
  ( CHECK ( tstamp >= DATE '2009-01-01' AND tstamp < DATE '2009-02-01' ) )
  INHERITS (log_cdf);

CREATE INDEX idx_log_cdf_200810_tstamp ON log_cdf_200810 USING btree (tstamp);
CREATE INDEX idx_log_cdf_200811_tstamp ON log_cdf_200811 USING btree (tstamp);
CREATE INDEX idx_log_cdf_200812_tstamp ON log_cdf_200812 USING btree (tstamp);
CREATE INDEX idx_log_cdf_200901_tstamp ON log_cdf_200901 USING btree (tstamp);

And of course I've added the matching trigger function as well. On top of that, I have my old table, which I've renamed to log_cdf_old, given a check constraint for tstamp < DATE '2008-10-01', and set to inherit log_cdf.
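(For reference, the routing trigger follows the usual pattern from the partitioning chapter of the manual — a sketch along these lines, with the real function covering every live partition:)

```sql
-- Sketch of the standard partition-routing trigger; the actual function
-- has one branch per partition listed above.
CREATE OR REPLACE FUNCTION log_cdf_insert_trigger()
RETURNS trigger AS $$
BEGIN
    IF NEW.tstamp >= DATE '2008-12-01' AND NEW.tstamp < DATE '2009-01-01' THEN
        INSERT INTO log_cdf_200812 VALUES (NEW.*);
    ELSIF NEW.tstamp >= DATE '2009-01-01' AND NEW.tstamp < DATE '2009-02-01' THEN
        INSERT INTO log_cdf_200901 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'tstamp out of range for log_cdf partitions';
    END IF;
    RETURN NULL;  -- row was routed to a child; don't also insert into the parent
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_log_cdf_insert
    BEFORE INSERT ON log_cdf
    FOR EACH ROW EXECUTE PROCEDURE log_cdf_insert_trigger();
```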
-------------- PROBLEM 1 --------------

But now, simple queries like:

SELECT * FROM log_cdf ORDER BY tstamp DESC LIMIT 100;

all of a sudden take ages to complete, and the query plan looks absolutely awful:

Limit  (cost=8333060.45..8333060.70 rows=100 width=374)
  ->  Sort  (cost=8333060.45..8676529.57 rows=137387645 width=374)
        Sort Key: public.log_cdf.tstamp
        ->  Result  (cost=0.00..3082203.45 rows=137387645 width=374)
              ->  Append  (cost=0.00..3082203.45 rows=137387645 width=374)
                    ->  Seq Scan on log_cdf  (cost=0.00..12.00 rows=200 width=374)
                    ->  Seq Scan on log_cdf_old log_cdf  (cost=0.00..2915379.36 rows=129675136 width=57)
                    ->  Seq Scan on log_cdf_200810 log_cdf  (cost=0.00..166776.09 rows=7711709 width=58)
                    ->  Seq Scan on log_cdf_200811 log_cdf  (cost=0.00..12.00 rows=200 width=374)
                    ->  Seq Scan on log_cdf_200812 log_cdf  (cost=0.00..12.00 rows=200 width=374)
                    ->  Seq Scan on log_cdf_200901 log_cdf  (cost=0.00..12.00 rows=200 width=374)

Meanwhile, the plan for

SELECT * FROM log_cdf_200810 ORDER BY tstamp DESC LIMIT 100;

Limit  (cost=0.00..7.51 rows=100 width=58)
  ->  Index Scan Backward using idx_log_cdf_200810_tstamp on log_cdf_200810  (cost=0.00..579351.47 rows=7711021 width=58)

is much, much better.

-------------- PROBLEM 2 --------------

I also notice that the query planner doesn't take time variables (like LOCALTIMESTAMP or now()) into account at all.
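(A manual workaround sketch for Problem 1 — pushing the sort and limit down into each partition so its tstamp index can be used, then merging the results; partition names are the ones from above, and in practice every partition would need to be listed:)

```sql
-- Hand-rolled equivalent of ORDER BY tstamp DESC LIMIT 100 on the parent:
-- limit inside each partition first (each can use an index scan backward),
-- then sort and limit the small merged result.
SELECT * FROM (
    SELECT * FROM log_cdf_200901 ORDER BY tstamp DESC LIMIT 100
) p1
UNION ALL
SELECT * FROM (
    SELECT * FROM log_cdf_200812 ORDER BY tstamp DESC LIMIT 100
) p2
ORDER BY tstamp DESC
LIMIT 100;
```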
Look at this:

SELECT * FROM log_cdf
WHERE tstamp > '2008-10-10'  -- 5 days ago
  AND tstamp < '2008-10-15'; -- today

gives the plan:

Result  (cost=0.00..205388.86 rows=3144503 width=374)
  ->  Append  (cost=0.00..205388.86 rows=3144503 width=374)
        ->  Seq Scan on log_cdf  (cost=0.00..13.00 rows=1 width=374)
              Filter: ((tstamp > '2008-10-10 00:00:00'::timestamp without time zone) AND (tstamp < '2008-10-15 00:00:00'::timestamp without time zone))
        ->  Seq Scan on log_cdf_200810 log_cdf  (cost=0.00..205375.86 rows=3144502 width=58)
              Filter: ((tstamp > '2008-10-10 00:00:00'::timestamp without time zone) AND (tstamp < '2008-10-15 00:00:00'::timestamp without time zone))

which, for all intents and purposes, is exactly what I'd expect. But now

SELECT * FROM log_cdf
WHERE tstamp > LOCALTIMESTAMP - interval '5 days'
  AND tstamp < LOCALTIMESTAMP;

gives me the following plan:

Result  (cost=0.00..1161067.27 rows=3476371 width=374)
  ->  Append  (cost=0.00..1161067.27 rows=3476371 width=374)
        ->  Seq Scan on log_cdf  (cost=0.00..15.50 rows=1 width=374)
              Filter: ((tstamp < ('now'::text)::timestamp without time zone) AND (tstamp > (('now'::text)::timestamp without time zone - '5 days'::interval)))
        ->  Bitmap Heap Scan on log_cdf_old log_cdf  (cost=8054.10..909421.75 rows=378889 width=57)
              Recheck Cond: ((tstamp > (('now'::text)::timestamp without time zone - '5 days'::interval)) AND (tstamp < ('now'::text)::timestamp without time zone))
              ->  Bitmap Index Scan on idx_log_cdf_old_tstamp  (cost=0.00..7959.37 rows=378889 width=0)
                    Index Cond: ((tstamp > (('now'::text)::timestamp without time zone - '5 days'::interval)) AND (tstamp < ('now'::text)::timestamp without time zone))
        ->  Bitmap Heap Scan on log_cdf_200810 log_cdf  (cost=76722.53..251605.18 rows=3097478 width=58)
              Recheck Cond: ((tstamp > (('now'::text)::timestamp without time zone - '5 days'::interval)) AND (tstamp < ('now'::text)::timestamp without time zone))
              ->  Bitmap Index Scan on idx_log_cdf_200810_tstamp  (cost=0.00..75948.16 rows=3097478 width=0)
                    Index Cond: ((tstamp > (('now'::text)::timestamp without time zone - '5 days'::interval)) AND (tstamp < ('now'::text)::timestamp without time zone))
        ->  Index Scan using idx_log_cdf_200811_tstamp on log_cdf_200811 log_cdf  (cost=0.01..8.28 rows=1 width=374)
              Index Cond: ((tstamp > (('now'::text)::timestamp without time zone - '5 days'::interval)) AND (tstamp < ('now'::text)::timestamp without time zone))
        ->  Index Scan using idx_log_cdf_200812_tstamp on log_cdf_200812 log_cdf  (cost=0.01..8.28 rows=1 width=374)
              Index Cond: ((tstamp > (('now'::text)::timestamp without time zone - '5 days'::interval)) AND (tstamp < ('now'::text)::timestamp without time zone))
        ->  Index Scan using idx_log_cdf_200901_tstamp on log_cdf_200901 log_cdf  (cost=0.01..8.28 rows=1 width=374)
              Index Cond: ((tstamp > (('now'::text)::timestamp without time zone - '5 days'::interval)) AND (tstamp < ('now'::text)::timestamp without time zone))

Am I expecting too much, or am I doing something wrong here? Any help on how to improve this would be appreciated -- or perhaps one of the core developers (Tom?) would care to tweak the optimiser a bit for a following release...

Regards,
Peter

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
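(One way around Problem 2 that I can think of — constraint exclusion is done at plan time, so LOCALTIMESTAMP is opaque to it — is to compute the bounds first and interpolate them as literals via dynamic SQL. A sketch; the function name is made up, and the explicit ::text casts are there because quote_literal() expects text:)

```sql
-- Hypothetical helper: evaluate the now()-derived bounds up front and
-- splice them into the query as constants, so the planner can exclude
-- partitions whose CHECK constraints don't overlap the range.
CREATE OR REPLACE FUNCTION recent_log_cdf(days integer)
RETURNS SETOF log_cdf AS $$
DECLARE
    r log_cdf%ROWTYPE;
BEGIN
    FOR r IN EXECUTE
        'SELECT * FROM log_cdf WHERE tstamp > '
        || quote_literal((LOCALTIMESTAMP - days * interval '1 day')::text)
        || ' AND tstamp < '
        || quote_literal(LOCALTIMESTAMP::text)
    LOOP
        RETURN NEXT r;
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql;

SELECT * FROM recent_log_cdf(5);
```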