Ave! Yesterday I noticed a problem with my PostgreSQL installation. I have three database clusters, one using version 7.4.6 and the rest using version 8.0.1. The problem manifests itself in all three installations. (The three databases are copies of -- more or less -- the same data; now I'm writing a program to really synchronize these databases, but that's besides the point. I'll only describe one of them; the same can be said of the others, too.) I have two really big tables, the problem is with one of them. The table looks as follows: Table "public.pakiety" Column | Type | Modifiers -------------+-----------------------------+--------------------------------------------- pktid | integer | not null default nextval('pktid_seq') stid | smallint | not null received | timestamp(6) with time zone | not null measured | timestamp(0) with time zone | not null station_time | timestamp(0) with time zone | bezwzgl | smallint | full_cycle | boolean | Indexes: "pakiety_pkey" PRIMARY KEY, btree (pktid) "pakiety_stid_received_idx" UNIQUE, btree (stid, received) "pakiety_measured_idx" btree (measured) "pakiety_received_idx" btree (received) "pakiety_stid_measured_idx" btree (stid, measured) This table holds info about data packets sent by our automated meteo stations. The table contains about 15 million rows. Any type of a sequential scan will be extremely slow on this table. Even an index scan will take minutes if the index is not fit for the search. What's unusual is that: 1) there is only a very limited set of used stid (station ID) values - about 500 or so, and 2) there can be some very long periods of time (measured in months) when a specific station doesn't send any packets at all. That's why I created indices pakiety_stid_received_idx and pakiety_stid_measured_idx. Unfortunately, when I'm looking for the last data packet from a specific station, I get an index scan using only the "pakiety_received_idx" index and not the "pakiety_stid_received_idx" one. While it takes a split second for a recently active station, it really takes several minutes for a station which has been quiet for several months. trax=# explain select * from pakiety where stid = 234::smallint order by received desc limit 1; QUERY PLAN ------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..6.26 rows=1 width=33) -> Index Scan Backward using pakiety_received_idx on pakiety (cost=0.00..193599.37 rows=30915 width=33) Filter: (stid = 234::smallint) (3 rows) What's funny, I get the same plan when using "limit 10000": trax=# explain select * from pakiety where stid = 234::smallint order by received desc limit 10000; QUERY PLAN ------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..62621.32 rows=10000 width=33) -> Index Scan Backward using pakiety_received_idx on pakiety (cost=0.00..193756.63 rows=30941 width=33) Filter: (stid = 234::smallint) (3 rows) I have just noticed another problem: when the limit goes even higher, the planner decides to use another unfit index. trax=# explain select * from pakiety where stid = 234::smallint order by received desc limit 100000; QUERY PLAN --------------------------------------------------------------------------------------------------------------- Limit (cost=125101.46..125178.81 rows=30940 width=33) -> Sort (cost=125101.46..125178.81 rows=30940 width=33) Sort Key: received -> Index Scan using pakiety_stid_measured_idx on pakiety (cost=0.00..122288.52 rows=30940 width=33) Index Cond: (stid = 234::smallint) (5 rows) The same index is also used if I remove the "limit n" clause entirely: trax=# explain select * from pakiety where stid = 234::smallint order by received desc; QUERY PLAN --------------------------------------------------------------------------------------------------------- Sort (cost=125293.11..125370.58 rows=30986 width=33) Sort Key: received -> Index Scan using pakiety_stid_measured_idx on pakiety (cost=0.00..122474.14 rows=30986 width=33) Index Cond: (stid = 234::smallint) (4 rows) The "pakiety_stid_received_idx" index never gets used. Any hints about possible reasons of such behavior? Regards, Piotr Sulecki. ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq