On 07/08/2012 17:00, Jeff Janes wrote:
On Mon, Aug 6, 2012 at 8:08 AM, Ioannis Anagnostopoulos
<ioannis@xxxxxxxxxx> wrote:
Hi, my query is very simple:
select
    msg_id,
    msg_type,
    ship_pos_messages.pos_georef1,
    ship_pos_messages.pos_georef2,
    ship_pos_messages.pos_georef3,
    ship_pos_messages.pos_georef4,
    obj_id,
    ship_speed,
    ship_heading,
    ship_course,
    pos_point
from
    feed_all_y2012m08.ship_pos_messages
where
    extract('day' from msg_date_rec) = 1
    AND msg_id = any(ARRAY[7294724,14174174,22254408]);
The msg_id is the pkey on the ship_pos_messages table, and in this example
the query is fast because it uses the primary key index to make the
selection. The explain analyze output follows:
...
I think this is a pretty good plan and quite quick given the size of the
table (88 million rows at present). However, in real life the parameter I
search msg_id for is not an array of 3 ids but of 300,000 or more. It is
then that the planner abandons this plan and goes to a sequential scan. Is
there any way around this? Or is this the best I can have?
What happens if you set "enable_seqscan=off" and run the query with
the very large list? (This is an experiment, not a recommendation for
production use.)
Cheers,
Jeff
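
For reference, that experiment would look something like this in a session
(same query as above; enable_seqscan is a standard planner setting):

SET enable_seqscan = off;  -- experiment only, not for production use
EXPLAIN ANALYZE
SELECT msg_id, msg_type, pos_georef1, pos_georef2, pos_georef3, pos_georef4,
       obj_id, ship_speed, ship_heading, ship_course, pos_point
FROM feed_all_y2012m08.ship_pos_messages
WHERE extract('day' from msg_date_rec) = 1
  AND msg_id = any(ARRAY[7294724,14174174,22254408]);  -- or the very large list
RESET enable_seqscan;  -- restore the default afterwards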
As Tom said, the actual question is not valid. Seq scans are not bad; we
just need to understand the way around them instead of forcing them off.
In my case the problem was the ARRAY parameter (which, all in all, is not
that great for holding so much data). By converting it into a temporary
table and performing an inner join in the query (after analysing the temp
table), you get a nice Hash Join (or a Merge Join if you don't analyse the
temp table).
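
A rough sketch of the workaround (the temp table name is illustrative, and
I'm assuming a bigint msg_id; loading the ids is elided):

CREATE TEMPORARY TABLE tmp_msg_ids (
    msg_id bigint PRIMARY KEY  -- assumed type; match ship_pos_messages.msg_id
);

-- load the 300,000+ ids here, e.g. via COPY or batched INSERTs

ANALYZE tmp_msg_ids;  -- gives the planner real row estimates for the join

SELECT m.msg_id, m.msg_type,
       m.pos_georef1, m.pos_georef2, m.pos_georef3, m.pos_georef4,
       m.obj_id, m.ship_speed, m.ship_heading, m.ship_course, m.pos_point
FROM feed_all_y2012m08.ship_pos_messages m
INNER JOIN tmp_msg_ids t ON t.msg_id = m.msg_id
WHERE extract('day' from m.msg_date_rec) = 1;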
cheers Yiannis