Re: A very long running query....

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Title: RE: A very long running query....
On 21/07/2012 10:16, Marc Mamin wrote:


Hello,
isn't the first test superfluous here ?

>       where extract('day' from message_copies.msg_date_rec) = 17
>       and date_trunc('day',    message_copies.msg_date_rec) = '2012-07-17'


> Here is the index:
>
> CREATE INDEX idx_message_copies_wk2_date_src_pos_partial
>   ON feed_all_y2012m07.message_copies_wk2
>   USING btree
>   (date_trunc('day'::text, msg_date_rec),
>   src_id,
>   (((pos_georef1::text || pos_georef2::text) || pos_georef3::text) || pos_georef4::text))
> TABLESPACE archive
>   WHERE (((pos_georef1::text || pos_georef2::text) || pos_georef3::text) || pos_georef4::text) IS NOT NULL
>   OR NOT (((pos_georef1::text || pos_georef2::text) || pos_georef3::text) || pos_georef4::text) = ''::text;


the georef test can be simplified using coalesce:

>  and     (message_copies.pos_georef1 || message_copies.pos_georef2 || message_copies.pos_georef3 || message_copies.pos_georef4) <> ''
>  and not (message_copies.pos_georef1 || message_copies.pos_georef2 || message_copies.pos_georef3 || message_copies.pos_georef4) is null
  =>
  and coaesce (
    (message_copies.pos_georef1 || message_copies.pos_georef2 || message_copies.pos_georef3 || message_copies.pos_georef4),
     '') <> ''
 
In order to avoid this test at query time you might add a boolean column   message_copies.pos.has_georef,
and keep it up to date  with a before insert or update trigger. This will allow to shorten your index definition and simplify the planner task a little bit.
Moreover it will fasten your query in cases when the index don't get used.

As Tom already mentioned it, it may make sense not to concatenate the georef within the index, but keep them separated, or even keep them in different indexes.
Which is the best depend on the other queries running against this table
 
HTH,

Marc Mamin
 


-----Original Message-----
From: pgsql-performance-owner@xxxxxxxxxxxxxx on behalf of Ioannis Anagnostopoulos
Sent: Sat 7/21/2012 1:56 AM
To: Tom Lane
Cc: Claudio Freire; pgsql-performance@xxxxxxxxxxxxxx
Subject: Re: A very long running query....

On 21/07/2012 00:10, Tom Lane wrote:
> Claudio Freire <klaussfreire@xxxxxxxxx> writes:
>> Looking at this:
>> "                    ->  Index Scan using
>> idx_message_copies_wk2_date_src_pos_partial on message_copies_wk2
>> message_copies  (cost=0.00..19057.93 rows=52 width=32) (actual
>> time=62.124..5486270.845 rows=387524 loops=1)"
>> "                          Index Cond: ((date_trunc('day'::text,
>> msg_date_rec) = '2012-07-17 00:00:00'::timestamp without time zone)
>> AND (src_id = 1))"
>> "                          Filter: ((date_part('day'::text,
>> msg_date_rec) = 17::double precision) AND (NOT (((((pos_georef1)::text
>> || (pos_georef2)::text) || (pos_georef3)::text) ||
>> (pos_georef4)::text) IS NULL)) AND (((((pos_georef1)::text ||
>> (pos_georef2)::text) || (pos_georef3)::text) || (pos_georef4)::text)
>> <> ''::text))"
> I think the real problem is that the planner has no hope of doing
> anything very accurate with such an unwieldy filter condition.  I'd look
> at ways of making the filter conditions simpler, perhaps by recasting
> the data representation.  In particular, that's a horridly bad way of
> asking whether some columns are empty, which I gather is the intent.
> If you really want to do it just like that, creating an index on the
> concatenation _expression_ would guide ANALYZE to collect some stats about
> it, but it would probably be a lot more efficient to put together an AND
> or OR of tests on the individual columns.
>
>                       regards, tom lane
So what you suggest is to forget all together the concatenation of the
georef1/2/3/4 and instead alter my query with something like:

georef1 is not null and not georeg1 = ''....etc for georef2 3 and 4

That would require to alter my index and have the four georef columns
separately in it and not as a concatenation and so on for the partial
index part. And a final thing, you seem to imply that the indexes are
used by the analyser to collect statistics even if they are not used. So
an index serves not only as a way to speed up targeted queries but also
to provide better statistics to the analyzer?

Kind Regards
Yiannis

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


No because it is used to select a partition. Otherwise it will go through the whole hierarchy...

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux