Re: Special index for "like"-based query

Alban Hertroys <haramrae@xxxxxxxxx> · Fri, 30 Dec 2016 13:01:57 +0100

> On 30 Dec 2016, at 11:42, Job <Job@xxxxxxxxxxxxxxxxxxxx> wrote:
> 
>>> And, basically, if you need help with some queries you could try
>>> posting them whole, even redacted, along the table defs, this way
>>> perople can see the problem and not invent one based on a partial
>>> description
> 
> Thank you very much, very kind from you.
> 
> The index applied on the timestamp field is a btree("timestamp")
> 
> The query is:
> 
> select domain, sum(accessi) as c_count from TABLE where action='1' AND profile IN ('PROFILE_CODE') AND timestamp::date  BETWEEN '2016-12-27' AND '2016-12-30' AND timestamp::time BETWEEN '00:00:00' AND '23:59:59' GROUP BY domain ORDER BY c_count DESC  LIMIT 101
> 
> The table format is:
>  Column   |           Type           |                                   Modifiers
> -----------+--------------------------+--------------------------------------------------------------------------------
> id        | numeric(1000,1)          | not null default function_get_next_sequence('webtraffic_archive_id_seq'::text)
> timestamp | timestamp with time zone |
> domain    | character varying(255)   |
> action    | character varying(5)     |
> profile   | character varying        |
> accessi   | bigint                   |
> url       | text                     |
> 
> Indexes:
>    "webtraffic_archive_day_2016_04_15_action_wbidx" btree (action)
>    "webtraffic_archive_day_2016_04_15_domain_wbidx" btree (domain)
>    "webtraffic_archive_day_2016_04_15_profile_wbidx" btree (profile) CLUSTER
>    "webtraffic_archive_day_2016_04_15_timestamp_wbidx" btree ("timestamp")
> 
> Last question: the table is partitioned. I need to manually create index for every sub-tables or there is a way to create on every sub-tables once?

It's usually more efficient to cast the constants you're comparing to, than to cast a field value for each record in the set. The exception to that is when you have an index on the casted field.

In your case, since you're casting to date and time separately, and whole days even, it's probably more efficient to combine that into:

… AND timestamp BETWEEN '2016-12-27 00:00:00'::timestamp with time zone AND '2016-12-30 23:59:59'::timestamp with time zone ...

But even then, you're excluding items that fall in the second between the end date and the next day. The new range types are useful there, for example:

… AND timestamp <@ '[2016-12-27 00:00:00, 2016-12-31 00:00:00)'::tsrange

The above isn't entirely correct, as tsrange uses timestamp without time zone, but you get the gist.

However, if those time ranges can have other values than '[00:00. 23:59]', then you probably need 2 indexes on that timestamp column; one cast to date and one to time. Otherwise, you end up creating timestamp range filters for each day in the range in the query (which could still be the better approach).

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general