Re: Slow query with indexed ORDER BY and LIMIT when using OR'd conditions

johno <jan.suchal@xxxxxxxxx> · Tue, 22 Jul 2014 07:57:08 +0200

On Tue, Jul 22, 2014 at 4:53 AM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:

johno <jan.suchal@xxxxxxxxx> writes:

> I am trying to optimize a simple query that returns first 100 rows that

> have been updated since a given timestamp (ordered by timestamp and id

> desc).  If there are several rows with the same timestamp I need to a

> second condition, that states that I want to return rows having the given

> timestamp and id > given id.

> The obvious query is

> SELECT * FROM register_uz_accounting_entities

> WHERE effective_on > '2014-07-11' OR (effective_on = '2014-07-11' AND

> id > 1459)

> ORDER BY effective_on, id

> LIMIT 100

A more readily optimizable query is

SELECT * FROM register_uz_accounting_entities

WHERE (effective_on, id) > ('2014-07-11'::date, 1459)

ORDER BY effective_on, id

LIMIT 100

Yes, but that query has completely different semantics - I can't change that.

This formulation allows the planner to match both the WHERE and ORDER BY

clauses directly to the two-column index.

Are both fields really used? I was under the impression that only the first column from index can be used when there is a range query.

> I've tried to optimize this query by pushing down the limit and order by's

> into explicit subselects.

As noted earlier, that's unlikely to be an improvement, because on its

face it specifies more computation.  Postgres is not terribly bright

about UNIONs, either.

Despite the cost calculation in explain the actual query times are very different. I get consistent sub 50ms responses from the optimized one (union with pushing down the limits) and 500+ms for the plain one (when not using bitmap index scan).

Is this possible optimization considered by query planner or do I have "force" it?

Thanks again for your time and effort, I appreciate it.

                        regards, tom lane