Re: Why DISTINCT ... DESC is slow?

Richard Huxton <dev@xxxxxxxxxxxx> · Tue, 12 Dec 2006 08:13:40 +0000

Michael Glaesemann wrote:

On Dec 12, 2006, at 16:43 , Richard Huxton wrote:

Anton wrote:
While without DESC query goes faster... But not so fast!
=# explain analyze SELECT DISTINCT ON (login_id) login_id,
collect_time AS dt FROM n_traffic ORDER BY login_id collect_time;
    QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
Unique  (cost=0.00..29843.08 rows=532 width=12) (actual
time=0.045..5146.768 rows=798 loops=1)
  ->  Index Scan using n_traffic_login_id_collect_time on n_traffic
(cost=0.00..27863.94 rows=791656 width=12) (actual
time=0.037..3682.853 rows=791656 loops=1)
Total runtime: 5158.735 ms
(3 rows)
Why? 768 rows is about 1000 times smaller than entire n_traffic. And
why Index Scan used without DESC but with DESC is not?

For the DESC version to use the index try "login_id DESC collect_time 
DESC" - so both are reversed.

I'm also not sure what this query is meant to do precisely. ORDER BY 
is usually the last stage in a query, so it might be applied *after* 
the DISTINCT ON.

My understanding is that DISTINCT ON requires the ORDER BY, so I'd be 
surprised if ORDER BY is applied after. (Though I'm happy to hear more 
about this.)

(goes away and tests) Ah, you're quite right. I was worried about 
ill-defined results, but it prevents you from doing that.

--
  Richard Huxton
  Archonet Ltd