Search Postgresql Archives

Re: SELECT DISTINCT performance issue

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



At 13:06 05/06/05, Tom Lane wrote:
K C Lau <kclau60@xxxxxxxxxxxxxx> writes:
> esdt=> explain analyze select DISTINCT ON (PlayerID) PlayerID,AtDate from
> Player
>   where PlayerID='22220' order by PlayerID desc, AtDate desc;
>   Unique  (cost=0.00..2507.66 rows=1 width=23) (actual time=0.000..187.000
> rows=1 loops=1)
>     ->  Index Scan Backward using pk_player on player  (cost=0.00..2505.55
> rows=8
> 43 width=23) (actual time=0.000..187.000 rows=1227 loops=1)
>           Index Cond: ((playerid)::text = '22220'::text)
>   Total runtime: 187.000 ms

> It appears that all the 1227 data records for that player were searched,
> even when doing a backward index scan. I would presume that, after locating
> the index for the highest AtDate, only the first data record needs to be
> retrieved.

If you'd said LIMIT 1, it indeed would have stopped sooner.  Since you
did not, it had to scan for more outputs.

                        regards, tom lane

I added LIMIT 1 to the query and it worked fine for a direct query on the table.

However, our system uses views and joined views extensively and I still got the performance problem after many tries and finally modifying the view to:

create or replace view VCurPlayer (...) as
select a.... from Player a where a.AtDate =
(select b.AtDate from Player b where b.PlayerID = a.PlayerID and b.AtDate =
(select DISTINCT ON (c.PlayerID) c.AtDate from Player c
 where c.PlayerID = b.PlayerID and c.PlayerID = a.PlayerID
 order by c.PlayerID desc, c.AtDate desc LIMIT 1)
 order by b.PlayerID desc, b.AtDate desc LIMIT 1);

esdt=> explain analyze select PlayerID,AtDate,version from VCurPlayer
 where PlayerID='22220';

Index Scan using pk_player on player a (cost=0.00..3969606927.13 rows=59 width
=27) (actual time=328.000..328.000 rows=1 loops=1)
   Index Cond: ((playerid)::text = '22220'::text)
   Filter: ((atdate)::text = ((subplan))::text)
   SubPlan
-> Limit (cost=0.00..337263.45 rows=1 width=23) (actual time=0.180..0.180
 rows=1 loops=1743)
-> Index Scan Backward using pk_player on player b (cost=0.00..1652
5909.12 rows=49 width=23) (actual time=0.180..0.180 rows=1 loops=1743)
                 Index Cond: ((playerid)::text = ($1)::text)
                 Filter: ((atdate)::text = ((subplan))::text)
                 SubPlan
-> Limit (cost=0.00..1697.53 rows=1 width=23) (actual time=
0.072..0.072 rows=1 loops=1743)
-> Unique (cost=0.00..1697.53 rows=1 width=23) (actua
l time=0.063..0.063 rows=1 loops=1743)
-> Index Scan Backward using pk_player on player c (cost=0.00..1695.32 rows=885 width=23) (actual time=0.063..0.063 rows=1 loop
s=1743)
Index Cond: (((playerid)::text = ($0)::text
) AND ((playerid)::text = ($1)::text))
 Total runtime: 328.000 ms

Two subqueries are needed, whereas a single subquery would return no rows.

Please note that when we first ported our system to PostgreSQL 7.1.1 a few years ago, we used LIMIT 1 (without DISTINCT) and we did not encounter this performance problem. Our client is currently using MS Windows (using SELECT TOP 1 construct) with no such problem either.

Any suggestions?

Regards, KC.


---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux