Re: DISTINCT vs. GROUP BY

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 9 February 2010 21:46, Dimi Paun <dimi@xxxxxxxxxxx> wrote:
> >From what I've read on the net, these should be very similar,
> and should generate equivalent plans, in such cases:
>
> SELECT DISTINCT x FROM mytable
> SELECT x FROM mytable GROUP BY x
>
> However, in my case (postgresql-server-8.1.18-2.el5_4.1),
> they generated different results with quite different
> execution times (73ms vs 40ms for DISTINCT and GROUP BY
> respectively):
>
> tts_server_db=# EXPLAIN ANALYZE select userdata from tagrecord where clientRmaInId = 'CPC-RMA-00110' group by userdata;
>                                                                 QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------------------
>  HashAggregate  (cost=775.68..775.69 rows=1 width=146) (actual time=40.058..40.058 rows=0 loops=1)
>   ->  Bitmap Heap Scan on tagrecord  (cost=4.00..774.96 rows=286 width=146) (actual time=40.055..40.055 rows=0 loops=1)
>         Recheck Cond: ((clientrmainid)::text = 'CPC-RMA-00110'::text)
>         ->  Bitmap Index Scan on idx_tagdata_clientrmainid  (cost=0.00..4.00 rows=286 width=0) (actual time=40.050..40.050 rows=0 loops=1)
>               Index Cond: ((clientrmainid)::text = 'CPC-RMA-00110'::text)
>  Total runtime: 40.121 ms
>
> tts_server_db=# EXPLAIN ANALYZE select distinct userdata from tagrecord where clientRmaInId = 'CPC-RMA-00109';
>                                                                    QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------------------------
>  Unique  (cost=786.63..788.06 rows=1 width=146) (actual time=73.018..73.018 rows=0 loops=1)
>   ->  Sort  (cost=786.63..787.34 rows=286 width=146) (actual time=73.016..73.016 rows=0 loops=1)
>         Sort Key: userdata
>         ->  Bitmap Heap Scan on tagrecord  (cost=4.00..774.96 rows=286 width=146) (actual time=72.940..72.940 rows=0 loops=1)
>               Recheck Cond: ((clientrmainid)::text = 'CPC-RMA-00109'::text)
>               ->  Bitmap Index Scan on idx_tagdata_clientrmainid  (cost=0.00..4.00 rows=286 width=0) (actual time=72.936..72.936 rows=0 loops=1)
>                     Index Cond: ((clientrmainid)::text = 'CPC-RMA-00109'::text)
>  Total runtime: 73.144 ms
>
> What gives?
>
Firstly, the 2 queries aren't equal.  They're matching against
different clientrmainid values.

Also, look at the bitmap index scan for each:

Bitmap Index Scan on idx_tagdata_clientrmainid  (cost=0.00..4.00
rows=286 width=0) (actual time=40.050..40.050 rows=0 loops=1)

Bitmap Index Scan on idx_tagdata_clientrmainid  (cost=0.00..4.00
rows=286 width=0) (actual time=72.936..72.936 rows=0 loops=1)

That's where the difference is.  An identical scan takes longer in one
than the other, either due to the index scan looking for different
values in each case, or at the time you were running it, another
process was using more resources.  You'd have to run these several
times to get an idea of average times.

Have you run ANALYZE on the table beforehand to make sure your stats
are up to date?

Regards

Thom

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux