Re: fast DISTINCT or EXIST

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi everyone,

On Sat, 07 Apr 2007 11:54:08 -0400
Tom Lane <tgl@xxxxxxxxxxxxx> wrote:

> Arjen van der Meijden <acmmailing@xxxxxxxxxxxx> writes:
> > If that is your main culprit, you could also use two limits based on the 
> > fact that there will be at most X songs per cd which would match your 
> > title (my not very educated guess is 3x). Its a bit ugly... but if that 
> > is what it takes to make postgresql not scan your entire index, so be it...
> 
> > SELECT ... FROM cd
> >    JOIN tracks ...
> > WHERE cd.id IN (SELECT DISTINCT cd_id FROM (SELECT t.cd_id FROM tracks t
> >       WHERE t.tstitle @@ plainto_tsquery('simple','education') LIMIT 30) 
> > as foo LIMIT 10)
> 
> I think that's the only way.  There is no plan type in Postgres that
> will generate unique-ified output without scanning the whole input
> first, except for Uniq on pre-sorted input, which we can't use here
> because the tsearch scan isn't going to deliver the rows in cd_id order.

Unfortunately, the query above will definitely not work correctly, if
someone searches for "a" or "the". 

The correct query does not perform as well as I hoped. 

#v+
cddb=# EXPLAIN ANALYSE SELECT cd.cd_id,cd.artist,cd.title,tracks.title FROM cd JOIN tracks USING (cd_id) WHERE cd_id IN (SELECT DISTINCT tracks.cd_id FROM tracks WHERE tracks.tstitle @@ plainto_tsquery('simple','sympathy') LIMIT 10);
                                                                              QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=61031.41..64906.58 rows=139 width=69) (actual time=31236.562..31810.940 rows=166 loops=1)
   ->  Nested Loop  (cost=61031.41..61176.20 rows=10 width=50) (actual time=31208.649..31388.289 rows=10 loops=1)
         ->  Limit  (cost=61031.41..61089.74 rows=10 width=4) (actual time=31185.972..31186.024 rows=10 loops=1)
               ->  Unique  (cost=61031.41..61124.74 rows=16 width=4) (actual time=31185.967..31186.006 rows=10 loops=1)
                     ->  Sort  (cost=61031.41..61078.07 rows=18665 width=4) (actual time=31185.961..31185.977 rows=11 loops=1)
                           Sort Key: public.tracks.cd_id
                           ->  Bitmap Heap Scan on tracks  (cost=536.76..59707.31 rows=18665 width=4) (actual time=146.222..30958.057 rows=1677 loops=1)
                                 Recheck Cond: (tstitle @@ '''sympathy'''::tsquery)
                                 ->  Bitmap Index Scan on tstitle_tracks_idx  (cost=0.00..532.09 rows=18665 width=0) (actual time=126.328..126.328 rows=1677 loops=1)
                                       Index Cond: (tstitle @@ '''sympathy'''::tsquery)
         ->  Index Scan using cd_id_key on cd  (cost=0.00..8.62 rows=1 width=46) (actual time=20.218..20.219 rows=1 loops=10)
               Index Cond: (cd.cd_id = "IN_subquery".cd_id)
   ->  Index Scan using cdid_tracks_idx on tracks  (cost=0.00..358.08 rows=1197 width=27) (actual time=39.935..42.247 rows=17 loops=10)
         Index Cond: (cd.cd_id = public.tracks.cd_id)
 Total runtime: 31811.256 ms
(15 rows)
#v-

It gets better when the rows are in memory (down to 10.452 ms), but
Murphy tells me, that the content that I need will never be in memory.

I think I disregarded this variant at first, because it limits the
possibility to restrict the cd artist and title.

> I can see how to build one: make a variant of HashAggregate that returns
> each input row immediately after hashing it, *if* it isn't a duplicate
> of one already in the hash table.  But it'd be a lot of work for what
> seems a rather specialized need.

D'oh.

Actually, I hoped to find an alternative, that does not involve
DISTINCT.

Best Regards,

Tilo


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux