Re: any hope for my big query?

Edoardo Ceccarelli <eddy@xxxxxx> · Fri, 29 Sep 2006 11:19:19 +0200

You have 2 seqscans on albumjoin table, you first make a simple join:

...and albumjoin.album = public.album.id ...

that generates the first
->  Seq Scan on albumjoin  (cost=0.00..88642.18 rows=5107318 width=8)
and then you group values from same table counting them with

... (select album from albumjoin group by album having count(*) between 
15 and 25)  ...

that generates the second

Seq Scan on albumjoin  (cost=0.00..88642.18 rows=5107318 width=4)

given the complexity of the query, maybe you could create an 
intermediate table with only one seqscan and use that one in final query 
but I don't know if that's possible with the db structure you have

Can I ask what exactly is albumjoin table? is it a n-n relation?

explain select
    distinct public.album.id
from
    public.album,public.albumjoin,public.track,umdb.node
where
    node.dir=2811
    and albumjoin.album = public.album.id
    and public.albumjoin.track = public.track.id
    and levenshtein(substring(public.track.name for 75),
        substring(node.file for 75)) <= 10
    and public.album.id in
        (select album from albumjoin group by album having count(*) 
between 15 and 25) group by public.album.id
having count(*) >= 5;

 Unique  (cost=991430.53..1013711.74 rows=425772 width=4)
   ->  GroupAggregate  (cost=991430.53..1012647.31 rows=425772 width=4)
         Filter: (count(*) >= 5)
         ->  Sort  (cost=991430.53..996373.93 rows=1977360 width=4)
               Sort Key: album.id
               ->  Nested Loop  (cost=513549.06..737866.68 
rows=1977360 width=4)
                     Join Filter: 
(levenshtein("substring"(("inner".name)::text, 1, 75), 
"substring"("outer".file, 1, 75)) <= 10)
                     ->  Index Scan using node_dir on node  
(cost=0.00..3.22 rows=16 width=40)
                           Index Cond: (dir = 2811)
                     ->  Materialize  (cost=513549.06..520153.61 
rows=370755 width=25)
                           ->  Hash Join  (cost=271464.72..510281.31 
rows=370755 width=25)
                                 Hash Cond: ("outer".id = "inner".track)
                                 ->  Seq Scan on track  
(cost=0.00..127872.69 rows=5111469 width=25)
                                 ->  Hash  (cost=268726.83..268726.83 
rows=370755 width=8)
                                       ->  Hash Join  
(cost=150840.51..268726.83 rows=370755 width=8)
                                             Hash Cond: ("outer".album 
= "inner".id)
                                             ->  Seq Scan on 
albumjoin  (cost=0.00..88642.18 rows=5107318 width=8)
                                             ->  Hash  
(cost=150763.24..150763.24 rows=30908 width=8)
                                                   ->  Hash Join  
(cost=127951.57..150763.24 rows=30908 width=8)
                                                         Hash Cond: 
("outer".id = "inner".album)
                                                         ->  Seq Scan 
on album  (cost=0.00..12922.72 rows=425772 width=4)
                                                         ->  Hash  
(cost=127874.30..127874.30 rows=30908 width=4)
                                                               ->  
HashAggregate  (cost=126947.06..127565.22 rows=30908 width=4)

Filter: ((count(*) >= 15) AND (count(*) <= 25))

->  Seq Scan on albumjoin  (cost=0.00..88642.18 rows=5107318 width=4)

I've tried adding a length(public.track.name) index and filtering 
public.track to those rows where length(name) is within a few 
characters of node.file, but that actually makes the plan more expensive.

Is there any hope to make things much cheaper? Unfortunately, I can't 
filter out anything from the album or albumjoin tables.

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
      subscribe-nomail command to majordomo@xxxxxxxxxxxxxx so that your
      message can get through to the mailing list cleanly