Re: any hope for my big query?

"Jim C. Nasby" <jim@xxxxxxxxx> · Fri, 29 Sep 2006 10:35:57 -0500

There's no join criteria for umdb.node... is that really what you want?

On Thu, Sep 28, 2006 at 03:18:56PM -0700, Ben wrote:
> Hey guys, I've got a query that is inherently expensive, because it has to 
> do some joins against some large tables. But it's currently *very* 
> expensive (at least for a web app), and I've been struggling in vain all 
> day to knock the cost down. Annoyingly, the least costly version I've come 
> up with remains my first attempt, and is the most straight-forward:
> 
> explain select
> 	distinct public.album.id
> from
> 	public.album,public.albumjoin,public.track,umdb.node
> where
> 	node.dir=2811
> 	and albumjoin.album = public.album.id
> 	and public.albumjoin.track = public.track.id
> 	and levenshtein(substring(public.track.name for 75),
> 		substring(node.file for 75)) <= 10
> 	and public.album.id in
> 		(select album from albumjoin group by album having count(*) 
> 		between 15 and 25) group by public.album.id
> having count(*) >= 5;
> 
> 
>  Unique  (cost=991430.53..1013711.74 rows=425772 width=4)
>    ->  GroupAggregate  (cost=991430.53..1012647.31 rows=425772 width=4)
>          Filter: (count(*) >= 5)
>          ->  Sort  (cost=991430.53..996373.93 rows=1977360 width=4)
>                Sort Key: album.id
>                ->  Nested Loop  (cost=513549.06..737866.68 rows=1977360 
>                width=4)
>                      Join Filter: 
>                      (levenshtein("substring"(("inner".name)::text, 1, 75), 
>                      "substring"("outer".file, 1, 75)) <= 10)
>                      ->  Index Scan using node_dir on node  
>                      (cost=0.00..3.22 rows=16 width=40)
>                            Index Cond: (dir = 2811)
>                      ->  Materialize  (cost=513549.06..520153.61 
>                      rows=370755 width=25)
>                            ->  Hash Join  (cost=271464.72..510281.31 
>                            rows=370755 width=25)
>                                  Hash Cond: ("outer".id = "inner".track)
>                                  ->  Seq Scan on track  
>                                  (cost=0.00..127872.69 rows=5111469 
>                                  width=25)
>                                  ->  Hash  (cost=268726.83..268726.83 
>                                  rows=370755 width=8)
>                                        ->  Hash Join  
>                                        (cost=150840.51..268726.83 
>                                        rows=370755 width=8)
>                                              Hash Cond: ("outer".album = 
>                                              "inner".id)
>                                              ->  Seq Scan on albumjoin  
>                                              (cost=0.00..88642.18 
>                                              rows=5107318 width=8)
>                                              ->  Hash  
>                                              (cost=150763.24..150763.24 
>                                              rows=30908 width=8)
>                                                    ->  Hash Join  
>                                                    (cost=127951.57..150763.24 rows=30908 width=8)
>                                                          Hash Cond: 
>                                                          ("outer".id = 
>                                                          "inner".album)
>                                                          ->  Seq Scan on 
>                                                          album  
>                                                          (cost=0.00..12922.72 rows=425772 width=4)
>                                                          ->  Hash  
>                                                          (cost=127874.30..127874.30 rows=30908 width=4)
>                                                                ->  
>                                                                HashAggregate  (cost=126947.06..127565.22 rows=30908 width=4)
>                                                                      Filter: ((count(*) >= 15) AND (count(*) <= 25))
>                                                                      ->  
>                                                                      Seq 
>                                                                      Scan 
>                                                                      on 
>                                                                      albumjoin  (cost=0.00..88642.18 rows=5107318 width=4)
> 
> 
> I've tried adding a length(public.track.name) index and filtering 
> public.track to those rows where length(name) is within a few characters 
> of node.file, but that actually makes the plan more expensive.
> 
> Is there any hope to make things much cheaper? Unfortunately, I can't 
> filter out anything from the album or albumjoin tables.
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>       subscribe-nomail command to majordomo@xxxxxxxxxxxxxx so that your
>       message can get through to the mailing list cleanly
> 

-- 
Jim Nasby                                            jim@xxxxxxxxx
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)