Re: Join Query Perfomance Issue

Thomas Zaksek <zaksek@xxxxxxxxxxxxxx> · Tue, 12 Feb 2008 11:11:11 +0100

Scott Marlowe schrieb:
On Feb 11, 2008 12:08 PM, Thomas Zaksek <zaksek@xxxxxxxxxxxxxx> wrote:

I have serious performance problems with the following type of queries:
/
/explain analyse SELECT '12.11.2007 18:04:00 UTC' AS zeit,
                       'M' AS datatyp,
                       p.zs_nr AS zs_de,
                   j_ges,
                       de_mw_abh_j_lkw(mw_abh) AS j_lkw,
                       de_mw_abh_v_pkw(mw_abh) AS v_pkw,
                       de_mw_abh_v_lkw(mw_abh) AS v_lkw,
                   de_mw_abh_p_bel(mw_abh) AS p_bel
                   FROM  messpunkt p, messungen_v_dat_2007_11_12 m, de_mw w
                       WHERE  m.ganglinientyp = 'M'
               AND 381 = m.minute_tag
                       AND (p.nr, p.mw_nr) = (m.messpunkt, w.nr);

Explain analze returns

 Nested Loop  (cost=0.00..50389.39 rows=3009 width=10) (actual
time=0.503..320.872 rows=2189 loops=1)
   ->  Nested Loop  (cost=0.00..30668.61 rows=3009 width=8) (actual
time=0.254..94.116 rows=2189 loops=1)

This nested loop is using us most of your time.  Try increasing
work_mem and see if it chooses a better join plan, and / or turn off
nested loops for a moment and see if that helps.

set enable_nestloop = off

Note that set enable_xxx = off

Is kind of a hammer to the forebrain setting.  It's not subtle, and
the planner can't work around it.  So use them with caution.  That
said, I had one reporting query that simply wouldn't run fast without
turning off nested loops for that one.  But don't turn off nested
queries universally, they are still a good choice for smaller amounts
of data.

I tried turning off nestloop, but with terrible results:

Hash Join  (cost=208328.61..228555.14 rows=3050 width=10) (actual 
time=33421.071..40362.136 rows=2920 loops=1)
  Hash Cond: (w.nr = p.mw_nr)
  ->  Seq Scan on de_mw w  (cost=0.00..14593.79 rows=891479 width=10) 
(actual time=0.012..3379.971 rows=891479 loops=1)
  ->  Hash  (cost=208290.49..208290.49 rows=3050 width=8) (actual 
time=33420.877..33420.877 rows=2920 loops=1)
        ->  Merge Join  (cost=5303.71..208290.49 rows=3050 width=8) 
(actual time=31.550..33407.688 rows=2920 loops=1)
              Merge Cond: (p.nr = m.messpunkt)
              ->  Index Scan using messpunkt_nr_idx on messpunkt p  
(cost=0.00..238879.39 rows=6306026 width=12) (actual 
time=0.056..17209.317 rows=4339470 loops=1)
              ->  Sort  (cost=5303.71..5311.34 rows=3050 width=4) 
(actual time=25.973..36.858 rows=2920 loops=1)
                    Sort Key: m.messpunkt
                    ->  Index Scan using 
messungen_v_dat_2007_11_12_messpunkt_minute_tag_idx on 
messungen_v_dat_2007_11_12 m  (cost=0.00..5127.20 rows=3050 width=4) 
(actual time=0.124..12.822 rows=2920 loops=1)
                          Index Cond: ((ganglinientyp = 'M'::bpchar) 
AND (651 = minute_tag))
Total runtime: 40373.512 ms
(12 rows)
Looks crappy, isn't it?

I also tried to increase work_men, now the config is
work_mem = 4MB                         
maintenance_work_mem = 128MB,
in regard to performance, it wasnt effective at all.

The postgresql runs  on a HP Server with dual Opteron, 3GB of Ram, what 
are good settings here? The database will have to work with tables of 
several 10Millions of Lines, but only a few columns each. No more than 
maybe ~5 clients accessing the database at the same time.

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
      subscribe-nomail command to majordomo@xxxxxxxxxxxxxx so that your
      message can get through to the mailing list cleanly