Simple join optimized badly?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I have two tables, SAMPLE and HITLIST that when joined, generate a monsterous sort.

 HITLIST_ROWS has about 48,000 rows
 SAMPLE has about 16 million rows

 The joined column is indexed in SAMPLE
 HITLIST_ROWS is a scratch table which is used a few times then discarded.
 HITLIST_ROWS has no indexes at all

There are two plans below.  The first is before an ANALYZE HITLIST_ROWS, and it's horrible -- it looks to me like it's sorting the 16 million rows of the SEARCH table.  Then I run ANALYZE HITLIST_ROWS, and the plan is pretty decent.

First question: HITLIST_ROWS so small, I don't understand why the lack of ANALYZE should cause SAMPLE's contents to be sorted.

Second question: Even though ANALYZE brings it down from 26 minutes to 47 seconds, a huge improvement, it still seems slow to me.  Its going at roughly 1 row per millisecond -- are my expectations too high?  This is a small-ish Dell computer (Xeon), 4 GB memory, with a four-disk SATA software RAID0 (bandwidth limited to about 130 MB/sec due to PCI cards).  Other joins of a similar size seem much faster.

It looks like I'll need to do an ANALYZE every time I modify HITLIST_ROWS, which seems like a waste because HITLIST_ROWS is rarely used more than once or twice before being truncated and rebuilt with new content.  (HITLIST_ROWS can't be an actual temporary table, though, because it's a web application and each access is from a new connection.)

This is Postgres 8.0.3.  (We're upgrading soon.)

Thanks,
Craig



explain analyze select t.SAMPLE_ID from SAMPLE t, HITLIST_ROWS ph where t.VERSION_ID = ph.ObjectID);
QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------
Merge Join  (cost=4782.35..1063809.82 rows=613226 width=4) (actual time=174.212..1593886.582 rows=176294 loops=1)
  Merge Cond: ("outer".version_id = "inner".objectid)
  ->  Index Scan using i_sample_version_id on sample t  (cost=0.00..1008713.68 rows=16446157 width=8) (actual time=0.111..1571911.208 rows=16446157 loops=1)
  ->  Sort  (cost=4782.35..4910.39 rows=51216 width=4) (actual time=173.669..389.496 rows=176329 loops=1)
        Sort Key: ph.objectid
        ->  Seq Scan on hitlist_rows_378593 ph  (cost=0.00..776.16 rows=51216 width=4) (actual time=0.015..90.059 rows=48834 loops=1)
Total runtime: 1594093.725 ms
(7 rows)

chmoogle2=> analyze HITLIST_ROWS;
ANALYZE
chmoogle2=> explain analyze select t.SAMPLE_ID from SAMPLE t, HITLIST_ROWS ph where t.VERSION_ID = ph.ObjectID;
QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------
Hash Join  (cost=874.43..457976.83 rows=584705 width=4) (actual time=302.792..47796.719 rows=176294 loops=1)
  Hash Cond: ("outer".version_id = "inner".objectid)
  ->  Seq Scan on sample t  (cost=0.00..369024.57 rows=16446157 width=8) (actual time=46.344..26752.343 rows=16446157 loops=1)
  ->  Hash  (cost=752.34..752.34 rows=48834 width=4) (actual time=149.548..149.548 rows=0 loops=1)
        ->  Seq Scan on hitlist_rows_378593 ph  (cost=0.00..752.34 rows=48834 width=4) (actual time=0.048..80.721 rows=48834 loops=1)
Total runtime: 47988.572 ms
(6 rows)


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux