> What first post? The only thing I can find is a reference in a message
> by you from yesterday, to a two-year old post that you claim is about
> the same problem. Though it's possible that it is the same problem,
> you don't provide any data to back that up.

Strange - you can see the full thread here:
http://groups.google.pl/group/pgsql.general/browse_thread/thread/6c4ea8356327276c

I post via Google, maybe that's the cause?

> The message you referred to was about a one-of-a-kind problem with
> communications to the client and had nothing to do with performance on
> the server; is that indeed what you're seeing? In that case you should
> check your network infrastructure for problems.

No, I actually meant that the CPU usage was low during my query. On the
other hand the IO wait was very high, so the low CPU usage was to be
expected.

> Usually server performance problems are due to problems with tuning
> parameters or outdated statistics. Those issues can usually be solved
> easily.

Well, maybe - I only started using Postgres recently, so I may be making
some obvious mistakes. My database was filled incrementally by a shell
script - I don't know whether that alone lets Postgres collect all the
necessary statistics. Maybe an explicit ANALYZE is necessary? (I put a
sketch of what I plan to try below.) I also tried to VACUUM / VACUUM FULL
my tables before running my queries, but it took too long and I had to
interrupt it.

> Without posting an EXPLAIN ANALYSE people here can only guess what
> your problem is.

I'm posting a plain EXPLAIN below. I did run my query with EXPLAIN
ANALYZE, but forgot to save the output :( I don't think I'll be able to
run the queries again, because they took around 30-60 minutes and almost
killed my server (it was nearly unresponsive during the query, with
about 90% IO wait).

> > BUT I found the real cause of my problem - the "fk2" field from my
> > example had not only an index, but it was also a foreign key to
> > another table.

> That seems unlikely to be the cause. From the above it seems much more
> likely that you're suffering from a bad query plan instead, but you
> don't provide any details.
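Re the statistics: here is what I plan to try to refresh them by hand -
just a minimal sketch, using the table names from my test case below
(pg_stat_user_tables should also show when autovacuum last analyzed the
tables):

ANALYZE article;
ANALYZE keyword;
ANALYZE article_keyword;

-- check when the tables were last analyzed (manually or by autovacuum):
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname IN ('article', 'keyword', 'article_keyword');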
I just tried to create a test case with similar data - here is what it
looks like (it should work if you just paste it into some test db):

-- ------------- generate test tables + data

drop table if exists article, keyword, article_keyword, tochange, sums
cascade;

CREATE TABLE "article" (
    "id" serial NOT NULL PRIMARY KEY,
    "content" varchar(255) NULL,
    "ip" inet NULL,
    "has_comments" bool NOT NULL
);

CREATE TABLE "keyword" (
    "id" serial NOT NULL PRIMARY KEY,
    "keyword" varchar(40) NOT NULL UNIQUE,
    "articles" integer NOT NULL
);

CREATE TABLE "article_keyword" (
    "id" serial NOT NULL PRIMARY KEY,
    "article_id" integer NOT NULL REFERENCES "article" ("id") DEFERRABLE INITIALLY DEFERRED,
    "keyword_id" integer NOT NULL REFERENCES "keyword" ("id") DEFERRABLE INITIALLY DEFERRED,
    "votes_yes" integer NOT NULL,
    "votes_no" integer NOT NULL
);

CREATE INDEX "article_keyword_keyword_id" ON "article_keyword" ("keyword_id");

insert into article(content, ip, has_comments)
values ('some article', '123.121.121.223', true);

insert into keyword
select nextval('keyword_id_seq'), md5(to_char(i, '9999999999999')), 0
from generate_series(1,2000000) as i;

insert into article_keyword
select nextval('article_keyword_id_seq'), 1, k.id, 0, 0
from generate_series(1,2000000) as i
join keyword k on k.keyword=md5(to_char(i, '9999999999999'))
join generate_series(1,5) as times on true;

create table tochange (
    fromid int not null primary key,
    toid int not null
);

insert into tochange
select k1.id, k2.id
from generate_series(1,200000) as i
join keyword k1 on k1.keyword=md5(to_char(i, '9999999999999'))
join keyword k2 on k2.keyword=md5(to_char(i+200000, '9999999999999'));

create table sums (
    id int not null primary key,
    sum int
);

-- ----------------- now my queries:

-- replace fromid's with toid's
update article_keyword
set keyword_id=tc.toid
from tochange tc
where keyword_id=tc.fromid;

-- delete unused keywords
delete from keyword
where id in (
    select k.id
    from keyword k
    left join article_keyword ak on k.id=ak.keyword_id
    where ak.keyword_id is null
);

-- recalculate sums - in how many articles is a keyword used?
insert into sums
select keyword_id, count(*)
from article_keyword
group by keyword_id;

update keyword k
set articles=s.sum
from sums s
where k.id=s.id;

----------------------

The problem is that I can't reproduce the slow behaviour with this test
case. :( The tables are almost identical - only the article table is
bigger in reality (it has around a million rows).

When I run "explain update" (the first update from the test case) it
prints this now:

 Merge Join  (cost=5.14..53436.13 rows=3636710 width=26)
   Merge Cond: (tc.fromid = article_keyword.keyword_id)
   ->  Index Scan using tochange_pkey on tochange tc  (cost=0.00..2830.26 rows=100000 width=8)
   ->  Index Scan using article_keyword_keyword_id on article_keyword  (cost=0.00..148216.29 rows=5000040 width=26)
(4 rows)

When I disable enable_mergejoin (see the snippet below for how I toggle
it) I get this plan:

 Hash Join  (cost=6160.91..274121.21 rows=5500010 width=26)
   Hash Cond: (article_keyword.keyword_id = tc.fromid)
   ->  Seq Scan on article_keyword  (cost=0.00..87353.10 rows=5500010 width=26)
   ->  Hash  (cost=2882.74..2882.74 rows=199774 width=8)
         ->  Seq Scan on tochange tc  (cost=0.00..2882.74 rows=199774 width=8)
(5 rows)

This is the plan I was getting with my original query, which took so
long. The second update was also very slow; the deletes and inserts were
quite fast.
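For reference, this is how I toggle the merge join off when comparing
the plans - only session-local settings, nothing changed in
postgresql.conf:

SET enable_mergejoin = off;

EXPLAIN update article_keyword
set keyword_id=tc.toid
from tochange tc
where keyword_id=tc.fromid;

RESET enable_mergejoin;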
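And about the foreign key I mentioned at the top: if it really is the
culprit, I suppose I could drop the constraint around the bulk update and
re-create it afterwards. A rough sketch - the constraint name here is
just the auto-generated one I'd expect, so I'd verify it with
\d article_keyword first:

-- drop the FK so the bulk update doesn't have to check it per row
ALTER TABLE article_keyword DROP CONSTRAINT article_keyword_keyword_id_fkey;

update article_keyword
set keyword_id=tc.toid
from tochange tc
where keyword_id=tc.fromid;

-- restore the constraint (this revalidates all existing rows)
ALTER TABLE article_keyword ADD CONSTRAINT article_keyword_keyword_id_fkey
    FOREIGN KEY (keyword_id) REFERENCES keyword (id)
    DEFERRABLE INITIALLY DEFERRED;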
> It would also help to know what version of PostgreSQL this is and on
> what hardware and setup you're running into this issue.

I tried both Postgres 8.3 and 8.4; now I use 8.4. I have a standard
config plus pgtune, which added the following entries:

default_statistics_target = 50
maintenance_work_mem = 28MB
constraint_exclusion = on
checkpoint_completion_target = 0.9
effective_cache_size = 352MB
work_mem = 2816kB
wal_buffers = 8MB
checkpoint_segments = 16
shared_buffers = 112MB
max_connections = 80

The server is an Intel(R) Core(TM)2 CPU E8400 @ 3.00GHz with 4GB RAM and
2x SATA disks in RAID1.

Thanks,
MS