"Broken" Plan? Nested Loop IN Join vs. Nested Loop/Hash Aggregate

"Worky Workerson" <worky.workerson@xxxxxxxxx> · Wed, 8 Nov 2006 11:35:39 -0500

I'm having an issue with a query plan that seems to be taking *far*
longer than it should.  I have the following schema (Note: I had to
retype everything from an isolated lab, so I'm hoping that its correct
enough to show my problem):

CREATE TABLE ip_profiles (
 ip    IP4 PRIMARY KEY,
 --more columns
);

CREATE TABLE events (
 ip    IP4 NOT NULL FOREIGN KEY ip_profiles (ip),
 content BYTEA,
 --more columns
);
CREATE INDEX events_ip_idx ON events USING btree (ip);

CREATE TABLE event_ip1 (
 CONSTRAINT partition_ip1 CHECK ((ip >= '1.0.0.0') AND (ip <=
'1.255.255.255'));
) INHERITS (events);

The events table is partitioned on the first octet of the ip, the same
indexes are generated across all the partitions, and constraint
exclusion is turned on.

I would like to get all the ip_profiles that have an event with a
given content string within an IP range.  My first try was the
following (again, retyped):

EXPLAIN ANALYZE SELECT count(*) FROM ip_profiles WHERE ip BETWEEN
'1.0.0.0' AND '1.255.255.255' AND ip IN (SELECT ip FROM events WHERE
ip >= '1.0.0.0' AND ip <= '1.255.255.255' AND content = 'bytes');

Aggregate (cost=2047.15..2047.16 rows=1 width=0) (actual
time=7801833.540 rows=1 loops=1)
 ->  Nested Loop IN Join (cost=1.02..2047.15 rows=1 width=0) (actual
time=7801833.505..7801833.505 rows=0 loops=1)
       Join Filter: ("outer".ip = "inner".ip)
       -> Index Scan using ip_profiles_pkey on ip_profiles
(cost=0.00..3.37 rows=1 width=4) (actual time=0.035..1195.567
rows=144445 loops=1)
             Index Cond: ((ip > '1.0.0.0'::ip4) AND (ip <= '1.255.255.255))
       -> Append (cost=1.02..2043.48 rows=24 width=4) (actual
time=53.988..53.988 rows=0 loops=144445)
             -> Bitmap Heap Scan on events (cost=1.02..4.96 rows=1
width=4) (actual time=0.010..0.010 rows=0 loops=144445)
                   Recheck Cond: ((ip > '1.0.0.0'::ip4) AND (ip <=
'1.255.255.255))
                   Filter: (content='bytes'::bytea)
                   -> Bitmap Index Scan on events_ip_idx
(cost=0.00..1.02 rows=3 width=0) (actual time=0.006..0.006 rows=0
loops=144445)
                         Index Cond: ((ip > '1.0.0.0'::ip4) AND (ip
<= '1.255.255.255))
             -> Seq Scan on events_ip1 events (cost=0.00 rows=23
width=4) (actual time=53.972..53.972 rows=0 loops=144445)
                   Filter: ((ip > '1.0.0.0'::ip4) AND (ip <=
'1.255.255.255) AND (content='bytes'::bytea))
Total runtime: 7801834.104 ms

Which looks like the "Nested Loop IN Join) is taking a *lot* longer
than the planner estimated and, needless to say, is unusable.  Taking
out the range condition on the ip_profiles table should lead to a
query producing identical results, right?  I did this ...

SELECT count(*) FROM ip_profiles WHERE ip in (SELECT ip FROM events
WHERE ip >= '1.0.0.0' AND ip <= '1.255.255.255' AND content = 'random
byte string');

... which results in a *mush* faster time, with a plan that looks like
(will retype fully if useful):

Nested Loop (cost=2187.94..2289.54 rows=30 width=67) (actual
time=223.681..225.693 rows=84 loops=1)
 -> Hash Aggregate
   -> Append
     -> Bitmap Heap Scan on events
     -> Seq Scan on events_ip1 events
 -> Index Scan using ip_profiles_pkey on ip_profiles
Total runtime: 226.170 ms

I realize that the immediate fix is "don't do that", but I'd like to
get to the bottom of this so that I'm not suprised in the future.  I
have seen several threads on the performance of IN (SELECT ..), but
this doesn't seem to be the culprit.  Any ideas?

Tables are recently vacuumed and analyzed, indexes were just created,
and I dropped and recreaetd the database with the same effect.  I am a
bit behind the latest, PG 8.1.3 on RHEL 4.3 x86_64, but I'd like to
know that upgrading will fix the problem before going through the
hassle of getting new stuff into the lab.  I plan on upgrading to 8.2
whenever that is "released".

Thanks!