Re: Very specialised query

Matthew Wakeling <matthew@xxxxxxxxxxx> · Fri, 27 Mar 2009 17:34:26 +0000 (GMT)

On Fri, 27 Mar 2009, Віталій Тимчишин wrote:
...an index on (objectid, start) would help...

Definitely.

You could try  adding    "AND l2.start > l1.start" to the first query.  
This will drop symmetric half of intersections (the ones that will 
remain are l2 inside or to the left of l1), but you can redo results by 
id1,id2 union all id2, id1 and may allow to use start index for 
"between"

That's remarkably clever, and should have been obvious. Thanks.

Adding the constraint as you say does indeed make the query fast. However, 
there seems to be a problem with the planner, in that it does not 
distinguish between the costs of two alternative plans, which have vastly 
different real costs. Consider the following:

SELECT
    l1.id AS id1,
    l2.id AS id2
FROM
    location l1,
    location l2
WHERE
        l1.objectid = 228000093
    AND l2.objectid = 228000093
    AND l1.id <> l2.id
    AND l1.start < l2.end
    AND l1.end > l2.start
    AND l1.start < l2.start
UNION ALL 
SELECT
    l1.id AS id1, l2.id AS id2
FROM
    location l1,
    location l2
WHERE
        l1.objectid = 228000093
    AND l2.objectid = 228000093
    AND l1.id <> l2.id
    AND l1.start < l2.end
    AND l1.end > l2.start
    AND l1.start >= l2.start;
                                                                    QUERY 
PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.00..20479179.74 rows=138732882 width=8)
   ->  Nested Loop  (cost=0.00..9545925.46 rows=69366441 width=8)
         Join Filter: ((l1.id <> l2.id) AND (l1.start < l2.end))
         ->  Index Scan using location_test_obj_end on location l1  (cost=0.00..55966.07 rows=43277 width=12)
               Index Cond: (objectid = 228000093)
         ->  Index Scan using location_test_obj_start on location l2  (cost=0.00..123.10 rows=4809 width=12)
               Index Cond: ((l2.objectid = 228000093) AND (l1.end > l2.start) AND (l1.start < l2.start))
   ->  Nested Loop  (cost=0.00..9545925.46 rows=69366441 width=8)
         Join Filter: ((l1.id <> l2.id) AND (l1.start < l2.end))
         ->  Index Scan using location_test_obj_end on location l1  (cost=0.00..55966.07 rows=43277 width=12)
               Index Cond: (objectid = 228000093)
         ->  Index Scan using location_test_obj_start on location l2  (cost=0.00..123.10 rows=4809 width=12)
               Index Cond: ((l2.objectid = 228000093) AND (l1.end > l2.start) AND (l1.start >= l2.start))
(13 rows)

Notice the two different index conditions:
    (l1.end > l2.start) AND (l1.start < l2.start)  - "between"
    (l1.end > l2.start) AND (l1.start >= l2.start) - open-ended
Both have a cost of (cost=0.00..123.10 rows=4809 width=12)

Postgres estimates these two index scans to be equivalent in cost, where 
they are actually vastly different in real cost. Shouldn't Postgres favour 
a "between" index scan over an open-ended one?

Matthew

--
[About NP-completeness] These are the problems that make efficient use of
the Fairy Godmother.                    -- Computer Science Lecturer
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance