Hello,
Here is the setup:
I have a query plan optimization question. It is formatted nicely on
But here is a copy for the archive:
Here is the setup:
CREATE EXTENSION postgis;
DROP TABLE IF EXISTS A;
DROP TABLE IF EXISTS B;
CREATE TABLE A(shape Geometry, id INT);
CREATE TABLE B(shape Geometry, id INT, kind INT);
CREATE INDEX ON A USING GIST (shape);
CREATE INDEX ON B USING GIST (shape);
I am running the following commands:
I am running the following commands:
ANALYZE A;
ANALYZE B;
-- for each row in A, select exactly one row in B (if there is one)
-- such that B contains geometry of A
EXPLAIN
SELECT * FROM (SELECT A.id as aid, (SELECT B.id FROM B WHERE
ST_Contains(B.shape, A.shape)
AND B.kind != 1 LIMIT 1) as bid FROM A) AS TMP;
which gives me { "Plan": { "Node Type": "Seq Scan", "Relation Name": "A", "Startup Cost": 0.00, "Total Cost": 2606592.33, "Plan Rows": 549745, "Plan Width": 1646, "Plans": [ { "Node Type": "Limit", "Parent Relationship": "SubPlan", "Subplan Name": "SubPlan 1", "Startup Cost": 0.00, "Total Cost": 4.68, "Plan Rows": 1, "Plan Width": 8, "Plans": [ { "Node Type": "Seq Scan", "Parent Relationship": "Outer", "Relation Name": "B", "Startup Cost": 0.00, "Total Cost": 4.68, "Plan Rows": 1, "Plan Width": 8, "Filter": "((shape && A.shape) AND _st_contains(shape, A.shape))" } ] } ] } }
Note that there is a sequential scan inside the lateral join, however there is clearly an index available. However after setting
set enable_seqscan=false;
the index is being used. This actually affects runtime significantly (around 3 times faster) and seems that postgres should figure things like that automatically. { "Plan": { "Node Type": "Seq Scan", "Relation Name": "A", "Startup Cost": 10000000000.00, "Total Cost": 10004716493.85, "Plan Rows": 549450, "Plan Width": 1677, "Plans": [ { "Node Type": "Limit", "Parent Relationship": "SubPlan", "Subplan Name": "SubPlan 1", "Startup Cost": 0.00, "Total Cost": 8.52, "Plan Rows": 1, "Plan Width": 8, "Plans": [ { "Node Type": "Index Scan", "Parent Relationship": "Outer", "Scan Direction": "NoMovement", "Index Name": "B_shape_idx", "Relation Name": "B", "Startup Cost": 0.00, "Total Cost": 8.52, "Plan Rows": 1, "Plan Width": 8, "Index Cond": "(shape && A.shape)", "Filter": "_st_contains(shape, A.shape)" } ] } ] } }
Is there any way to tell postgres to use index in a less hacky way? Possibly by rewriting the query? From what I understand the use of set enable_... is not recommended in production.
When you actually run the commands above it will give
which gives me { "Plan": { "Node Type": "Seq Scan", "Relation Name": "A", "Startup Cost": 0.00, "Total Cost": 2606592.33, "Plan Rows": 549745, "Plan Width": 1646, "Plans": [ { "Node Type": "Limit", "Parent Relationship": "SubPlan", "Subplan Name": "SubPlan 1", "Startup Cost": 0.00, "Total Cost": 4.68, "Plan Rows": 1, "Plan Width": 8, "Plans": [ { "Node Type": "Seq Scan", "Parent Relationship": "Outer", "Relation Name": "B", "Startup Cost": 0.00, "Total Cost": 4.68, "Plan Rows": 1, "Plan Width": 8, "Filter": "((shape && A.shape) AND _st_contains(shape, A.shape))" } ] } ] } }
Note that there is a sequential scan inside the lateral join, however there is clearly an index available. However after setting
set enable_seqscan=false;
the index is being used. This actually affects runtime significantly (around 3 times faster) and seems that postgres should figure things like that automatically. { "Plan": { "Node Type": "Seq Scan", "Relation Name": "A", "Startup Cost": 10000000000.00, "Total Cost": 10004716493.85, "Plan Rows": 549450, "Plan Width": 1677, "Plans": [ { "Node Type": "Limit", "Parent Relationship": "SubPlan", "Subplan Name": "SubPlan 1", "Startup Cost": 0.00, "Total Cost": 8.52, "Plan Rows": 1, "Plan Width": 8, "Plans": [ { "Node Type": "Index Scan", "Parent Relationship": "Outer", "Scan Direction": "NoMovement", "Index Name": "B_shape_idx", "Relation Name": "B", "Startup Cost": 0.00, "Total Cost": 8.52, "Plan Rows": 1, "Plan Width": 8, "Index Cond": "(shape && A.shape)", "Filter": "_st_contains(shape, A.shape)" } ] } ] } }
Is there any way to tell postgres to use index in a less hacky way? Possibly by rewriting the query? From what I understand the use of set enable_... is not recommended in production.
When you actually run the commands above it will give
{
"Plan": {
"Node Type": "Seq Scan",
"Relation Name": "a",
"Alias": "a",
"Startup Cost": 0.00,
"Total Cost": 10372.75,
"Plan Rows": 1230,
"Plan Width": 36,
"Plans": [
{
"Node Type": "Limit",
"Parent Relationship": "SubPlan",
"Subplan Name": "SubPlan 1",
"Startup Cost": 0.14,
"Total Cost": 8.41,
"Plan Rows": 1,
"Plan Width": 4,
"Plans": [
{
"Node Type": "Index Scan",
"Parent Relationship": "Outer",
"Scan Direction": "NoMovement",
"Index Name": "b_shape_idx",
"Relation Name": "b",
"Alias": "b",
"Startup Cost": 0.14,
"Total Cost": 8.41,
"Plan Rows": 1,
"Plan Width": 4,
"Index Cond": "(shape && a.shape)",
"Filter": "((kind <> 1) AND _st_contains(shape, a.shape))"
}
]
}
]
}
}
Unfortunately I cannot provide data to reproduce the query plan results.
Unfortunately I cannot provide data to reproduce the query plan results.
Thanks,
Igor