Search Postgresql Archives

Re: Enforcing Parameterised Nested Loop Join Order for Foreign Table Joins

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Tom,

Thank you for your prompt reply. Your advice has pointed me in the right direction.

I now have the wrapper identifying columns that are inputs to the web service, and thus parameterisable. The ec_classes, left_join_clauses and right_join_clauses trees are scanned for Var exprs that match these attributes. If they are present, the relid is added to the required list of outer rels for the path -- this is done as an extension to the logic I posted previously.

In all cases this seems to work, except one. A join between 3 tables. The foreign table has 2 parameterised columns, each given a restriction based on one of the other two tables:

    adam=# explain select * from l1, l2, foreign1 where foreign1.a = l1.a and foreign1.b = l2.a;
                                                 QUERY PLAN
    ----------------------------------------------------------------------------------------------------
     Merge Join  (cost=5000704.96..5001278.44 rows=37822 width=168)
       Merge Cond: (l2.a = foreign1.b)
       ->  Sort  (cost=85.43..88.50 rows=1230 width=36)
             Sort Key: l2.a
             ->  Seq Scan on l2  (cost=0.00..22.30 rows=1230 width=36)
       ->  Sort  (cost=5000619.54..5000634.91 rows=6150 width=132)
             Sort Key: foreign1.b
             ->  Merge Join  (cost=5000135.26..5000232.51 rows=6150 width=132)
                   Merge Cond: (foreign1.a = l1.a)
                   ->  Sort  (cost=5000049.83..5000052.33 rows=1000 width=96)
                         Sort Key: foreign1.a
                         ->  Foreign Scan on foreign1  (cost=5000000.00..5000000.00 rows=1000 width=96)
                   ->  Sort  (cost=85.43..88.50 rows=1230 width=36)
                         Sort Key: l1.a
                         ->  Seq Scan on l1  (cost=0.00..22.30 rows=1230 width=36)

My path generation logic seems to work:

baserel->cheapest_parameterized_paths = (
   {FOREIGNPATH
   :pathtype 120
   :parent_relids (b 3)
   :required_outer (b 1 2)
   :rows 500
   :startup_cost 0.00
   :total_cost 0.00
   :pathkeys <>
   :fdw_private <>
   }
   {FOREIGNPATH
   :pathtype 120
   :parent_relids (b 3)
   :required_outer (b)
   :rows 1000
   :startup_cost 5000000.00
   :total_cost 5000000.00
   :pathkeys <>
   :fdw_private <>
   }
)

Yet the planner picks the non-parameterised path:

ForeignPath* best_path = {FOREIGNPATH
   :pathtype 120
   :parent_relids (b 3)
   :required_outer (b)
   :rows 1000
   :startup_cost 5000000.00
   :total_cost 5000000.00
   :pathkeys <>
   :fdw_private <>
   }

I’ve tried adjusting planner tuneables to disable all join types except nested loop, and setting `join_collapse_limit` to 1 with no desirable outcome.

Yet, adding a restriction clause between the other two tables forces them to be scanned first:

    adam=# explain select * from l1, l2, foreign1 where foreign1.a = l1.a and foreign1.b = l2.a and l1.b > l2.b;
                                   QUERY PLAN
    -------------------------------------------------------------------------
     Nested Loop  (cost=0.00..2544241.17 rows=12608 width=168)
       ->  Nested Loop  (cost=0.00..22741.17 rows=504300 width=72)
             Join Filter: (l1.b > l2.b)
             ->  Seq Scan on l1  (cost=0.00..22.30 rows=1230 width=36)
             ->  Materialize  (cost=0.00..28.45 rows=1230 width=36)
                   ->  Seq Scan on l2  (cost=0.00..22.30 rows=1230 width=36)
       ->  Foreign Scan on foreign1  (cost=0.00..0.00 rows=500 width=96)
             Filter: ((a = l1.a) AND (b = l2.a))


ForeignPath* best_path = {FOREIGNPATH
   :pathtype 120
   :parent_relids (b 3)
   :required_outer (b 1 2)
   :rows 500
   :startup_cost 0.00
   :total_cost 0.00
   :pathkeys <>
   :fdw_private <>
   }


On 18/03/2013, at 4:09 PM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:

> Adam Zegelin <adam@xxxxxxxxxxxxx> writes:
>> Some service endpoints have no concept of unqualified queries. In the example above, a ‘sequence scan’ of Bing is a not possible.
> 
> In that case, you shouldn't be generating such a path.  But keep in mind
> that this may lead to failure to produce any plan at all for some
> queries.  If the foreign data source is really so broken that it can't
> do that, then you have little choice ... but you shouldn't be thinking
> of that as anything but a broken design decision on their part.

I tried adding a condition that would prevent the non-parameterised path from being generated if the service only supported parameterised scans. Postgres refuses to generate a plan: "ERROR:  could not devise a query plan for the given query". I did a bit of digging and this error is generated by pathnode.c:set_cheapest . As there is no non-parameterised `cheapest_total_path` the error is raised (line 253).

For now, I just add an expensive non-pramerterised path and let the FDW throw an error if no qual is found involving the required columns.

Regards,
Adam
  

<<attachment: smime.p7s>>


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux