Re: When you really want to force a certain join type?

Gunther Schadow <raj@xxxxxxxx> · Thu, 29 Dec 2022 02:31:59 -0500

    On 12/28/2022 10:48 AM, Justin Pryzby
      wrote:

      Maybe the new parameter in v15 would help.

https://www.postgresql.org/docs/15/runtime-config-query.html#GUC-RECURSIVE-WORKTABLE-FACTOR
recursive_worktable_factor (floating point)

    Sets the planner's estimate of the average size of the working table
    of a recursive query, as a multiple of the estimated size of the
    initial non-recursive term of the query. This helps the planner
    choose the most appropriate method for joining the working table to
    the query's other tables. The default value is 10.0. A smaller value
    such as 1.0 can be helpful when the recursion has low “fan-out” from
    one step to the next, as for example in shortest-path queries. Graph
    analytics queries may benefit from larger-than-default values.

    Thanks that's something I will try after I upgraded.
    But speaking of such other uses for recursive queries, I can say
      I have quite a bit of experience of turning graph related
      "traversal" and search and optimization and classification queries
      into SQL, in short, computing the transitive closure. And usually
      I have stayed away from the recursive WITH query and instead set
      up a start table and then perform the iterative step. And there
      are two ways to go about it. Say you have a graph, simple nodes
      and arcs. You want to find all paths through the graph. 

    Now you can set up start nodes and then extend them at the end by
      joining the recursive table to the simple arc table and extend
      your path every time. This is what the WITH RECURSIVE supports.
      These queries linearly iterate as many times as the length of the
      longest path. 

    with recursive arcs as (
  select source, target, 1 as distance, ...
    from ...
), paths as (
  select * from arcs
  union all
  select a.source, b.target, a.distance + 1 as distance, ...
    from paths a inner join arcs b
      on(b.source = a.target)
)
select * from paths

    But another way is to join paths with paths. It would be this,
      which I think I have seen postgresql unable to deal with:
    with recursive arcs as (
  select source, target, 1 as distance, ...
    from ...
), paths as (
  select * from arcs
  union all
  select a.source, b.target, a.distance + 1 as distance, ...
    from paths a inner join paths b
      on(b.source = a.target)
)
select * from paths

    So, instead of the recursive union to join back to the fixed
      table, it joins the recursive table to the recursive table, and
      the benefit of that is that these queries converge much quicker.
      Instead of going 10 iterations to find a path of length 10, you go
      1 iteration to find all paths of 2 (if distance 1 is the base
      table of all arcs), then next you find paths of up to 4 then you
      find paths of up to 8, then 16, 32, ... This converges much
      faster. I usually do that as follows

    create table paths as
select source, target, 1 as distance, ...
  from arcs;

prepare rep as
insert into paths(source, target, distance, ...)
select a.source, b.target, a.distance + b.distance as distance, ... 
  from paths a inner join paths b on(b.source = a.target) 
except
select * from paths;

execute rep;
execute rep;
...

    or instead of the except, in order to minimize distances:
    where not exists (select 1 from paths x 
                   where x.source = a.source 
                     and x.target = a.target
                     and x.distance < a.distance)

    I have even done a group by in the recursive step which replaces
      the paths relation at every iteration (e.g. with only minimal
      distance paths).

    Since this converges so rapidly I often prefer that approach over
      a recursive union query. 

    I think in IBM DB2 allowed to join the recursive table with
      itself. Is this something you want to support at some time?
    Also, why even use the RECURSIVE keyword, DB2 didn't need it, and
      the query analyzer should immediately see the recursion, so no
      need to have that keyword.
    regards,

      -Gunther