Maybe the new parameter in v15 would help. https://www.postgresql.org/docs/15/runtime-config-query.html#GUC-RECURSIVE-WORKTABLE-FACTOR recursive_worktable_factor (floating point) Sets the planner's estimate of the average size of the working table of a recursive query, as a multiple of the estimated size of the initial non-recursive term of the query. This helps the planner choose the most appropriate method for joining the working table to the query's other tables. The default value is 10.0. A smaller value such as 1.0 can be helpful when the recursion has low “fan-out” from one step to the next, as for example in shortest-path queries. Graph analytics queries may benefit from larger-than-default values.
Thanks that's something I will try after I upgraded.
But speaking of such other uses for recursive queries, I can say
I have quite a bit of experience of turning graph related
"traversal" and search and optimization and classification queries
into SQL, in short, computing the transitive closure. And usually
I have stayed away from the recursive WITH query and instead set
up a start table and then perform the iterative step. And there
are two ways to go about it. Say you have a graph, simple nodes
and arcs. You want to find all paths through the graph.
Now you can set up start nodes and then extend them at the end by
joining the recursive table to the simple arc table and extend
your path every time. This is what the WITH RECURSIVE supports.
These queries linearly iterate as many times as the length of the
longest path.
with recursive arcs as ( select source, target, 1 as distance, ... from ... ), paths as ( select * from arcs union all select a.source, b.target, a.distance + 1 as distance, ... from paths a inner join arcs b on(b.source = a.target) ) select * from paths
But another way is to join paths with paths. It would be this, which I think I have seen postgresql unable to deal with:
with recursive arcs as ( select source, target, 1 as distance, ... from ... ), paths as ( select * from arcs union all select a.source, b.target, a.distance + 1 as distance, ... from paths a inner join paths b on(b.source = a.target) ) select * from paths
So, instead of the recursive union to join back to the fixed
table, it joins the recursive table to the recursive table, and
the benefit of that is that these queries converge much quicker.
Instead of going 10 iterations to find a path of length 10, you go
1 iteration to find all paths of 2 (if distance 1 is the base
table of all arcs), then next you find paths of up to 4 then you
find paths of up to 8, then 16, 32, ... This converges much
faster. I usually do that as follows
create table paths as select source, target, 1 as distance, ... from arcs; prepare rep as insert into paths(source, target, distance, ...) select a.source, b.target, a.distance + b.distance as distance, ... from paths a inner join paths b on(b.source = a.target) except select * from paths; execute rep; execute rep; ...
or instead of the except, in order to minimize distances:
where not exists (select 1 from paths x where x.source = a.source and x.target = a.target and x.distance < a.distance)
I have even done a group by in the recursive step which replaces
the paths relation at every iteration (e.g. with only minimal
distance paths).
Since this converges so rapidly I often prefer that approach over
a recursive union query.
I think in IBM DB2 allowed to join the recursive table with itself. Is this something you want to support at some time?
Also, why even use the RECURSIVE keyword, DB2 didn't need it, and the query analyzer should immediately see the recursion, so no need to have that keyword.
regards,
-Gunther