Re: When you really want to force a certain join type?

Justin Pryzby <pryzby@xxxxxxxxxxxxx> · Wed, 28 Dec 2022 09:48:37 -0600

On Wed, Dec 28, 2022 at 10:39:14AM -0500, Gunther Schadow wrote:
> I have a complex query which essentially runs a finite state automaton
> through a with recursive union, adding the next state based on the
> previous.  This is run at 100,000 or a million start states at the same
> time, picking a new record (token), matching it to the FSA (a three-way
> join:

> There are 100s of thousands of states. This join has a HUGE fan out if it is

> I doubt that I can find any trick to give to the planner better data which
> it can then use to figure out that the merge join is a bad proposition.

> Note, for my immediate relief I have forced it by simply set
> enable_mergejoin=off. This works fine, except, it converts both into a
> nested loop, but the upper merge join was not a problem, and sometimes (most
> often) nested loop is a bad choice for bulk data. It's only for this
> recursive query it sometimes makes sense.

Maybe the new parameter in v15 would help.

https://www.postgresql.org/docs/15/runtime-config-query.html#GUC-RECURSIVE-WORKTABLE-FACTOR
recursive_worktable_factor (floating point)

    Sets the planner's estimate of the average size of the working table
    of a recursive query, as a multiple of the estimated size of the
    initial non-recursive term of the query. This helps the planner
    choose the most appropriate method for joining the working table to
    the query's other tables. The default value is 10.0. A smaller value
    such as 1.0 can be helpful when the recursion has low “fan-out” from
    one step to the next, as for example in shortest-path queries. Graph
    analytics queries may benefit from larger-than-default values.

-- 
Justin