On Wed, Dec 28, 2022 at 10:39:14AM -0500, Gunther Schadow wrote: > I have a complex query which essentially runs a finite state automaton > through a with recursive union, adding the next state based on the > previous. This is run at 100,000 or a million start states at the same > time, picking a new record (token), matching it to the FSA (a three-way > join: > There are 100s of thousands of states. This join has a HUGE fan out if it is > I doubt that I can find any trick to give to the planner better data which > it can then use to figure out that the merge join is a bad proposition. > Note, for my immediate relief I have forced it by simply set > enable_mergejoin=off. This works fine, except, it converts both into a > nested loop, but the upper merge join was not a problem, and sometimes (most > often) nested loop is a bad choice for bulk data. It's only for this > recursive query it sometimes makes sense. Maybe the new parameter in v15 would help. https://www.postgresql.org/docs/15/runtime-config-query.html#GUC-RECURSIVE-WORKTABLE-FACTOR recursive_worktable_factor (floating point) Sets the planner's estimate of the average size of the working table of a recursive query, as a multiple of the estimated size of the initial non-recursive term of the query. This helps the planner choose the most appropriate method for joining the working table to the query's other tables. The default value is 10.0. A smaller value such as 1.0 can be helpful when the recursion has low “fan-out” from one step to the next, as for example in shortest-path queries. Graph analytics queries may benefit from larger-than-default values. -- Justin