Re: Re: Re: Re: Query Optimization with Kruskal’s Algorithm

Tom Lane <tgl@xxxxxxxxxxxxx> · Sat, 10 May 2008 20:30:39 -0400

"Jonah H. Harris" <jonah.harris@xxxxxxxxx> writes:
> Wikipedia has the algorithm itself
> (http://en.wikipedia.org/wiki/Kruskal's_algorithm), but I was more
> interested in the actual applicability to PG and any issues they ran
> into.

Hmm ... minimum spanning tree of a graph, eh?  Right offhand I'd say
this is a pretty terrible model of the join order planning problem.
The difficulty with trying to represent join order as a weighted
graph is that it assumes the cost to join two relations (ie, the
weight on the arc between them) is independent of what else you have
joined first.  Which is clearly utterly wrong for join planning.

Our GEQO optimizer has a similar issue --- it uses a search algorithm
that is designed to solve traveling-salesman, which is almost the same
thing as minimum spanning tree.  The saving grace for GEQO is that its
TSP orientation is only driving a heuristic; when it considers a given
overall join order it is at least capable of computing the right cost.
It looks to me like Kruskal's algorithm is entirely dependent on the
assumption that minimizing the sum of some predetermined pairwise costs
gives the correct plan.

In short, I'm sure it's pretty fast compared to either of our current
join planning methods, but I'll bet a lot that it often picks a much
worse plan.  Color me unexcited, unless they've found some novel way
of defining the graph representation that avoids this problem.

			regards, tom lane