"Dirschel, Steve" <steve.dirschel@xxxxxxxxxxxxxxxxxx> writes: > The query and execution plan are shown below. My question is > related to the result set the optimizer is choosing to build the > hash table from. My understanding is for a hash join you want to > build the hash table out of the smaller result set. That's *a* consideration, but not the only one. We also consider whether the hash key has a flat distribution; if it is too skewed, we might find specific hash chains getting too long. > When running some tests I forgot to create the PK on table > docloc_test. When the PK was not on the table the optimizer decided > to create the hash table off the 1000 rows from collection. But as > soon as I put the PK on that table it then decides to use > docloc_test to build the hash table. I think that the presence of a unique index overrides the statistics (or the lack of any) so that the planner knows that the column is unique and thus safe to use as a hash key. Now, it should have known that anyway, unless maybe this is a freshly-built table that auto-analyze hasn't gotten to yet? regards, tom lane