Re: Abysmal hash join

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Florian Weimer <fweimer@xxxxxx> writes:
>> Maybe larger stats targets would help.

> I've set default_statistics_target to 100 and rerun ANALYZE on that
> table.  The estimate went down to 43108 (and the hash join is still
> the preferred plan).  ANALZE with default_statistics_target = 200
> (which seems pretty large to me) is down to 26050 and the bitmap scan
> plan is chosen.

> PostgreSQL seems to think that there are only very few distinct values
> for that column (with default_statistics_target = 100 and 200):

Yeah, n_distinct estimation from a sample is inherently hard :-(.  Given
that you have such a long tail on the distribution, it might be worth
your while to crank the stats target for that column all the way to the
maximum (1000).  Also you need to experiment with extending the stats
for the smaller table.

I believe what's happening here is that the smaller table joins only to
less-frequent entries in the big table (correct?).  The hash join would
be appropriate if there were many rows joining to the very-frequent
entries, and the problem for the planner is to determine that that's not
so.  Given enough stats on the two joining columns, it should be able to
determine that.

Of course, large stats targets will slow down planning to some extent,
so you should also keep an eye on how long it takes to plan the query.

			regards, tom lane


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux