Re: TB-sized databases

Tom Lane <tgl@xxxxxxxxxxxxx> · Thu, 06 Dec 2007 13:34:47 -0500

Matthew <matthew@xxxxxxxxxxx> writes:
> On Thu, 6 Dec 2007, Tom Lane wrote:
>> Hmm.  IIRC, there are smarts in there about whether a mergejoin can
>> terminate early because of disparate ranges of the two join variables.

> Very cool. Would that be a planner cost estimate fix (so it avoids the
> merge join), or a query execution fix (so it does the merge join on the
> table subset)?

Cost estimate fix.  Basically what I'm thinking is that the startup cost
attributed to a mergejoin ought to account for any rows that have to be
skipped over before we reach the first join pair.  In general this is
hard to estimate, but for mergejoin it can be estimated using the same
type of logic we already use at the other end.

After looking at the code a bit, I'm realizing that there's actually a
bug in there as of 8.3: mergejoinscansel() is expected to be able to
derive numbers for either direction of scan, but if it's asked to
compute numbers for a DESC-order scan, it looks for a pg_stats entry
sorted with '>', which isn't gonna be there.  It needs to know to
look for an '<' histogram and switch the min/max.  So the lack of
symmetry here is causing an actual bug in logic that already exists.
That makes the case for fixing this now a bit stronger ...

			regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match