Josh Berkus wrote:
I actually think the way to attack this issue is to discuss the kinds of
errors the planner makes, and what tweaks we could do to correct them.
Here's the ones I'm aware of:
-- Incorrect selectivity of WHERE clause
-- Incorrect selectivity of JOIN
-- Wrong estimate of rows returned from SRF
-- Incorrect cost estimate for index use
Can you think of any others?
The one that started this discussion: Lack of cost information for functions. I think this feature is a good idea independent of the whole HINTS discussion.
At a minimum, a rough categorization is needed, such as "Lighning fast / Fast / Medium / Slow / Ludicrously slow", with some sort if milliseconds or CPU cycles associated with each category. Or perhaps something like, "This is (much faster|faster|same as|slower|much slower) than reading a block from the disk."
If I understand Tom and others, the planner already is capable of taking advantage of this information, it just doesn't have it yet. It could be part of the CREATE FUNCTION command.
CREATE OR REPLACE FUNCTION foobar(text, text, text) RETURNS text
AS '/usr/local/pgsql/lib/foobar.so', 'foobar'
COST LUDICROUSLY_SLOW
LANGUAGE 'C' STRICT;
Better yet ('tho I have no idea how hard this would be to implement...) would be an optional second function with the same parameter signature as the main function, but it would return a cost estimate:
CREATE OR REPLACE FUNCTION foobar(text, text, text) RETURNS text
AS '/usr/local/pgsql/lib/foobar.so', 'foobar'
COST foobar_cost
LANGUAGE 'C' STRICT;
The planner could call it with the same parameters it was about to use, and get an accurate estimate for the specific operation that is about to be done. In my particular case (running an NP-complete problem), there are cases where I can determine ahead of time that the function will be fast, but in most cases it is *really* slow.
Craig