Re: [HACKERS] Big IN() clauses etc : feature proposal

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




    The problem is that you need a set-returning function to retrieve
the  values. SRFs don't have rowcount estimates, so the plans suck.

What about adding some way of rowcount estimation to SRFs, in the way of:

CREATE FUNCTION foo (para, meters) RETURNS SETOF bar AS
$$ ... function code ... $$ LANGUAGE plpgsql
ROWCOUNT_ESTIMATOR $$ ... estimation code ... $$ ;

Internally, this could create two functions, foo (para, meters) and
estimate_foo(para, meters) that are the same language and coupled
together (just like a SERIAL column and its sequence). The estimator
functions have an implicit return parameter of int8. Parameters may be
NULL when they are not known at query planning time.

What do you think about this idea?

	It would be very useful.
	A few thoughts...

You need to do some processing to know how many rows the function would return.
	Often, this processing will be repeated in the function itself.
Sometimes it's very simple (ie. the function will RETURN NEXT each element in an array, you know the array length...) Sometimes, for functions returning few rows, it might be faster to compute the entire result set in the cost estimator.
	
	So, it might be a bit hairy to find a good compromise.

	Ideas on how to do this (clueless hand-waving mode) :

1- Add new attributes to set-returning functions ; basically a list of functions, each returning an estimation parameter (rowcount, cpu tuple cost, etc).
	This is just like you said.

2- Add an "estimator", to a function, which would just be another function, returning one row, a record, containing the estimations in several columns (rowcount, cpu tuple cost, etc). Pros : only one function call to estimate, easier and faster, the estimator just leaves the unknown columns to NULL. The estimator needs not be in the same language as the function itself. It's just another function.

3- The estimator could be a set-returning function itself which would return rows mimicking pg_statistics Pros : planner-friendly, the planner would SELECT from the SRF instead of looking in pg_statistics, and the estimator could tell the planner that, for instance, the function will return unique values.
	Cons : complex, maybe slow

	4- Add simple flags to a function, like :
	- returns unique values
	- returns sorted values (no need to sort my results)
- please execute me and store my results in a temporary storage, count the rows returned, and plan the outer query accordingly
	- etc.
	


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux