Re: "Big O" notation for postgres?

"H. Hall" <hhall1001@xxxxxxxxxxxxxx> · Mon, 26 May 2008 11:00:18 -0400

Gregory Stark wrote:
"Richard Huxton" <dev@xxxxxxxxxxxx> writes:

Jonah H. Harris wrote:

On Wed, May 21, 2008 at 10:10 AM, H. Hall wrote:

Does anyone know if there is a source that provides "Big O" notation for
postgres's aggregate functions and operations?  For example is count(*) =
O(1) or O(n)?

I don't know of any document containing the complexity of each
aggregate, but it's sometimes left as a comment in the souce code.

Recent max() and min() can be O(n) or O(1) depending on the where-clause and
presence of an index too, just to muddy the waters.

When I first read the above from Jonah I just assumed some Postgres 
magic was producing O(1). After seeing this again, I believe that 
Postgres must be doing something like the following for max and min:
Max:     ORDER BY colName  DESC LIMIT 1
Min:      ORDER BY coName ASC LIMIT 1

Thus Jonah's caveat about using an index. If postgres is using an index 
as in the above then the Max and Min functions would both be O(log N) , 
this is  log base2 of N, which is the time it takes to search a balanced 
binary tree, not O(1) which implies a constant time to perform, 
regardless of the size of the dataset N.

Hm, true. But excluding those special cases all Postgres aggregate functions
will be O(n) unless they're doing something very odd. None of the built-in
functions (except min/max as noted) do anything odd like that.

The reason way is because of the basic design of Postgres aggregate functions.
They are fed every tuple one by one and have to keep their state in a single
variable. Most of the aggregate functions like count(*) etc just keep a static
non-growing state and the state transition function is a simple arithmetic
operation which is O(1). So the resulting operation is O(n).

Actually one exception would be something like

CREATE AGGREGATE array_agg(anyelement) (SFUNC = array_append, STYPE = anyarray, INITCOND='{}');

Since the state variable has to keep accumulating more and more stuff the
array_append becomes more and more expensive (it has to generate a new array
so it has to copy the existing stuff). So actually it woul dbe O(n^2).

The only builtin aggregate which looks like it falls in this category would be
xmlagg()

Functions with O(N^2) scale very badly of course.

It would be very handy if the Planner could kick out Big O with its 
estimates.  This would allow one to quickly tell how a query scales with 
a large number of rows.

Thanks,
HH

--
H. Hall
ReedyRiver Group LLC
http://www.reedyriver.com