Re: distinct values and count over window function

"David Johnston" <polobo@xxxxxxxxx> · Tue, 27 Nov 2012 16:50:16 -0500

From: pgsql-general-owner@xxxxxxxxxxxxxx
[mailto:pgsql-general-owner@xxxxxxxxxxxxxx] On Behalf Of Rodrigo Rosenfeld
Rosas
Sent: Tuesday, November 27, 2012 4:23 PM
To: pgsql-general@xxxxxxxxxxxxxx
Subject:  distinct values and count over window function

Hello, this is my first post to this particular list (I've also posted to
performance list a while ago to report a bug).

I generate complex dynamic queries and paginate them using "LIMIT 50". I
have a need for implementing some statistics for the results as well and I'd
like to know if I can avoid extra queries or how to be able to do that
without much performance hit.

The statistics calculation that bothers me most is a distinct one. For
example, suppose you have a query that would return more than 50 distinct
values while only the first 50 ones are brought by the paginated query.

When the user asks for its statistics it should show all counts by distinct
value including those not present in the current results page for certain
columns.

I noticed that there is a plan for PG 9.2 to include a query_to_json
function that would indeed help me to get this done when used altogether
with a window function.

Our application is currently running PG 9.1 (was rolled back from 9.2 due to
a performance bug that seems to be now fixed but not released yet). Is there
anyway I could avoid running another query to get this kind of statistics?

The queries can become really complex and take almost 5s to complete in some
cases. Which would mean that performing 2 queries with those same conditions
would take about 10s... I could perform them in different times but then the
user would have to wait for 5s just to get the statistics calculation...
This is what I'd like to avoid.

This application is a clustered web application, so I'd rather avoid any
client library support for paginating results using cursors or something
like that and stick with OFFSET and LIMIT.

Any help is appreciated!

Thanks,
Rodrigo.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>.

Conceptually you could execute and cache (via a Temporary or Permanent
Table) the results of the unlimited query then use that cache to calculate
your statistics as well as perform your subsequent LIMIT/OFFSET queries.

In 9.1+ you have the ability to incorporate INSERT inside a WITH/CTE so that
you can perform multiple actions like this without resorting to the use of a
function - though a function may have value as well.

David J.

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general