Re: expected O^2 looks line K^O, index problem not involved: [was] looping simpler query just faster

Ivan Sergio Borgonovo <mail@xxxxxxxxxxxxxxx> · Thu, 10 Jul 2008 16:56:33 +0200

On Thu, 10 Jul 2008 15:52:54 +0200
Martijn van Oosterhout <kleptog@xxxxxxxxx> wrote:

> On Thu, Jul 10, 2008 at 02:19:30PM +0200, Ivan Sergio Borgonovo
> wrote:
> > On Thu, 10 Jul 2008 11:50:01 +0200
> > Martijn van Oosterhout <kleptog@xxxxxxxxx> wrote:

> Hmm, I just studied your query to determine what you're trying to
> do. As I understand it:
> 
> For each item
>   Determine the brand
>   Get the top three items for this brand
>   If this item is one of them, display it
> 
> This is pretty inefficient but I can't see an efficient way to do
> it either. I suppose one thing to try would be a multicolumn index
> on (brand,datainserimento) to avoid the sort step. Also, the table
> b1 in the original query is redundant.

> It's the fact that you want the top three items that makes it
> difficult, not sure how to deal with that.

I'm not concerned about the fact that it is not "easy"... I'm
concerned about the fact that small changes to the query produce
unexpected results in performances.

I'd say that the filter on

and i2.dataPub>(now() - interval '8 month') and
i2.datainserimento>(now() - interval '6 month')

shouldn't play an important role... or at least have the same weight
on performance if used to build up a temporary table or directly in
the subquery.
I thought that indexes were going to play an important role but
still they don't.

I haven't been able to come up with a single statement that can get
the top N row by group in PostgreSQL that can compete with a set of
simpler statements.

There are a lot of examples pointing to a couple of standard
solution on MySQL and MS SQL[1] (the other standard solution uses
HAVING).
I didn't benchmark the same SQL on these 2 other DB but I think I
could assume that if they were performing so badly they wouldn't be
proposed so frequently.

Considering it is pretty trivial to write a stored procedure that
create a temp table, create some indexes on it, loops over groups
and pick up the top N record and that system proved to perform quite
well I'm still curious to know if there is postgresql way that
performs comparably with the just described method.

It would be nice if I could exploit further the fact that I'm going
to sort multiple times the temp table since the kernel of the
function will be something like

for _group in select group from grouptable loop
  select name, group from table where group=_group sort by
    datainserimento limit 3;
  return next;
end loop;
return;

I think building the temp table already sorted could result in some
performance boost but that would depend on implementation details.

[1] excluding the ones that involve non standard functions

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it