Not many. It fluctuates, but there are usually only ever a few hundred
at most. Each assetid has many millions of positions, though.
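[Editor's note: with only a few hundred distinct assetids but millions of rows each, one common workaround is to probe the index once per assetid with a correlated max() subquery. This is a sketch, not the poster's actual schema; the table positions(assetid, ts) and the index name are assumptions inferred from the thread.]

```sql
-- Assumed schema: positions(assetid, ts, ...), inferred from the thread.
CREATE INDEX positions_assetid_ts_idx ON positions (assetid, ts);

-- For a fixed assetid, max(ts) qualifies for PostgreSQL's min/max index
-- optimization, so each inner subquery is a single index probe instead of
-- a scan over that asset's millions of rows.
SELECT a.assetid,
       (SELECT max(p.ts)
          FROM positions p
         WHERE p.assetid = a.assetid) AS max_ts
  FROM (SELECT DISTINCT assetid FROM positions) a;
```

The DISTINCT subselect itself still has to scan; if a small lookup table of assets exists, driving the outer query from that table instead would avoid it.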
Mark Lewis wrote:
Hmmm. How many distinct assetids are there?
-- Mark Lewis
On Tue, 2006-10-03 at 14:23 -0700, Graham Davis wrote:
The "summary table" approach maintained by triggers is something we are
considering, but it becomes a bit more complicated to implement.
Currently we have groups of new positions coming in every few seconds or
less. They are not guaranteed to be in order. So for instance, a group
of positions from today could come in and be inserted, then a group of
positions that got lost from yesterday could come in and be inserted
afterwards.
This means the triggers would have to do some logic to figure out
whether the newly inserted position is actually the most recent by
timestamp. If positions are ever deleted or updated, the same sort of
query that is currently running slowly would need to be executed in order
to get the new most recent position. So there is the possibility that
new positions arrive faster than the triggers can calculate
and maintain the summary table. There are some other complications
with maintaining such a summary table in our system too, but I won't get
into those.
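[Editor's note: the out-of-order-arrival logic described above is fairly small in an insert trigger, since it only needs a comparison against the stored maximum. The sketch below is illustrative only; the table latest_position, the function name, and the positions(assetid, ts) schema are all assumptions. As noted in the thread, deletes or updates of the current maximum would still require the expensive recompute, which this trigger does not handle.]

```sql
-- Summary table: one row per assetid holding its latest timestamp.
CREATE TABLE latest_position (
    assetid integer PRIMARY KEY,
    max_ts  timestamptz NOT NULL
);

CREATE OR REPLACE FUNCTION maintain_latest_position() RETURNS trigger AS $$
BEGIN
    -- Only advance the summary row; a late-arriving older position
    -- (e.g. yesterday's batch inserted after today's) is ignored.
    UPDATE latest_position
       SET max_ts = NEW.ts
     WHERE assetid = NEW.assetid
       AND max_ts < NEW.ts;
    IF NOT FOUND THEN
        -- Either no summary row exists yet, or the stored one is newer.
        INSERT INTO latest_position (assetid, max_ts)
        SELECT NEW.assetid, NEW.ts
         WHERE NOT EXISTS (SELECT 1 FROM latest_position
                            WHERE assetid = NEW.assetid);
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER positions_latest
AFTER INSERT ON positions
FOR EACH ROW EXECUTE PROCEDURE maintain_latest_position();
```

Concurrent first inserts for the same assetid could still race on the INSERT branch, so a production version would need to handle a unique-violation retry.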
Right now I'm just trying to see if I can get the query itself running
faster, which would be the easiest solution for now.
Graham.
Mark Lewis wrote:
Have you looked into a materialized view sort of approach? You could
create a table which had assetid as a primary key, and max_ts as a
column. Then use triggers to keep that table up to date as rows are
added/updated/removed from the main table.
This approach would only make sense if there were far fewer distinct
assetid values than rows in the main table, and would get slow if you
commonly delete rows from the main table or decrease the value for ts in
the row with the highest ts for a given assetid.
-- Mark Lewis
On Tue, 2006-10-03 at 13:52 -0700, Graham Davis wrote:
Thanks Tom, that explains it and makes sense. I guess I will have to
accept this query taking 40 seconds, unless I can figure out another way
to write it so it can use indexes. If there are any more syntax
suggestions, please pass them on. Thanks for the help everyone.
Graham.
Tom Lane wrote:
Graham Davis <gdavis@xxxxxxxxxxxxxxx> writes:
How come an aggregate like that has to use a sequential scan? I know
that PostgreSQL used to have to do a sequential scan for all aggregates,
but there was support added to version 8 so that aggregates would take
advantage of indexes.
Not in a GROUP BY context, only for the simple case. Per the comment in
planagg.c:
* We don't handle GROUP BY, because our current implementations of
* grouping require looking at all the rows anyway, and so there's not
* much point in optimizing MIN/MAX.
The problem is that using an index to obtain the maximum value of ts for
a given value of assetid is not the same thing as finding out what all
the distinct values of assetid are.
This could possibly be improved but it would take a considerable amount
more work. It's definitely not in the category of "bug fix".
regards, tom lane
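[Editor's note: one syntax alternative to the GROUP BY form is PostgreSQL's DISTINCT ON, sketched below against the assumed positions(assetid, ts) schema. It expresses "latest row per assetid" directly, but it is not a skip scan: the planner still visits every row, and avoiding an explicit sort requires an index whose order matches the ORDER BY (a mixed-direction (assetid, ts DESC) index only became possible in later releases), so whether it actually beats the 40-second plan depends on the version and planner.]

```sql
-- DISTINCT ON keeps the first row of each assetid group, so sorting ts
-- descending within each assetid yields the most recent position per asset.
SELECT DISTINCT ON (assetid) assetid, ts
  FROM positions
 ORDER BY assetid, ts DESC;
```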
--
Graham Davis
Refractions Research Inc.
gdavis@xxxxxxxxxxxxxxx