Hi, Webb Sprague. You wrote:
... but I see two seq scans in your explain in a loop -- this is
probably not good. If you can find a way to rewrite the IN clause
(either de-normalizing through triggers to save whatever you need on
an insert and not have to deal with a set, or by using except in the
query, or someing else more creative)...
I would normally agree that an IN clause is a place to worry -- except
that I'm using IN on a very small table, with about 4-5 rows. That
might indeed affect things, and I could certainly pull out these values
into a Perl variable that I then interpolate literally into the SQL.
However, I have to assume that this wouldn't affect things all that much.
Also -- there is a good book on temporal databases by Snodgrass that
might give some interesting ideas; maybe you have already seen it, but
still.
Heh -- that's one of the first references that I looked at when I
started this project. Perhaps I'll look at it again; the design of my
data warehouse took some of those suggestions into account, but I
wouldn't be surprised if there's something more I could be doing.
I am thinking you could increment a sequence variable to give
you a "tick" integer with each action, rather than trying to use
timestamps with all their overhead and inaccuracy (1 second is a long
time, really). Lamport also did work on clocks that might apply.
I agree that 1 second is a ridiculously long time. The logfiles were
created by people outside of my research group, and are a wonderful
lesson in how to use XML poorly and inefficiently. The designers of the
logfile weren't expecting to get dozens or hundreds of values per
second. You can be sure that my research group, which is now looking to
create logfiles of our own, will place a strong emphasis on
high-resolution timestamps.
Practically speaking, I've managed to get around this problem by using
the "id" column, which comes from a sequence, and is thus guaranteed to
be increasing. We can thus be assured that a row with an ID of x will
come before a row whose ID is x + 10, so long as their transaction types
are the same. So we can't guarantee that a node entry comes before a
variable setting via the ID, but we can order two variable settings
based on the ID.
Also have you tried dropping all your fk and checks just to see if you
get a difference in speed on an update? It would be interesting,
perhaps.
I sorta figured that because our UPDATE is taking place on a column
without a foreign key, that this wouldn't matter much. Am I wrong?
If you could get rid of the sorted limit 1 clause in your function,
there would be less variablity and make it easier to understand; you
probably need to denormalize somehow, perhaps using ticks; I am not
sure....
Yeah, that's a good point. I'm not sure how possible that'll be,
though, given that at *some* point, I need to find the "most recent"
setting of the same variable. There might be dozens or hundreds of rows
separating the assignment of a new value to a particular variable.
Could a trigger set your previous_value on insert or update, rather
than querying for it later?
Well, what I've been doing is INSERTing each row into the "transactions"
table without previous_value. Then I run the mass UPDATE for
previous_value, doing nothing else at the same time. Then, after
previous_value is set, I do the query.
Reuven