On 01/24/2011 05:02 PM, A.M. wrote:
On Jan 24, 2011, at 10:50 AM, Fredric Fredricson wrote:
I have been fighting with a select and can find no satisfactory solution.
Simplified version of the problem:
A table that, in reality, logs state changes to an object (represented as a row in another table):
CREATE TABLE t (
    id SERIAL UNIQUE,
    ref INTEGER, -- Reference to a row in another table
    someData TEXT,
    inserted TIMESTAMP DEFAULT CURRENT_TIMESTAMP -- insertion time (a DATE column would keep only the day)
);
Then we insert multiple rows for each "ref" with different "someData".
Now I want the latest "someData" for each "ref" like:
ref | someData (only latest inserted)
-------------
1 | 'data1'
2 | 'data2'
etc...
The best solution I could find depends on the serial being higher for later dates. I do not like that: even if it holds, it is an indirect way to get the data and could, in the future, yield the wrong result if unrelated changes were made or ids were reused.
Here is my solution (which depends on the SERIAL):
SELECT x.ref, x.someData
FROM t AS x
NATURAL JOIN (
    SELECT ref, max(id) AS id
    FROM t
    GROUP BY ref
    ORDER BY ref
) AS y;
Can somebody come up with a better solution? (without resorting to stored procedures and other performance killers).
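One alternative that picks the latest row per "ref" by the timestamp itself, without a stored procedure, is PostgreSQL's DISTINCT ON. The following is only a sketch: it assumes that "inserted" has sub-day resolution (a timestamp rather than a date), and it uses "id" solely as a tie-breaker for rows with equal timestamps. The index name is hypothetical.

```sql
-- Sketch: latest row per ref, chosen by the inserted timestamp.
-- DISTINCT ON is a PostgreSQL extension, not standard SQL.
SELECT DISTINCT ON (ref)
       ref, someData
FROM t
ORDER BY ref, inserted DESC, id DESC;  -- id only breaks ties between equal timestamps

-- Optional, hypothetical index matching the ORDER BY above;
-- it may let the planner avoid a sort on larger tables.
CREATE INDEX t_ref_inserted_idx ON t (ref, inserted DESC, id DESC);
```

DISTINCT ON keeps the first row of each "ref" group in the given sort order, so the ORDER BY has to start with "ref".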
I would argue that relying on the id is safer than relying on the current timestamp, because CURRENT_TIMESTAMP refers to the time the transaction started, not the time it was committed (or the row was "actually" inserted). In addition, it is technically possible for two transactions to get the same CURRENT_TIMESTAMP. SERIAL values are never reused unless the sequence is reset or set to cycle. You could also create a security view that exposes the historical data without the actual table's primary key.
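To illustrate both points, here is a small sketch that can be run in psql. The view name t_history is hypothetical, and the view is just one possible reading of "security view": a plain view that leaves out the id column.

```sql
-- Within one transaction CURRENT_TIMESTAMP stays fixed at the transaction
-- start, while clock_timestamp() keeps advancing.
BEGIN;
SELECT CURRENT_TIMESTAMP AS txn_start, clock_timestamp() AS wall_clock;
SELECT pg_sleep(1);
SELECT CURRENT_TIMESTAMP AS txn_start, clock_timestamp() AS wall_clock;  -- txn_start is unchanged
COMMIT;

-- Hypothetical view exposing the history without the id column.
CREATE VIEW t_history AS
SELECT ref, someData, inserted
FROM t;
```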
Well, in my case the transaction time is not an issue really. The
database is a backend to a REST Web service and all transactions are
short (as dictated by the web server).
But I see your point.
I recommend http://pgfoundry.org/projects/tablelog, which uses "performance killers" like stored procedures to handle things properly. At least take a look to see how it handles things.
I looked at this page and it is not what I need for this particular
problem, since I log only specific changes in state and these changes
are represented as rows in this state-log table (the row in the
referenced table is not changed).
But I do log changes in about 80% of my tables, using a technique
similar to the one described in the table log. I have a script that
parses my SQL code and auto-generates the SQL statements that create a
"shadow" table and the required triggers. I also have a mandatory
"header" on all my logged tables and store an entry in a change log
table with the user name (external user, not SQL ROLE) and a
timestamp. This way all changes can be traced in time and I can, in
theory, get a snapshot of my entire data at an arbitrary point in time.
I say "in theory" because I have not implemented the snapshot query,
and with a lot of unions and such I expect the performance to suck. I
will, however, use it for parts of the data, which is why I implemented
the logging.
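For readers unfamiliar with the pattern, a shadow table with a logging trigger might look roughly like the sketch below. This is not the generated code described above: the table "item", its columns, and all other names are hypothetical, and the mandatory header and the separate change log table are left out.

```sql
-- Hypothetical logged table.
CREATE TABLE item (
    id   SERIAL PRIMARY KEY,
    name TEXT
);

-- Shadow table: one row per change, holding the previous version of the row.
CREATE TABLE item_shadow (
    id         INTEGER,
    name       TEXT,
    operation  TEXT,  -- 'UPDATE' or 'DELETE'
    changed_at TIMESTAMP WITH TIME ZONE DEFAULT clock_timestamp()
);

-- Trigger function copying the old row into the shadow table.
CREATE FUNCTION item_log_change() RETURNS trigger AS $$
BEGIN
    INSERT INTO item_shadow (id, name, operation)
    VALUES (OLD.id, OLD.name, TG_OP);
    RETURN NULL;  -- the return value is ignored for AFTER row triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER item_log_change
AFTER UPDATE OR DELETE ON item
FOR EACH ROW EXECUTE PROCEDURE item_log_change();
```

With this in place, the state of a row at a given time can in principle be reconstructed from item_shadow, at the cost of the unions mentioned above.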
As for performance: in my application insert performance is not an
issue, and I suspect it rarely is in systems run by human hands. Read
performance, on the other hand, can definitely be an issue, since reads
are much more frequent and contain more data. I use views a lot, and at
one point I had nested views that used stored procedures and I started
to get select times in the region of 700-800 ms for simple selects with
a couple of hundred rows in the result set. Not funny. I removed the
stored procedures (it was painful!) and the nested views and got select
times down to 20-40 ms. Not entirely satisfactory, maybe, but much
better, and with some decent hardware I guess it would be even better.
Hence my remark about stored procedures as "performance killers".
Thanks,
Fredric
Cheers,
M
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)