Re: (Hopefully stupid) select question.

Fredric Fredricson <Fredric.Fredricson@xxxxxxxxxxxxx> · Mon, 24 Jan 2011 20:19:57 +0100

On 01/24/2011 05:02 PM, A.M. wrote:
On Jan 24, 2011, at 10:50 AM, Fredric Fredricson wrote:

I have been fighting with a select and can find no satisfactory solution.

Simplified version of the problem:

A table that, in reality, log state changes to an object (represented as a row in another table):

CREATE TABLE t (
    id SERIAL UNIQUE,
    ref INTEGER, -- Reference to a row in another table
    someData TEXT,
    inserted DATE DEFAULT CURRENT_TIMESTAMP
) ;
Then we insert multiple rows for each "ref" with different "someData".

Now I want the latest "someData" for each "ref" like:

ref | someData (only latest inserted)
-------------
1  | 'data1'
2  | 'data2'
etc...

The best solution I could find depended on the fact that serial is higher for higher dates. I do not like that because if that is true, it is an indirect way to get the data and could possibly, in the future, yield the wrong result if unrelated changes where made or id's reused.

Here is my solution (that depend on the SERIAL):
SELECT x.ref,x.someData
  FROM t as x
  NATURAL JOIN (SELECT ref,max(id) AS id FROM t GROUP BY ref ORDER BY ref) AS y ;

Can somebody come up with a better solution? (without resorting to stored procedures and other performance killers).
I would argue that relying on the id is safer than relying on the current timestamp because CURRENT_TIMESTAMP refers to the time that the transaction is started, not when the transaction was committed (or the row was "actually" inserted). In addition, it is technically possible for two transactions to get the same CURRENT_TIMESTAMP. SERIAL values are never reused. You could also create a security view which exposes the historical data but without the primary key in the actual table.
Well, in my case the transaction time is not an issue really. The 
database is a backend to a REST Web service and all transactions are 
short (as dictated by the web server).
But I see your point.
I recommend http://pgfoundry.org/projects/tablelog which uses "performance killers" like stored procedures to handle things properly- at least take a look to see how things are handled.
I looked at this page and it is not what I need for this particular 
problem, since I log only specific changes in state and these changes 
are represented as rows in this state-log table (the row in the 
referenced table is not changed).

But I do log changes in about 80% of my tables and I use a technique 
similar to the one described in the table log. I have a script that 
parse my sql-code and auto-generate sql statemens that creates a 
"shadow"-table and the triggers required. I also have a mandatory 
"header" on all my logged tables and store an entry in a change log 
table with information about user name (external user, not SQL ROLE) and 
timestamp. This way all changes can be traced in time and I can, in 
theory, get a snapshot of my entire data at an arbitrary point in time. 
I say "in theory" because I have not implemented it and with a lot of 
unions and such I expect the performance to suck. I will however use it 
for parts of the data, which is why I implemented it.

And about performance. In my application insert performance is not an 
issue, I suspect it rarely is in systems run by human hands. Read 
performance on the other hand can definitely be an issue since reads are 
much more frequent and contains more data. I use views a lot and at one 
point I had nested views that used stored procedures and I started to 
get select times in the region of 7-800ms for simple selects with a 
couple of hundred rows in the result set. Not funny. I removed the 
stored procedures (it was painful!) and the nested views and got select 
times down to 20-40ms. Not entirely satisfactory, maybe, but much better 
and with some decent hardware I guess it would be even better.
Hence my remark about stored procedures as "performance killers".

Thanks,
Fredric
Cheers,
M

begin:vcard
fn:Fredric Fredricson
n:Fredricson;Fredric
org:Ln4 Solutions AB
email;internet:Fredric.Fredricson@xxxxxxxxxxxxx
title:CTO
tel;home:+46 8 91 64 39
tel;cell:+46 70 677 58 48
version:2.1
end:vcard

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general