Re: (Hopefully stupid) select question.


On 01/24/2011 05:02 PM, A.M. wrote:
On Jan 24, 2011, at 10:50 AM, Fredric Fredricson wrote:

I have been fighting with a select and can find no satisfactory solution.

Simplified version of the problem:

A table that, in reality, logs state changes to an object (represented as a row in another table):

CREATE TABLE t (
    id SERIAL UNIQUE,
    ref INTEGER, -- Reference to a row in another table
    someData TEXT,
    inserted DATE DEFAULT CURRENT_TIMESTAMP
) ;
Then we insert multiple rows for each "ref" with different "someData".


Now I want the latest "someData" for each "ref" like:

ref | someData (only latest inserted)
-------------
1  | 'data1'
2  | 'data2'
etc...

The best solution I could find depends on the fact that the serial is higher for later dates. I do not like that because, even if it is true, it is an indirect way to get the data and could, in the future, yield the wrong result if unrelated changes were made or ids reused.

Here is my solution (that depends on the SERIAL):
SELECT x.ref,x.someData
  FROM t as x
  NATURAL JOIN (SELECT ref,max(id) AS id FROM t GROUP BY ref ORDER BY ref) AS y ;

Can somebody come up with a better solution? (without resorting to stored procedures and other performance killers).
I would argue that relying on the id is safer than relying on the current timestamp because CURRENT_TIMESTAMP refers to the time that the transaction is started, not when the transaction was committed (or the row was "actually" inserted). In addition, it is technically possible for two transactions to get the same CURRENT_TIMESTAMP. SERIAL values are never reused. You could also create a security view which exposes the historical data but without the primary key in the actual table.
Well, in my case the transaction time is not an issue really. The database is a backend to a REST Web service and all transactions are short (as dictated by the web server).
But I see your point.
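
A minimal sketch of one way to combine both points above: order by the "inserted" timestamp and use "id" only as a tie-breaker when two rows share a timestamp. It runs against an in-memory SQLite database through Python's sqlite3 module purely so that it is self-contained; the sample rows are invented, and "inserted" is assumed to hold a full timestamp rather than a DATE, which has only day resolution. The window-function query itself should carry over to PostgreSQL 8.4 or later, where SELECT DISTINCT ON (ref) ... ORDER BY ref, inserted DESC, id DESC is another option.

```python
import sqlite3

# In-memory stand-in for the table "t" from the original post.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        ref INTEGER,
        someData TEXT,
        inserted TEXT DEFAULT CURRENT_TIMESTAMP  -- assumed: full timestamp
    );
""")

# Invented sample data. Row order determines the id.
rows = [
    (1, "old1",  "2011-01-20 10:00:00"),
    (1, "data1", "2011-01-24 09:00:00"),
    (2, "data2", "2011-01-24 09:00:00"),
    (2, "old2",  "2011-01-22 12:00:00"),  # higher id, but older timestamp
    (3, "tieA",  "2011-01-24 09:00:00"),
    (3, "tieB",  "2011-01-24 09:00:00"),  # same timestamp: id breaks the tie
]
conn.executemany(
    "INSERT INTO t (ref, someData, inserted) VALUES (?, ?, ?)", rows
)

# Rank the rows of each ref from newest to oldest and keep the first one.
LATEST_PER_REF = """
    SELECT ref, someData
    FROM (
        SELECT ref, someData,
               ROW_NUMBER() OVER (
                   PARTITION BY ref
                   ORDER BY inserted DESC, id DESC
               ) AS rn
        FROM t
    ) AS ranked
    WHERE rn = 1
    ORDER BY ref
"""

latest = conn.execute(LATEST_PER_REF).fetchall()
print(latest)
```

For ref 2 the max(id) query would return 'old2', because that row was inserted last even though its timestamp is older; ordering by the timestamp first returns 'data2'.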
I recommend http://pgfoundry.org/projects/tablelog, which uses "performance killers" like stored procedures to handle things properly; at least take a look to see how things are handled.
I looked at this page and it is not what I need for this particular problem, since I log only specific changes in state and these changes are represented as rows in this state-log table (the row in the referenced table is not changed).

But I do log changes in about 80% of my tables, and I use a technique similar to the one described in tablelog. I have a script that parses my SQL code and auto-generates SQL statements that create a "shadow" table and the required triggers. I also have a mandatory "header" on all my logged tables and store an entry in a change-log table with the user name (external user, not SQL ROLE) and a timestamp. This way all changes can be traced in time and I can, in theory, get a snapshot of my entire data at an arbitrary point in time. I say "in theory" because I have not implemented the snapshot query, and with a lot of unions and such I expect the performance to suck. I will, however, use it for parts of the data, which is why I implemented the logging.
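
A rough sketch of the shadow-table idea described above, not the actual generated code, which is not shown in this thread. Every name here (item, item_shadow, changed_by, changed_at, snapshot) is hypothetical, and SQLite stands in for PostgreSQL only so that the example runs on its own. Triggers copy each new version of a row, including its "header" columns, into the shadow table; a point-in-time snapshot then picks the newest version at or before the requested time. Deletes are not handled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Logged table with a hypothetical "header": who changed it, and when.
    CREATE TABLE item (
        id INTEGER PRIMARY KEY,
        name TEXT,
        changed_by TEXT,
        changed_at TEXT
    );

    -- Hypothetical shadow table: one row per version of each item.
    CREATE TABLE item_shadow (
        id INTEGER,
        name TEXT,
        changed_by TEXT,
        changed_at TEXT
    );

    -- Copy every inserted or updated version into the shadow table.
    CREATE TRIGGER item_ins AFTER INSERT ON item
    BEGIN
        INSERT INTO item_shadow
        VALUES (NEW.id, NEW.name, NEW.changed_by, NEW.changed_at);
    END;

    CREATE TRIGGER item_upd AFTER UPDATE ON item
    BEGIN
        INSERT INTO item_shadow
        VALUES (NEW.id, NEW.name, NEW.changed_by, NEW.changed_at);
    END;
""")

# Invented history: item 1 is created, then renamed; item 2 is created later.
conn.execute(
    "INSERT INTO item VALUES (1, 'draft', 'alice', '2011-01-20 10:00:00')"
)
conn.execute(
    "UPDATE item SET name = 'final', changed_by = 'bob', "
    "changed_at = '2011-01-24 09:00:00' WHERE id = 1"
)
conn.execute(
    "INSERT INTO item VALUES (2, 'other', 'bob', '2011-01-25 08:00:00')"
)


def snapshot(conn, at):
    """Return each item's newest version with changed_at <= at."""
    return conn.execute(
        """
        SELECT id, name, changed_by
        FROM (
            SELECT id, name, changed_by,
                   ROW_NUMBER() OVER (
                       PARTITION BY id
                       ORDER BY changed_at DESC, rowid DESC
                   ) AS rn
            FROM item_shadow
            WHERE changed_at <= ?
        ) AS versions
        WHERE rn = 1
        ORDER BY id
        """,
        (at,),
    ).fetchall()


print(snapshot(conn, "2011-01-22 00:00:00"))
print(snapshot(conn, "2011-01-26 00:00:00"))
```

The first call sees only the original version of item 1; the second sees the renamed item 1 and item 2. Doing this across many tables at once is where the unions, and the expected performance cost, would come in.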

As for performance: in my application, insert performance is not an issue; I suspect it rarely is in systems operated by hand. Read performance, on the other hand, can definitely be an issue, since reads are much more frequent and return more data. I use views a lot, and at one point I had nested views that used stored procedures; select times climbed to around 700-800 ms for simple selects returning a couple of hundred rows. Not funny. I removed the stored procedures (it was painful!) and the nested views and got select times down to 20-40 ms. Not entirely satisfactory, maybe, but much better, and with some decent hardware I guess it would be better still.
Hence my remark about stored procedures as "performance killers".

Thanks,
Fredric
Cheers,
M


-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)