On Thu, 17 Dec 2009 10:38:32 +0100 "A. Kretschmer" <andreas.kretschmer@xxxxxxxxxxxxxx> wrote: > In response to Ivan Sergio Borgonovo : > > I've a web application. > > > > I'm logging data related to the landing page and subsequent > > selected hits. > > > > create table track_hit ( > > hitid serial not null, > > /* pk? I don't mind if serial wrap around > > pk could be (hitid, tracking_time) */ > > esid varchar(32), -- related to session > > tracking_time timestamp not null default now(), > > -- some other stats > > -- first_hit boolean not null default false, -- ??? > > ); > > > > I'd like to be sure I just count one "first hit" in a session > > (same esid) in an efficient way that means I'd like to mark them > > with a > > select distinct on (esid) esid, tracking_time from track_hit order > by esid, tracking_time; > > returns only one record for each esid, ordered by tracking_time. > Should work with 8.x, maybe sice 7.x (I'm not sure) I think I've tried to resolve a concurrency problem in the wrong place... still... what is the difference between: select min(hitid) as h from track_hit group by esid; and select distinct on (esid) hitid from track_hit order by esid, track_time; I haven't enough test data to see if they perform differently. The second form seems to perform a little bit faster. I'd expect the opposite: the first performing better. I think I'll add an index on track_time for reporting and maybe make primary key (hitid, tracking_time). I don't want to be bothered by hitid wrap around, so I don't want to make it a pk alone, still I may need a pk. thanks -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general