An issue with max() and order by ... limit 1 in postgresql8.3-beta3

zxo102 ouyang <zxo102@xxxxxxxxx> · Sun, 10 Jan 2010 00:43:08 +0800

Hi everyone,    I am using postgresql 8.3-beta3. I have a table 'test' with three fields:
     sid     data                     date
      1        1.1                  2009-09-01 1:00:00
      1        2.1                  2010-01-01 1:00:20
      2        3.1                  2009-09-01 1:00:10
      2        0.1                  2010-01-01 1:00:30

I create index for data field. 
Each sid may have millions of rows.
I want to get maximum data value and corresponding "time" for each group of sid. Here is my query:
########################
select  t1.sid ,  max(t1.data)  
          (select t2.date 
	    from test t2, 
	    where t2.sid = t1.sid and 
		   t2.date between '2009-08-01' and '2010-01-02' and 
   	          order by t2.data DESC limit 1
           ) 
 from test t1 
 where  t1.date between '2009-08-01' and '2010-01-08' and 
 group by t1.sid
##########################
But when max() in postgresql may slow down the search when there are millions of rows for each sid. 
So I use " order by t2.data DESC limit 1" to find max:
########################
select  t1.sid ,
           (select t2.data 	    from test t2, 
	    where t2.sid = t1.sid and 
		   t2.date between '2009-08-01' and '2010-01-02' and 
   	          order by t2.data DESC limit 1
           )  
          (select t2.date 
	    from test t2, 
	    where t2.sid = t1.sid and 
		   t2.date between '2009-08-01' and '2010-01-02' and 
   	          order by t2.data DESC limit 1
           ) 
 from test t1 
 where  t1.date between '2009-08-01' and '2010-01-08' and 
 group by t1.sid
##########################
The second query looks "strange" since similar search is done twice. 
Because of two fields, the following can not be used directly in the above query.

          (select t2.date, t2.data 
	    from test t2, 
	    where t2.sid = t1.sid and 
		   t2.date between '2009-08-01' and '2010-01-02' and 
   	          order by t2.data DESC limit 1
           ) 

Any suggestions for the best way to get maximum data value and corresponding "time" for each group of sid in my case?

Thanks a lot.

ouyang