2013/3/11 Jeff Adams - NOAA Affiliate <jeff.adams@xxxxxxxx>: > Pavel, > > Thanks for the response. I have not yet had the opportunity to use cursors, > but am now curious. Could you perhaps provide a bit more detail as to what > the implementation of your suggested approach would look like? an example: $$ DECLARE r record; prev_r record; BEGIN FOR r IN SELECT * FROM a ORDER BY epoch, mmsi LOOP IF prev_r IS NOT NULL THEN /* do some counting */ prev_r contains previous row, r contains current row do some RETURN NEXT .. /* return data in defined order */ END IF; prev_r = r; END LOOP; Probably slow part of your query is sorting - first can be accelerated by index, but second (as CTE result cannot) - you can try increase work_mem ?? Regards Pavel > > > On Mon, Mar 11, 2013 at 11:03 AM, Pavel Stehule <pavel.stehule@xxxxxxxxx> > wrote: >> >> Hello >> >> you can try procedural solution - use a cursor over ordered data in >> plpgsql and returns table >> >> Regards >> >> Pavel Stehule >> >> 2013/3/11 Jeff Adams - NOAA Affiliate <jeff.adams@xxxxxxxx>: >> > Greetings, >> > >> > >> > >> > I have a large table (~90 million rows) containing vessel positions. In >> > addition to a column that contains the location information (the_geom), >> > the >> > table also contains two columns that are used to uniquely identify the >> > vessel (mmsi and name) and a column containing the Unix time (epoch) at >> > which the position information was logged. I frequently need to assign >> > records to vessel transits. To do this, I currently create a CTE that >> > uses a >> > Window function (partitioning the data by mmsi and name ordered by >> > epoch) to >> > examine the time that has elapsed between successive position reports >> > for >> > individual vessels. For every position record for a vessel (as >> > identified >> > using mmsi and name), if the time elapsed between the current position >> > record and the previous record (using the lag function) is less than or >> > equal to 2 hours, I assign the record a value of 0 to a CTE column named >> > tr_index. If the time elapsed is greater than 2 hours, I assign the >> > record a >> > value of 1 to the tr_index column. I then use the CTE to generate >> > transit >> > numbers by summing the values in the tr_index field across a Window that >> > also partitions the data by mmsi and name and is ordered by epoch. This >> > works, but is very slow (hours). The table is indexed (multi-column >> > index on >> > mmsi, name and index on epoch). Does anyone see a way to get what I am >> > after >> > in a more efficient manner. What I am after is an assignment of transit >> > number to vessels' position records based on whether the records were >> > within >> > two hours of each other. The SQL that I used is provided below. Any >> > advice >> > would be greatly appreciated... >> > >> > >> > >> > WITH >> > >> > cte_01 AS >> > >> > ( >> > >> > SELECT >> > >> > a.id, >> > >> > a.mmsi, >> > >> > a.name, >> > >> > a.epoch, >> > >> > a.the_geom >> > >> > CASE >> > >> > WHEN ((a.epoch - lag(a.epoch) OVER w) / 60) > 120 THEN 1 >> > >> > ELSE 0 >> > >> > END AS tr_index >> > >> > FROM table a >> > >> > WINDOW w AS (PARTITION BY a.mmsi, a.name ORDER BY a.epoch) >> > >> > ) >> > >> > >> > >> > >> > >> > SELECT >> > >> > a.id, >> > >> > a.mmsi, >> > >> > a.name, >> > >> > a.epoch, >> > >> > a.the_geom, >> > >> > 1 + sum(a.tr_index) OVER w AS transit, >> > >> > a.active >> > >> > FROM cte_01 a >> > >> > WINDOW w AS (PARTITION BY a.mmsi, a.name ORDER BY a.epoch) >> > >> > >> > >> > -- >> > Jeff > > > > > -- > Jeffrey D. Adams > Contractor > OAI, Inc. > In support of: > National Marine Fisheries Service > Office of Protected Resources > 1315 East West Hwy, Building SSMC3 > Silver Spring, MD 20910-3282 > phone: (301) 427-8434 > fax: (301) 713-0376 -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance