Greetings,
I have a large table (~90 million rows) containing vessel positions. In addition to a column that contains the location information (the_geom), the table also contains two columns that are used to uniquely identify the vessel (mmsi and name) and a column containing the Unix time (epoch) at which the position information was logged. I frequently need to assign records to vessel transits. To do this, I currently create a CTE that uses a Window function (partitioning the data by mmsi and name ordered by epoch) to examine the time that has elapsed between successive position reports for individual vessels. For every position record for a vessel (as identified using mmsi and name), if the time elapsed between the current position record and the previous record (using the lag function) is less than or equal to 2 hours, I assign the record a value of 0 to a CTE column named tr_index. If the time elapsed is greater than 2 hours, I assign the record a value of 1 to the tr_index column. I then use the CTE to generate transit numbers by summing the values in the tr_index field across a Window that also partitions the data by mmsi and name and is ordered by epoch. This works, but is very slow (hours). The table is indexed (multi-column index on mmsi, name and index on epoch). Does anyone see a way to get what I am after in a more efficient manner. What I am after is an assignment of transit number to vessels' position records based on whether the records were within two hours of each other. The SQL that I used is provided below. Any advice would be greatly appreciated...
WITH
cte_01 AS
(
SELECT
a.id,
a.mmsi,
a.epoch,
a.the_geom
CASE
WHEN ((a.epoch - lag(a.epoch) OVER w) / 60) > 120 THEN 1
ELSE 0
END AS tr_index
FROM table a
WINDOW w AS (PARTITION BY a.mmsi, a.name ORDER BY a.epoch)
)
SELECT
a.id,
a.mmsi,
a.epoch,
a.the_geom,
1 + sum(a.tr_index) OVER w AS transit,
a.active
FROM cte_01 a
WINDOW w AS (PARTITION BY a.mmsi, a.name ORDER BY a.epoch)
--
Jeff