On 10/15/18 8:57 AM, Phil Endecott wrote:
> Dear Experts,
>
> I have a few tables with "raw" timestamps like this:
>
>  +-------------------------------+----------+
>  |             time              | pressure |
>  +-------------------------------+----------+
>  | 2018-09-14 00:00:07.148378+00 |  1007.52 |
>  | 2018-09-14 00:10:07.147506+00 |  1007.43 |
>  | 2018-09-14 00:20:07.147533+00 |  1007.28 |
>  +-------------------------------+----------+
>
> For each of these tables I have a view which rounds the timestamp to
> the nearest minute, and ensures there is only one row per minute:
>
>  SELECT date_trunc('minute'::text, tbl."time") AS "time",
>         max(tbl.pressure) AS pressure
>  FROM tbl
>  GROUP BY (date_trunc('minute'::text, tbl."time"))
>  ORDER BY (date_trunc('minute'::text, tbl."time"));
>
> I then join these tables on the rounded time:
>
>  SELECT COALESCE(rain."time", pressures."time", temperatures."time") AS "time",
>         rain.rain, pressures.pressure, temperatures.temperature
>  FROM rain
>  FULL JOIN pressures USING ("time")
>  FULL JOIN temperatures USING ("time");
>
>  +------------------------+------+----------+-------------+
>  |          time          | rain | pressure | temperature |
>  +------------------------+------+----------+-------------+
>  | 2018-09-14 00:00:00+00 |    0 |  1007.52 |      11.349 |
>  | 2018-09-14 00:10:00+00 |    0 |  1007.43 |     11.2317 |
>  | 2018-09-14 00:20:00+00 |    0 |  1007.28 |     11.2317 |
>  +------------------------+------+----------+-------------+
>
> The COALESCE for time and the full joins are needed because some
> columns may be missing for some minutes.
>
> Now I'd like to find the values for a particular short time period:
>
>  SELECT * FROM readings
>  WHERE "time" BETWEEN '2018-10-01T00:00:00' AND '2018-10-01T24:00:00'
Is readings a table or a view? If a view, is the SELECT COALESCE ... query above the view's query?
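If it is a view, I am guessing the definition looks something like the below. The SELECT is taken straight from your post; only the CREATE VIEW wrapper is my assumption:

 -- Assumed definition of readings; the join query is from the post above.
 CREATE VIEW readings AS
 SELECT COALESCE(rain."time", pressures."time", temperatures."time") AS "time",
        rain.rain, pressures.pressure, temperatures.temperature
 FROM rain
 FULL JOIN pressures USING ("time")
 FULL JOIN temperatures USING ("time");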
> This works, but it is inefficient; it seems to create all the rounded
> data, do the join on all of it, and then filter on the time period.
> Ideally it would filter the raw data, and then need to round and join
> far fewer rows.
>
> It would not be difficult for me to round the timestamps when
> inserting the data, and also ensure that there is only one row for
> each minute. But I've done some experiments and even if I remove all
> the rounding and replace the full joins with regular joins, it still
> does sequential scans on at least one of the tables:
>
>  Nested Loop  (cost=12.95..144.99 rows=135 width=20)
>    Join Filter: (x_rain."time" = x_pressures."time")
>    ->  Hash Join  (cost=12.67..97.83 rows=135 width=24)
>          Hash Cond: (x_temperatures."time" = x_rain."time")
>          ->  Seq Scan on x_temperatures  (cost=0.00..67.50 rows=4350 width=12)
>          ->  Hash  (cost=10.98..10.98 rows=135 width=12)
>                ->  Index Scan using x_rain_by_time on x_rain  (cost=0.28..10.98 rows=135 width=12)
>                      Index Cond: (("time" >= '2018-10-01 00:00:00+00'::timestamp with time zone) AND ("time" <= '2018-10-02 00:00:00+00'::timestamp with time zone))
>    ->  Index Scan using x_pressures_by_time on x_pressures  (cost=0.28..0.34 rows=1 width=12)
>          Index Cond: ("time" = x_temperatures."time")
>
> Maybe that is because the tables are currently relatively small (a few
> thousand rows) and it believes that sequential scans are faster. (I
> have sometimes wished for an "explain" variant that tells me what
> query plan it would choose if all the tables were 100X larger.)
>
> Is there anything I can do to make this more efficient when the
> tables are larger?
>
> Thanks for any suggestions.
>
> Regards, Phil.
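One thing you could try is pushing the time filter down into each per-table aggregate yourself, so only the raw rows inside the window get truncated, grouped, and joined. A rough, untested sketch; rain_raw and pressure_raw are stand-ins for whatever your underlying tables are called, and the date range is just an example:

 -- Filter the raw rows first, then truncate and aggregate, then join.
 SELECT COALESCE(r."time", p."time") AS "time", r.rain, p.pressure
 FROM (
     SELECT date_trunc('minute', "time") AS "time", max(rain) AS rain
     FROM rain_raw                                  -- hypothetical raw table name
     WHERE "time" >= '2018-10-01' AND "time" < '2018-10-02'
     GROUP BY 1
 ) AS r
 FULL JOIN (
     SELECT date_trunc('minute', "time") AS "time", max(pressure) AS pressure
     FROM pressure_raw                              -- hypothetical raw table name
     WHERE "time" >= '2018-10-01' AND "time" < '2018-10-02'
     GROUP BY 1
 ) AS p USING ("time");

With an index on each raw "time" column the subqueries can use index scans, and nothing ever has to build the rounded data for the whole table. The same pattern extends to temperatures with a third subquery and another COALESCE argument.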
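On the "what if the tables were 100X larger" wish: there is no EXPLAIN mode for that, but you can get close by discouraging sequential scans for a single transaction and seeing what plan falls out, e.g.:

 BEGIN;
 -- Make seq scans look prohibitively expensive to the planner,
 -- for this transaction only.
 SET LOCAL enable_seqscan = off;
 EXPLAIN ANALYZE
 SELECT * FROM readings
 WHERE "time" BETWEEN '2018-10-01' AND '2018-10-02';
 ROLLBACK;

If the plan switches to index scans, the planner is simply judging seq scans cheaper at the current table sizes, and it should switch on its own as the tables grow, assuming the statistics are kept current with ANALYZE. If it still seq scans with the setting off, the index genuinely cannot be used for that query.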
-- 
Adrian Klaver
adrian.klaver@xxxxxxxxxxx