Hi all,

I have a table containing around 95 million rows, pretty much only storing a timestamp and IDs of related tables containing the actual data:

> CREATE TABLE clt_rec
> (
>   id          BIGSERIAL NOT NULL,
>   oms_rec     BIGINT NOT NULL,
>   captured_at TIMESTAMP(6) WITH TIME ZONE NOT NULL,
>   rssi        SMALLINT NOT NULL,
>   CONSTRAINT pk_clt_rec PRIMARY KEY (id),
>   CONSTRAINT fk_clt_rec_oms_rec FOREIGN KEY (oms_rec) REFERENCES "oms_rec" ("id"),
>   CONSTRAINT uk_clt_rec_oms_rec UNIQUE (oms_rec)
> );

In many use cases I need to search all of those rows based on their timestamp, arbitrarily far into the past: sometimes only 15 minutes back, sometimes 2 years, sometimes finding the first day of each month over the past 15 months for some of those telegrams, etc. So I pretty often need to compare those timestamps, and some queries simply take multiple seconds, which especially adds up when several slightly different queries have to be executed one after another.

The following are two excerpts of Postgres's query plans; a simplified sketch of the query shape behind them is in the P.S. below.

Plan 1:

> ->  Nested Loop  (cost=1.14..343169.49 rows=43543 width=20) (actual time=0.313..113.974 rows=34266 loops=3)
>       ->  Parallel Index Scan using idx_clt_rec_captured_at on clt_rec  (cost=0.57..3437.90 rows=43543 width=24) (actual time=0.153..20.192 rows=34266 loops=3)
>             Index Cond: ((captured_at >= ('2020-08-01 00:00:00+02'::timestamp with time zone - '00:00:00'::interval)) AND (captured_at <= ('2020-08-01 00:00:00+02'::timestamp with time zone + '1 day'::interval)))
>       ->  Index Scan using pk_oms_rec on oms_rec  (cost=0.57..7.80 rows=1 width=12) (actual time=0.002..0.002 rows=1 loops=102799)
>             Index Cond: (id = clt_rec.oms_rec)

Plan 2:

> ->  Nested Loop  (cost=1.14..836381.50 rows=111934 width=20) (actual time=0.379..911.697 rows=334465 loops=3)
>       ->  Parallel Index Scan using idx_clt_rec_captured_at on clt_rec  (cost=0.57..8819.57 rows=111934 width=24) (actual time=0.193..154.176 rows=334465 loops=3)
>             Index Cond: ((captured_at >= ('2020-08-28 10:21:06.968+02'::timestamp with time zone - '14 days'::interval)) AND (captured_at <= ('2020-08-28 10:21:06.968+02'::timestamp with time zone + '00:00:00'::interval)))
>       ->  Index Scan using pk_oms_rec on oms_rec  (cost=0.57..7.39 rows=1 width=12) (actual time=0.002..0.002 rows=1 loops=1003394)
>             Index Cond: (id = clt_rec.oms_rec)

Postgres seems to properly use the available indexes, parallel workers and the like. But looking at the actual times, and compared with all the other parts of the query, comparing those timestamps simply takes the most time.

I've looked into this topic and found statements that one shouldn't put too many rows into such an index [1] or it will be ignored entirely. That doesn't seem to be the case for me, according to the plans. OTOH, my index really is just on the raw timestamp column; there's no function reducing things to dates or the like to cut down the number of distinct values:

> CREATE INDEX idx_clt_rec_captured_at ON clt_rec USING btree ( captured_at );

So where does the actual runtime in the above plans come from? Is it simply that comparing lots of timestamps takes as long as it takes? Or is Postgres additionally doing arithmetic on those timestamps for every row?

Thanks!
[1]: https://stackoverflow.com/questions/15977741/postgres-is-ignoring-a-timestamp-index-why

Kind regards,

Thorsten Schöning

--
Thorsten Schöning
E-Mail: Thorsten.Schoening@xxxxxxxxxx
AM-SoFT IT-Systeme  http://www.AM-SoFT.de/
Phone:  05151-9468-55
Fax:    05151-9468-88
Mobile: 0178-89468-04

AM-SoFT GmbH IT-Systeme, Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB 207 694 - Managing Director: Andreas Muchow
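
P.S.: For completeness, the query producing Plan 1 boils down to roughly the following shape. The SELECT list here is only an approximation and the real query is more involved, but the range predicate matches the Index Cond shown in the plan:

> -- Simplified sketch: column list is illustrative; the WHERE clause
> -- mirrors the Index Cond of Plan 1 ("day window" starting at a given
> -- timestamp, computed via interval arithmetic).
> EXPLAIN (ANALYZE, BUFFERS)
> SELECT clt.captured_at, clt.rssi, oms.id
> FROM   clt_rec clt
> JOIN   oms_rec oms ON oms.id = clt.oms_rec
> WHERE  clt.captured_at >= TIMESTAMPTZ '2020-08-01 00:00:00+02' - INTERVAL '00:00:00'
>   AND  clt.captured_at <= TIMESTAMPTZ '2020-08-01 00:00:00+02' + INTERVAL '1 day';

Plan 2 is the same shape with a 14-day window ending at the given timestamp instead.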