> On 29 Aug 2020, at 10:24, Thorsten Schöning <tschoening@xxxxxxxxxx> wrote:
>
> Hi all,
>
> I have a table containing around 95 million rows, pretty much only
> storing a timestamp and IDs of related tables that contain the actual
> data.
>
>> CREATE TABLE clt_rec
>> (
>>   id BIGSERIAL NOT NULL,
>>   oms_rec BIGINT NOT NULL,
>>   captured_at TIMESTAMP(6) WITH TIME ZONE NOT NULL,
>>   rssi SMALLINT NOT NULL,
>>   CONSTRAINT pk_clt_rec PRIMARY KEY (id),
>>   CONSTRAINT fk_clt_rec_oms_rec FOREIGN KEY (oms_rec) REFERENCES "oms_rec" ("id"),
>>   CONSTRAINT uk_clt_rec_oms_rec UNIQUE (oms_rec)
>> );
>
> In many use cases I need to search those rows by timestamp, arbitrarily
> far into the past: sometimes only 15 minutes, sometimes 2 years,
> sometimes the first day of each of the last 15 months for some of those
> telegrams, etc. So I pretty often need to compare those timestamps, and
> some queries simply take multiple seconds, which adds up when several
> slightly different queries have to be executed one after another. The
> following are two excerpts from Postgres' query plans:
>
> Plan 1:
>
>> ->  Nested Loop  (cost=1.14..343169.49 rows=43543 width=20) (actual time=0.313..113.974 rows=34266 loops=3)
>>       ->  Parallel Index Scan using idx_clt_rec_captured_at on clt_rec  (cost=0.57..3437.90 rows=43543 width=24) (actual time=0.153..20.192 rows=34266 loops=3)
>>             Index Cond: ((captured_at >= ('2020-08-01 00:00:00+02'::timestamp with time zone - '00:00:00'::interval)) AND (captured_at <= ('2020-08-01 00:00:00+02'::timestamp with time zone + '1 day'::interval)))
>>       ->  Index Scan using pk_oms_rec on oms_rec  (cost=0.57..7.80 rows=1 width=12) (actual time=0.002..0.002 rows=1 loops=102799)
>>             Index Cond: (id = clt_rec.oms_rec)

What happens here is that the planner looks up the lower and upper
boundaries in the index; everything between those index nodes is a
candidate record. Next, it loops over those candidates to match the
other condition of your query (id = clt_rec.oms_rec). You didn't say
whether there's an index on that column.

You'd probably see a performance improvement were you to create an
index on (captured_at, id). If your Postgres version is somewhat
recent, that could even lead to an Index Only Scan.
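As a sketch of what I mean (the index name idx_clt_rec_captured_at_id
is one I just made up, the date range is only an example, and whether
you actually get an Index Only Scan also depends on the visibility
map, so a recently vacuumed table helps):

CREATE INDEX idx_clt_rec_captured_at_id
    ON clt_rec USING btree ( captured_at, id );

-- With that index, a range query that needs only indexed columns
-- no longer has to visit the heap for every candidate row:
EXPLAIN (ANALYZE, BUFFERS)
SELECT id
  FROM clt_rec
 WHERE captured_at >= timestamptz '2020-08-01 00:00:00+02'
   AND captured_at <  timestamptz '2020-08-01 00:00:00+02' + interval '1 day';

On version 11 or later, CREATE INDEX ... ON clt_rec USING btree
( captured_at ) INCLUDE ( id ) would be an alternative that keeps the
index keys themselves narrow.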
> Plan 2:
>
>> ->  Nested Loop  (cost=1.14..836381.50 rows=111934 width=20) (actual time=0.379..911.697 rows=334465 loops=3)
>>       ->  Parallel Index Scan using idx_clt_rec_captured_at on clt_rec  (cost=0.57..8819.57 rows=111934 width=24) (actual time=0.193..154.176 rows=334465 loops=3)
>>             Index Cond: ((captured_at >= ('2020-08-28 10:21:06.968+02'::timestamp with time zone - '14 days'::interval)) AND (captured_at <= ('2020-08-28 10:21:06.968+02'::timestamp with time zone + '00:00:00'::interval)))
>>       ->  Index Scan using pk_oms_rec on oms_rec  (cost=0.57..7.39 rows=1 width=12) (actual time=0.002..0.002 rows=1 loops=1003394)
>>             Index Cond: (id = clt_rec.oms_rec)

This is very much the same issue, just with a larger number of
candidate records.

> Postgres seems to properly use available indexes, parallel workers and
> stuff like that. But looking at the actual times, compared to all the
> other parts of the query, comparing those timestamps simply takes the
> most time.

It only needs to compare 2 timestamps: the boundaries of your range.
The time goes into visiting all the candidate rows between them.

> I've looked into this topic and found statements that one shouldn't
> put too many rows into the index[1] or it will be ignored entirely.
> But that doesn't seem to be the case for me, according to the plan.
> OTOH, my index really is just on the column containing the timestamp;
> there's no function reducing things to dates or the like to cut down
> the number of rows.
>
>> CREATE INDEX idx_clt_rec_captured_at ON clt_rec USING btree ( captured_at );

So, in short, try this (with a new name, so it doesn't collide with
your existing index):

CREATE INDEX idx_clt_rec_captured_at_id ON clt_rec USING btree ( captured_at, id );

Alban Hertroys
--
There is always an exception to always.