Re: How to properly query lots of rows based on timestamps?

Thorsten Schöning <tschoening@xxxxxxxxxx> · Sun, 30 Aug 2020 15:54:11 +0200

Guten Tag Alban Hertroys,
am Samstag, 29. August 2020 um 11:29 schrieben Sie:

> [...]Next, it loops over those to match the other condition of
> your query (id = clt_rec.oms_rec). You didn’t tell whether there’s
> an index on that column.

Which column, oms_rec.id or clt_rec.oms_rec? The former has one
because it's the unique key and the plan says so as well:

> Index Scan using pk_oms_rec on oms_rec

clt_rec.oms_rec OTOH is only marked as unique, without explicitly
creating an index.

> CONSTRAINT uk_clt_rec_oms_rec UNIQUE (oms_rec)

But doesn't that mean an index is available behind the scenes,
maintained by Postgres? So it should have been used if it would
provide any benefit?

> Adding a unique constraint will automatically create a unique B-tree
> index on the column or group of columns listed in the constraint.

https://www.postgresql.org/docs/9.4/ddl-constraints.html

> You’d probably see a performance improvement were you to create an
> index on (captured_at, id). If your Postgres version is somewhat
> recent, that could even lead to an Index Only Scan.

Tried that with your statement at the bottom and it didn't seem to
change anything even when using Postgres 11:

> ->  Nested Loop  (cost=1.14..343169.49 rows=43543 width=20) (actual time=0.228..95.554 rows=34266 loops=3)
>     ->  Parallel Index Scan using idx_clt_rec_captured_at on clt_rec  (cost=0.57..3437.90 rows=43543 width=24) (actual time=0.119..16.895 rows=34266 loops=3)
>           Index Cond: ((captured_at >= ('2020-08-01 00:00:00+02'::timestamp with time zone - '00:00:00'::interval)) AND (captured_at <= ('2020-08-01 00:00:00+02'::timestamp with time zone + '1 day'::interval)))
>     ->  Index Scan using pk_oms_rec on oms_rec  (cost=0.57..7.80 rows=1 width=12) (actual time=0.002..0.002 rows=1 loops=102799)
>           Index Cond: (id = clt_rec.oms_rec)

vs. with your suggested change:

> ->  Nested Loop  (cost=1.14..513397.11 rows=43543 width=20) (actual time=0.236..97.044 rows=34266 loops=3)
>     ->  Parallel Index Scan using idx_clt_rec_captured_at on clt_rec  (cost=0.57..173665.52 rows=43543 width=24) (actual time=0.183..17.464 rows=34266 loops=3)
>           Index Cond: ((captured_at >= ('2020-08-01 00:00:00+02'::timestamp with time zone - '00:00:00'::interval)) AND (captured_at <= ('2020-08-01 00:00:00+02'::timestamp with time zone + '1 day'::interval)))
>     ->  Index Scan using pk_oms_rec on oms_rec  (cost=0.57..7.80 rows=1 width=12) (actual time=0.002..0.002 rows=1 loops=102799)
>           Index Cond: (id = clt_rec.oms_rec)

Good news is that having one of those indexes in place at all makes a
huge difference compared to having neither of both. :-D

Mit freundlichen Grüßen,

Thorsten Schöning

-- 
Thorsten Schöning       E-Mail: Thorsten.Schoening@xxxxxxxxxx
AM-SoFT IT-Systeme      http://www.AM-SoFT.de/

Telefon...........05151-  9468- 55
Fax...............05151-  9468- 88
Mobil..............0178-8 9468- 04

AM-SoFT GmbH IT-Systeme, Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB 207 694 - Geschäftsführer: Andreas Muchow