So, this query:
select * from item where shouldbebackedup=true and
itemCreated<='2020-06-05 00:00:00.000' and backupperformed=false
order by filepath asc, id asc limit 100 offset 10400;
Was made into a function:
create or replace function NeedsBackup(text, int, int default 100)
returns setof item as $$
BEGIN
return query select * from item where shouldbebackedup=true and itemCreated<=$1::timestamp without time zone and backupperformed=false order by filepath asc, id asc limit $3 offset $2;
END;
$$
language 'plpgsql';
returns setof item as $$
BEGIN
return query select * from item where shouldbebackedup=true and itemCreated<=$1::timestamp without time zone and backupperformed=false order by filepath asc, id asc limit $3 offset $2;
END;
$$
language 'plpgsql';
Having read the documentation, I do think this is STABLE and not VOLATILE? While the external process performing the query will eventually update these rows, setting backupperformed to true instead of false, that doesn't happen within this function.
Doing an EXPLAIN ANALYZE yields this output:
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Function Scan on public.needsbackup (cost=0.25..10.25 rows=1000 width=5275) (actual time=80.107..80.115 rows=100 loops=1)
Output: id, shouldbebackedup, backupperformed, itemCreated, filepath
Function Call: needsbackup('2020-06-08 12:00:00.016'::text, 2100, 100)
Buffers: shared read=1572
Planning Time: 0.199 ms
Execution Time: 80.499 ms
(6 rows)
---------------------------------------------------------------------------------------------------------------------------
Function Scan on public.needsbackup (cost=0.25..10.25 rows=1000 width=5275) (actual time=80.107..80.115 rows=100 loops=1)
Output: id, shouldbebackedup, backupperformed, itemCreated, filepath
Function Call: needsbackup('2020-06-08 12:00:00.016'::text, 2100, 100)
Buffers: shared read=1572
Planning Time: 0.199 ms
Execution Time: 80.499 ms
(6 rows)
And the actual output is identical to the full query.
I'm kind of worried I'm not seeing any more details as to how it all worked. Yes, it's as fast as I hoped, but that also happened last time when I re-created the index. Fast for a few weeks, then suddenly not anymore.
I tried looking up how to create indexes on functions, but found nothing that could help, also the suggestion that this is not possible.
So, at this point, what should I still look at, I wonder? EXPLAIN says it did a function call, but what happened under the hood there?
And before I forget: thank you most kindly for the advice so far, to all involved.
Regards,
Koen
On Sun, Jun 7, 2020 at 12:45 AM Koen De Groote <kdg.dev@xxxxxxxxx> wrote:
I'll attempt this next week.On Fri, Jun 5, 2020, 21:11 Michael Lewis <mlewis@xxxxxxxxxxx> wrote:Those row estimates are pretty far off.
Standard indexes and partial indexes don't get custom statistics created on them, but functional indexes do. I wonder if a small function needs_backup( shouldbebackedup, backupperformed ) and an index created on that function would nicely alleviate the pain. I would expect PG12 and extended statistics of type MCVs would bypass the need for that work around though.