I've collected all the relevant info (I think so, at least) and put it here:
The table in question is used to keep filepath data for files on a hard drive.
The query in question is used to retrieve items that should be backed up but have not yet been.
The relevant columns of the table:
Table "public.item"
Column | Type | Collation | Nullable | Default
------------------------------------+-----------------------------+-----------+----------+----------------------------------------------
id | bigint | | not null | nextval('item_id_seq'::regclass)
shouldbebackedup | boolean | | not null | true
backupperformed | boolean | | not null | false
itemCreated | timestamp without time zone | | | now()
filepath | text | | |
The existing index, which no longer gets used:
"index_in_question" btree (shouldbebackedup, backupperformed, itemCreated, filepath) WHERE shouldbebackedup = true AND backupperformed = false
The new index, made with the exact same columns and conditions, gets used immediately after creation:
CREATE INDEX CONCURRENTLY index_test ON item USING btree (shouldbebackedup, backupperformed, itemCreated, filepath) WHERE shouldbebackedup = true AND backupperformed = false;
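In case bloat is a factor, here is a sketch of how the on-disk sizes of the two indexes could be compared while both still exist (pg_relation_size and pg_size_pretty are built-in functions; the index names are the ones above):
SELECT pg_size_pretty(pg_relation_size('index_in_question')) AS old_index_size,
       pg_size_pretty(pg_relation_size('index_test'))        AS new_index_size;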
The query in question will look something like this:
select * from item where shouldbebackedup=true and itemCreated<='2020-06-05 00:00:00.000' and backupperformed=false order by filepath asc, id asc limit 100 offset 10400;
Having done a count, there are around 13,000 matching items without the offset and limit.
That being said, the amount is entirely dependent on what was added the previous day.
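To capture the cost estimates and actual timings for comparison (once with the old index, once with the new one), the same query can be run under EXPLAIN; a sketch:
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM item
WHERE shouldbebackedup = true
  AND backupperformed = false
  AND itemCreated <= '2020-06-05 00:00:00.000'
ORDER BY filepath ASC, id ASC
LIMIT 100 OFFSET 10400;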
I tried creating an extended statistic, like this, but it had no effect:
CREATE STATISTICS s1 (dependencies) ON shouldbebackedup, backupperformed FROM item;
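For what it's worth, an extended statistics object is only populated by a subsequent ANALYZE (the nightly vacuum analyze would cover that, but an explicit run makes sure), and on PostgreSQL 12 or later the collected figures can be inspected in pg_stats_ext:
ANALYZE item;
-- assuming PostgreSQL 12+; shows whether dependency data was actually gathered for s1
SELECT statistics_name, dependencies FROM pg_stats_ext WHERE statistics_name = 's1';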
Settings from the conf file that I think are related:
shared_buffers = 1024MB
effective_cache_size = 2048MB
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 32MB
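For completeness, a quick way to confirm what the running server actually uses (pg_settings is a standard system view):
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'effective_cache_size', 'random_page_cost',
               'effective_io_concurrency', 'work_mem');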
Finally, I'll note again that this database gets a nightly VACUUM ANALYZE.
My thanks for looking at this and any suggestions one might have.
Regards,
Koen
On Thu, Jun 4, 2020 at 7:08 PM Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
Adrian Klaver <adrian.klaver@xxxxxxxxxxx> writes:
> On 6/4/20 9:43 AM, Tom Lane wrote:
>> It's possible that the index had bloated to the point where the planner
>> thought it was cheaper to use a seqscan. Did you make a note of the
>> cost estimates for the different plans?
> I missed the part where the OP pointed to a SO question. In that
> question were links to explain.depesz.com output.
Ah, I didn't bother to chase that link either.
So the cost estimates are only a fraction of a percent apart, making
it unsurprising for not-so-large changes in the index size to cause
a flip in the apparently-cheapest plan. The real question then is
why the cost estimates aren't actually modeling the real execution
times very well; and I'd venture that that question boils down to
why is this rowcount estimate so far off:
> ->  Parallel Seq Scan on oscar mike_three
>       (cost=0.000..1934568.500 rows=2385585 width=3141)
>       (actual time=159.800..158018.961 rows=23586 loops=3)
>     Filter: (four AND (NOT bravo) AND (zulu <= 'echo'::timestamp without time zone))
>     Rows Removed by Filter: 8610174
We're not going to be able to answer that if the OP doesn't wish
to decloak his data a bit more ... but a reasonable guess is that
those filter conditions are correlated. With late-model Postgres
you might be able to improve matters by creating extended statistics
for this table.
regards, tom lane
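Following up on the suggestion about extended statistics for correlated filter conditions: dependency statistics only influence equality clauses, so the timestamp range condition is not covered by them. As an alternative sketch, assuming PostgreSQL 12 or later (where MCV-list extended statistics are available; the statistics name below is made up):
CREATE STATISTICS item_backup_mcv (mcv) ON shouldbebackedup, backupperformed FROM item;
-- extended statistics are only filled in by the next ANALYZE
ANALYZE item;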