Re: Index no longer being used, destroying and recreating it restores use.

Koen De Groote <kdg.dev@xxxxxxxxx> · Fri, 5 Jun 2020 17:34:34 +0200

The requested result: https://explain.depesz.com/s/G7mU

Also, the data from the statistic itself:

=> SELECT stxname, stxkeys, stxdependencies                                                  
->   FROM pg_statistic_ext                                                                   
->   WHERE stxname = 's1';
 stxname | stxkeys | stxdependencies 
---------+---------+-----------------
 s1      | 29 35   | <NULL>

On Fri, Jun 5, 2020 at 4:15 PM Adrian Klaver <adrian.klaver@xxxxxxxxxxx> wrote:
On 6/5/20 7:05 AM, Koen De Groote wrote:

> I've collected all relevant info(I think so at least) and put it here:

> 

> The table in question is used to keep filepath data, of files on a 

> harddrive.

> The query in question is used to retrieve items which should be backed 

> up, but have not yet been.

> 

> The relevant columns of the table:

> 

>                                                      Table "public.item"

>                 Column               |            Type             | 

> Collation | Nullable |                   Default

> ------------------------------------+-----------------------------+-----------+----------+----------------------------------------------

>   id                                 | bigint                      |     

>        | not null | nextval('item_id_seq'::regclass)

>   shouldbebackedup                   | boolean                     |     

>        | not null | true

>   backupperformed                    | boolean                     |     

>        | not null | false

>   itemCreated                        | timestamp without time zone |     

>        |          | now()

>   filepath                           | text                        |     

>        |          |

> 

> 

> The existing index, which no longer gets used:

> "index_in_question" btree (shouldbebackedup, backupperformed, 

> itemCreated, filepath) WHERE shouldbebackedup = true AND backupperformed 

> = false

> 

> The new index, made out of the exact same columns and conditions, get 

> used immediately after creation:

> CREATE INDEX CONCURRENTLY index_test ON item USING btree 

> (shouldbebackedup, backupperformed, itemCreated, filepath) WHERE 

> shouldbebackedup = true AND backupperformed = false;

> 

> 

> The query in question will look something like this:

> select * from item where shouldbebackedup=true and 

> itemCreated<='2020-06-05 00:00:00.000' and backupperformed=false order 

> by filepath asc, id asc limit 100 offset 10400;

The result of EXPLAIN ANALYZE for above.

> 

> Having done a count, there are around 13000 items here, without the 

> offset and limit.

> That being said, the amount is entirely dependant on what was added on a 

> previous day.

> 

> 

> I tried creating an extended statistic, like this, but it had no effect:

> CREATE STATISTICS s1 (dependencies) ON shouldbebackedup, backupperformed 

> FROM item;

> 

> Settings from the conf file I think are related:

> 

> shared_buffers = 1024MB

> effective_cache_size = 2048MB

> random_page_cost = 1.1

> effective_io_concurrency = 200

> work_mem = 32MB

> 

> Finally, I state again that this database gets a nightly "vacuum analyze".

> 

> My thanks for looking at this and any suggestions one might have.

> 

> Regards,

> Koen

> 

> On Thu, Jun 4, 2020 at 7:08 PM Tom Lane <tgl@xxxxxxxxxxxxx 

> <mailto:tgl@xxxxxxxxxxxxx>> wrote:

> 

>     Adrian Klaver <adrian.klaver@xxxxxxxxxxx

>     <mailto:adrian.klaver@xxxxxxxxxxx>> writes:

>      > On 6/4/20 9:43 AM, Tom Lane wrote:

>      >> It's possible that the index had bloated to the point where the

>     planner

>      >> thought it was cheaper to use a seqscan.  Did you make a note of the

>      >> cost estimates for the different plans?

> 

>      > I missed the part where the OP pointed to a SO question. In that

>      > question where links to explain.depesz.com

>     <http://explain.depesz.com> output.

> 

>     Ah, I didn't bother to chase that link either.

> 

>     So the cost estimates are only a fraction of a percent apart, making

>     it unsurprising for not-so-large changes in the index size to cause

>     a flip in the apparently-cheapest plan.  The real question then is

>     why the cost estimates aren't actually modeling the real execution

>     times very well; and I'd venture that that question boils down to

>     why is this rowcount estimate so far off:

> 

>      >                ->  Parallel Seq Scan on oscar mike_three

>      > (cost=0.000..1934568.500 rows=2385585 width=3141) (actual

>      > time=159.800..158018.961 rows=23586 loops=3)

>      >                        Filter: (four AND (NOT bravo) AND (zulu <=

>      > 'echo'::timestamp without time zone))

>      >                        Rows Removed by Filter: 8610174

> 

>     We're not going to be able to answer that if the OP doesn't wish

>     to decloak his data a bit more ... but a reasonable guess is that

>     those filter conditions are correlated.  With late-model Postgres

>     you might be able to improve matters by creating extended statistics

>     for this table.

> 

>                              regards, tom lane

> 

-- 

Adrian Klaver

adrian.klaver@xxxxxxxxxxx