
Re: Index no longer being used, destroying and recreating it restores use.

On 6/5/20 7:05 AM, Koen De Groote wrote:
I've collected all the relevant info (I think, at least) and put it here:

The table in question is used to store filepath data for files on a hard drive. The query in question retrieves items that should be backed up but have not yet been.

The relevant columns of the table:

                                                     Table "public.item"
               Column               |            Type             | Collation | Nullable |                   Default
------------------------------------+-----------------------------+-----------+----------+----------------------------------------------
 id                                 | bigint                      |       | not null | nextval('item_id_seq'::regclass)  shouldbebackedup                   | boolean                     |       | not null | true  backupperformed                    | boolean                     |       | not null | false  itemCreated                        | timestamp without time zone |       |          | now()  filepath                           | text                        |       |          |


The existing index, which no longer gets used:
"index_in_question" btree (shouldbebackedup, backupperformed, itemCreated, filepath) WHERE shouldbebackedup = true AND backupperformed = false

The new index, built on exactly the same columns and conditions, gets used immediately after creation:

CREATE INDEX CONCURRENTLY index_test ON item USING btree (shouldbebackedup, backupperformed, itemCreated, filepath) WHERE shouldbebackedup = true AND backupperformed = false;
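
A quick sanity check, assuming both indexes still exist, is to compare their on-disk sizes; since they have identical definitions, a much larger old index would point at bloat:

    -- Compare on-disk sizes of the old index and the freshly built one.
    SELECT pg_size_pretty(pg_relation_size('index_in_question')) AS old_index_size,
           pg_size_pretty(pg_relation_size('index_test'))        AS new_index_size;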


The query in question will look something like this:
select * from item where shouldbebackedup=true and itemCreated<='2020-06-05 00:00:00.000' and backupperformed=false order by filepath asc, id asc limit 100 offset 10400;

The result of EXPLAIN ANALYZE for above.
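
For anyone reproducing this, a sketch of how the plan can be captured using the query above (BUFFERS is optional, but shows how much of the scan hit disk):

    -- Capture the actual plan along with buffer hit/read counts.
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM item
    WHERE shouldbebackedup = true
      AND backupperformed = false
      AND itemCreated <= '2020-06-05 00:00:00.000'
    ORDER BY filepath ASC, id ASC
    LIMIT 100 OFFSET 10400;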


Having done a count, there are around 13000 matching rows without the offset and limit. That said, the amount depends entirely on what was added on a previous day.


I tried creating an extended statistic, like this, but it had no effect:
CREATE STATISTICS s1 (dependencies) ON shouldbebackedup, backupperformed FROM item;
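
Note that an extended statistics object is only populated by a subsequent ANALYZE; a quick way to confirm it was built, assuming PostgreSQL 12 or later for the pg_stats_ext view:

    -- Populate the statistics object, then inspect the computed dependencies.
    ANALYZE item;
    SELECT statistics_name, dependencies
    FROM pg_stats_ext
    WHERE tablename = 'item';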

Settings from the conf file that I think are related:

shared_buffers = 1024MB
effective_cache_size = 2048MB
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 32MB

Finally, I state again that this database gets a nightly "vacuum analyze".
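
If index bloat does turn out to be the cause, a plain VACUUM will not shrink the index itself; one option, assuming PostgreSQL 12 or later, is to rebuild it in place without blocking writes:

    -- Rebuild the bloated index without taking an exclusive lock.
    REINDEX INDEX CONCURRENTLY index_in_question;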

My thanks for looking at this and any suggestions one might have.

Regards,
Koen

On Thu, Jun 4, 2020 at 7:08 PM Tom Lane <tgl@xxxxxxxxxxxxx> wrote:

    Adrian Klaver <adrian.klaver@xxxxxxxxxxx> writes:
     > On 6/4/20 9:43 AM, Tom Lane wrote:
     >> It's possible that the index had bloated to the point where the
     >> planner thought it was cheaper to use a seqscan.  Did you make a
     >> note of the cost estimates for the different plans?

     > I missed the part where the OP pointed to a SO question. In that
     > question were links to explain.depesz.com output.

    Ah, I didn't bother to chase that link either.

    So the cost estimates are only a fraction of a percent apart, making
    it unsurprising for not-so-large changes in the index size to cause
    a flip in the apparently-cheapest plan.  The real question then is
    why the cost estimates aren't actually modeling the real execution
    times very well; and I'd venture that that question boils down to
    why is this rowcount estimate so far off:

     >   ->  Parallel Seq Scan on oscar mike_three
     >         (cost=0.000..1934568.500 rows=2385585 width=3141)
     >         (actual time=159.800..158018.961 rows=23586 loops=3)
     >         Filter: (four AND (NOT bravo) AND (zulu <= 'echo'::timestamp without time zone))
     >         Rows Removed by Filter: 8610174

    We're not going to be able to answer that if the OP doesn't wish
    to decloak his data a bit more ... but a reasonable guess is that
    those filter conditions are correlated.  With late-model Postgres
    you might be able to improve matters by creating extended statistics
    for this table.

                             regards, tom lane
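
A sketch of that last suggestion, covering all three filtered columns rather than only the two booleans (the statistics name here is illustrative):

    -- Extended statistics over the correlated filter columns;
    -- ANALYZE is needed before the planner can use them.
    CREATE STATISTICS item_backup_stats (dependencies, ndistinct)
        ON shouldbebackedup, backupperformed, itemCreated FROM item;
    ANALYZE item;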



--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx




