Re: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2

Mark <mwchambers@xxxxxxxxx> writes:
> I have a long-running job which deletes all of the common_student table and
> then repopulates it. It takes a long time to load all the other data and
> commit the transaction. I didn't think the delete inside the transaction
> would have any effect until it is committed or rolled back.
> I will have to rewrite the application so it updates the existing rows
> rather than deleting all and then inserting.

Hmm ... I'm not sure if that will actually make things better.  The root
of the issue is what analyze.c does with DELETE_IN_PROGRESS tuples:

                     * We count delete-in-progress rows as still live, using
                     * the same reasoning given above; but we don't bother to
                     * include them in the sample.

The "reasoning given above" is a reference to what happens for
INSERT_IN_PROGRESS tuples:

                     * Insert-in-progress rows are not counted.  We assume
                     * that when the inserting transaction commits or aborts,
                     * it will send a stats message to increment the proper
                     * count.  This works right only if that transaction ends
                     * after we finish analyzing the table; if things happen
                     * in the other order, its stats update will be
                     * overwritten by ours.  However, the error will be large
                     * only if the other transaction runs long enough to
                     * insert many tuples, so assuming it will finish after us
                     * is the safer option.
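
For context, both of those comments sit in the same switch over
HeapTupleSatisfiesVacuum() results in acquire_sample_rows() in analyze.c.
Trimmed down (long comments elided, paraphrased rather than copied, and the
exact shape varies a bit across versions), the two cases look roughly like:

            case HEAPTUPLE_INSERT_IN_PROGRESS:
                /* not counted and not sampled ... unless inserted by our own xact */
                if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(targtuple.t_data)))
                {
                    sample_it = true;
                    liverows += 1;
                }
                break;

            case HEAPTUPLE_DELETE_IN_PROGRESS:
                /* counted as live (dead if deleted by our own xact), but never sampled */
                if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(targtuple.t_data)))
                    deadrows += 1;
                else
                    liverows += 1;
                break;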

Now the problem with this, from your perspective, is that *neither*
INSERT_IN_PROGRESS nor DELETE_IN_PROGRESS tuples are included in
ANALYZE's data sample.  So a long-running update transaction will
cause all the rows it changes to be excluded from the sample until
commit.  This seems like a bad thing, and it definitely means that
adjusting your app as you're contemplating won't help.

In order to handle the bulk-update scenario sanely, it seems like
we ought to include tuples in one of those two states, INSERT_IN_PROGRESS
or DELETE_IN_PROGRESS, in the sample.  And it looks like, for consistency
with the row-counting
logic, the one that's included needs to be DELETE_IN_PROGRESS.  This
is a slightly annoying conclusion, because it means we're accumulating
stats that we know are likely to soon be obsolete, but I think sampling
INSERT_IN_PROGRESS tuples instead would lead to very strange results
in some cases.

In short, I think we want to do this to the DELETE_IN_PROGRESS logic:

                    if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(targtuple.t_data)))
                        deadrows += 1;
                    else
+                   {
+                       sample_it = true;
                        liverows += 1;
+                   }

with suitable adjustment of the adjacent comment.
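
With that change and a suitably reworded comment, the case might read roughly
like this (the comment wording below is only a sketch, not proposed text):

            case HEAPTUPLE_DELETE_IN_PROGRESS:
                /*
                 * We count delete-in-progress rows as still live, using the
                 * same reasoning given above, and now also include them in
                 * the sample, so that a long-running bulk delete-and-reload
                 * doesn't leave ANALYZE with an empty picture of the table.
                 * Rows deleted by our own transaction are still counted as
                 * dead, as before.
                 */
                if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(targtuple.t_data)))
                    deadrows += 1;
                else
                {
                    sample_it = true;
                    liverows += 1;
                }
                break;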

Thoughts?

            regards, tom lane




