Search Postgresql Archives

Re: Auto-analyse on insert operations

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 11/06/2015 09:13 AM, Bertrand Roos wrote:


Le 04/11/2015 16:57, Adrian Klaver a écrit :
On 11/04/2015 07:43 AM, Bertrand Roos wrote:


Le 04/11/2015 14:55, Bill Moran a écrit :
On Wed, 4 Nov 2015 14:32:37 +0100
Bertrand Roos <bertrand.roos@xxxxxxxx> wrote:
I try to configure auto-analyse task with postgresql 9.4.
I have the following configuration (default configuration):
track_counts = on
autovacuum = on
log_autovacuum_min_duration = -1
autovacuum_max_workers = 3
autovacuum_naptime = 300s
autovacuum_vacuum_threshold = 50
autovacuum_analyze_threshold = 50
autovacuum_vacuum_scale_factor = 0.2
autovacuum_analyze_scale_factor = 0.2
autovacuum_freeze_max_age = 200000000
autovacuum_multixact_freeze_max_age = 400000000
autovacuum_vacuum_cost_delay = 20ms
autovacuum_vacuum_cost_limit = -1

With this configuration, I can observe that some tables are
auto-analysed, but some others are not. Even if there are millions of
insert operations on an empty table (all tables are in cluster mode).
In fact it seems that tables with update operations are the only ones
that are auto-analysed.
I'm quite suprised because the documentation says that daemon check
the
count of insert, update and delete operations.
What could it be the reason ? Why tables which have only update
operation, aren't analysed ?
Are update operations really taken into account ?
Given that autoanalyze is pretty critical to the way the system
functions,
it's unlikely that it just doesn't work (someone else would have
noticed).

A more likely scenario is that you've found some extremely obscure edge
case. If that's the case, you're going to have to give very specific
details as to how you're testing it before anyone is liable to be able
to help you.

I get the impression that you're somewhat new to Postgres, in which
case
it's very likely that the problem is that you're not testing the
situation
correctly. In that case, we're going to need specific details on how
you're
observing that tables are or are not being analysed.

As a wild-guess theory: the process that does the analyze only wakes up
to check tables every 5 minutes (based on the config you show) ...
so are
you doing the inserts then checking the table without leaving enough
time
in between for the system to wake up and notice the change?

Thanks for your answer Bill.
Indeed, I'm pretty new to Postgres and I don't exclude that I'm doing
something wrong. But I did my test on a more than 1 day duration, so
it's not an issue of autovacuum_naptime (I insert 760 lignes each 30
seconds during 36 hours).
I can't give all the details of this test because it is to complicated
with triggers and partman (and your objective is not to solve
configuration issues of others).

In fact, I was telling the question because I have read on some forums
that the auto vacuum deamon only count dead tuple so only update and
delete operations can cause the scheduling of auto-analyse.
So if it's the case it perfectly explain why my test case doesn't work.
But in the other hand the documentation says that delete, update and
insert operations are counted.
Is it an know issue that insert operations are not counted for the
trigger of auto-analyse ?

No, see below:

http://www.postgresql.org/docs/9.4/interactive/routine-vacuuming.html#AUTOVACUUM


23.1.3. Updating Planner Statistics

"The autovacuum daemon, if enabled, will automatically issue ANALYZE
commands whenever the content of a table has changed sufficiently.
However, administrators might prefer to rely on manually-scheduled
ANALYZE operations, particularly if it is known that update activity
on a table will not affect the statistics of "interesting" columns.
The daemon schedules ANALYZE strictly as a function of the number of
rows inserted or updated; it has no knowledge of whether that will
lead to meaningful statistical changes."


Is the partman you refer to this?:

https://github.com/keithf4/pg_partman

Can you give an outline view of what you are doing and how you are
determining the status of analyze?


If it's not, I can try to reproduce this weird behaviour with a simpler
test and give you all the details of the test.

Bertrand






I have done multiple tests trying to reproduce my issue and finally find
a partial explanation.

My non-working use case is :
With a database with configuration by default (autovacuum=on,
autovacuum_max_workers=3 ...)
A table ('table_name') partioned with partman with a child table per
month (@Adrian Klaver, yes it's https://github.com/keithf4/pg_partman)
A remote deamon that calls an analyze every day on the table 'table_name'

So the above is a manual analyze run by another program correct?

A remote application that calls multiple stored procedures doing an
insert operation on 'table_name' (785 inserts each 30s), with no delete,
no update operation

So a batch of 785 inserts over 30 seconds?

This stored procedures are sometimes blocked by a short lock (they are
doing some update operations on a cache table of 'table_name')

I not following the above.

You have another table that 'shadows' the 'table_name' table?


It happened that :
The current child table has no statistics :
- SELECT * FROM pg_stat WHERE tablename = '$table_name_month'; => empty row

You are actually talking about pg_stats above and below correct?

- SELECT * FROM pg_stat_user_tables WHERE relname = '$table_name_month';
=> statistics are good (coherent with values I should have) but I saw no
value for last_autoanalyze or last_analyze (even if the test was running
for a few days and the table start with no data and grow to 6 million rows)

So what user are you running the above as and is it one that has permissions on 'table_name'?


The behaviour of the auto-analyze I have observed :
Insert operations on a table trigger auto-analyze (with track_counts=on).
Stored procedure on a table trigger auto-analyze (even if
track_functions=off)

Well:

http://www.postgresql.org/docs/9.4/static/runtime-config-statistics.html#GUC-TRACK-FUNCTIONS

 track_functions (enum)

Enables tracking of function call counts and time used. Specify pl to track only procedural-language functions, all to also track SQL and C language functions. The default is none, which disables function statistics tracking. Only superusers can change this setting.

So it tracks(or not) function usage not what the function does. If the function is doing INSERT/UPDATE/DELETEs then that is covered by track_counts.

Insert operations on a partitioned table trigger auto-analyze.
Insert operations on a daily analyze table may not trigger auto-analyze,
because an analyze reset the count of operation for auto-analyze.
An analyze on a partitioned table update statistics of the parent table
but not for children tables.
An analyze on a partitioned table do not reset the count of operations
for auto-analyze of children tables.

I do not do enough partitioning to be of much help here. One thing I would say is that it might be helpful to see the schema partman sets up when it does the partitioning.


I may not be able yet to explain the reason why the auto-analyze is
rarely not triggered.
But at least I found a way to fix pg_stat of the current table.

@Bill Moran, Are you sure that a dump of the table could help you ? The
values of the table are correct, the main issue is that statistics are
becoming invalid and so postgres is choosing dummy request paths.

Bertrand





--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx


--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux