Re: Too long startup time after each crash.

Adrian Klaver <adrian.klaver@xxxxxxxxxxx> · Wed, 21 Dec 2016 17:31:00 -0800

On 12/21/2016 04:16 PM, neos@xxxxxxxxxxxx wrote:

22.12.2016, 04:21, "Adrian Klaver" <adrian.klaver@xxxxxxxxxxx>:
On 12/21/2016 02:17 PM, neos@xxxxxxxxxxxx wrote:
 Hi.

 Perhaps i've sent it to wrong mail-list previously.

 After the upgrade postgresql had a strange behavior. After the crash, server launch now take a long time, up to 30-40 minutes sometimes. When connecting to the server (as well as in the log), the message - "the database system is starting up". Startup process consumes 0.4 - 1% CPU.

What upgrade and how was it done?

By standard distro method (emerge in my case), and then pg_upgrade utility (according to pg docs). Upgrade was from 9.4 to 9.5, and now 9.5.3 to 9.6.1.

What crash and what caused it?

In particular why are you seeing multiple crashes?

Sorry, I did not specify a crash, i mean technical failure on the server - power failure or sometimes CPU overheating ("home" server problem).

What is the log just before the crash?

Nothing, as expected, since it's not postgresql crash.

What is the log after the the "database system is starting up" part?

First - "the database system was interrupted; last known up at..."

then many times "database system is starting up" (one per connection attempt)

and after 5-40 min i get - database system was not properly shut down; automatic recovery in progress...redo starts from...ready to accept connections, and, finally, everything is going well.

Alright looks like it doing the correct thing.

Now if I am following you say you see the issue starting with 9.5+. As 
it so happens that is when checkpoint_segments was replaced with 
max_wal_size:

https://www.postgresql.org/docs/9.5/static/release-9-5.html

"

Replace configuration parameter checkpoint_segments with min_wal_size 
and max_wal_size (Heikki Linnakangas)

If you previously adjusted checkpoint_segments, the following formula 
will give you an approximately equivalent setting:

max_wal_size = (3 * checkpoint_segments) * 16MB

Note that the default setting for max_wal_size is much higher than the 
default checkpoint_segments used to be, so adjusting it might no longer 
be necessary.
"

If you had not changed the checkpoint_settings in your 9.4-  servers it 
would have been set at 3 or roughly 48 MB of WALs. In 9.5 max_wal_size 
is set at 1GB. In either case the default  checkpoint_timeout would be 5 
minutes.

Did you change the checkpoint_timeout setting or is you machine 
generating something close to 1GB of WAL files before the 5 minutes?

Either case could lead to longer start up times as Postgres would have 
to process more WAL files.

There is also your checkpoint_completion_target = 0.85. The default is 
0.5. Per the docs below:
"The disadvantage of this is that prolonging checkpoints affects 
recovery time, because more WAL segments will need to be kept around for 
possible use in recovery."

For more information see 9.4-:

https://www.postgresql.org/docs/9.4/static/wal-configuration.html

9.5+:

https://www.postgresql.org/docs/9.5/static/wal-configuration.html

(my logs in ru_RU locale, so do not place them here)

 Tried different version 9.5.[2,3,4], 9.6.1 (9.4 works fine). On the Internet this problem is not met, perhaps the problem lies in a combination of different factors (configuration database, Linux kernel), may be my config is wrong?

Do you mean you searched for this problem description and could not find it?

Yes, it is, my fault, Google translator is not always saving me from mistakes.

 Technical details:

 Gentoo Linux 4.4.2-hardened
 current pg version 9.6.1

 ## pg config:

 listen_addresses = '*'
 port = 5432
 max_connections = 500
 shared_buffers = 2048MB
 temp_buffers = 16MB
 work_mem = 64MB
 maintenance_work_mem = 512MB
 max_stack_depth = 4MB
 dynamic_shared_memory_type = posix
 wal_level = replica
 wal_buffers = 8MB
 checkpoint_completion_target = 0.85
 archive_mode = on
 archive_command = 'gzip < %p > /home/pgbasebackup/xlogs/%f.gz'
 max_wal_senders = 3
 effective_cache_size = 9GB
 log_min_duration_statement = 1500
 log_timezone = 'Asia/Yekaterinburg'
 datestyle = 'iso, dmy'
 timezone = 'Asia/Yekaterinburg'
 lc_messages = 'ru_RU.UTF-8'
 lc_monetary = 'ru_RU.UTF-8'
 lc_numeric = 'ru_RU.UTF-8'
 lc_time = 'ru_RU.UTF-8'
 default_text_search_config = 'pg_catalog.russian'
 plperl.on_init = 'use utf8; use re; package utf8; require "utf8_heavy.pl";'

 ## end of config

 Sorry for my eng :)

--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx

--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general