Hi Siddharth,
Restore speed also depends on the data and its types. Typically it comes down to how much RAM is available to pg_restore relative to the other processes on the machine.
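For instance, the memory knobs that usually matter during a restore are along these lines (illustrative values only, not a recommendation for your workload):

```
# postgresql.conf fragment -- example values, tune to your machine
maintenance_work_mem = 1GB   # memory for index rebuilds during restore
work_mem = 64MB              # per-sort/per-hash memory for each backend
shared_buffers = 4GB         # shared cache; often ~25% of RAM
```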
Regards,
Pavan
On Mon, 6 May, 2019, 4:54 PM Siddharth Karandikar, <siddharth.karandikar@xxxxxxxxx> wrote:
Hello All,
This is my first, and somewhat long, email on this list. I wanted to
share all the information I have gathered so far, so it got a bit
long.
I am trying to restore a database dump created on PostgreSQL 9.5.16 to
PostgreSQL 10.7, and it is taking a lot of time. Restoring the same
dump to 9.5 doesn't take nearly as long, so I am wondering what the
reason behind this slowness could be.
Some numbers to show how slow it is:
Small tables (<500MB):
- 9.5 to 9.5 took 1m4.110s
- 9.5 to 10 took 1m4.575s
Medium tables (500MB to 5GB):
- 9.5 to 9.5 took 6m53.132s
- 9.5 to 10 took 6m0.998s
Large tables (5GB+):
- 9.5 to 9.5 took 131m35.106s
- 9.5 to 10 took 831m25.539s
Underlying hardware and OS used in both cases are exactly the same.
Thinking that a data format difference between Postgres 9.5 and 10
could be the reason for this slowness, I tried dumping from Postgres
10 and restoring it on Postgres 10, but that is also super slow.
Here are some numbers of that:
Small tables (<500MB):
- 10 to 10 took 0m51.248s
Medium tables (500MB to 5GB):
- 10 to 10 took 5m40.224s
Large tables (5GB+):
- 10 to 10 took 832m36.850s
Restore command

The command I am using to restore:

pg_restore -v -F d -n public -U <user> -d <database> -h localhost -j 4 \
    --disable-triggers -e -L <file with table names> \
    /var/lib/pgsql/20190425/pg_dump_data
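The per-table timings above were collected with runs along these lines, restricting the restore to a single table at a time (the names in angle brackets are placeholders, as above):

```
# Time the restore of one table to isolate the slow case.
# <user>, <database>, <table> are placeholders.
time pg_restore -v -F d -n public -U <user> -d <database> -h localhost \
    -j 4 --disable-triggers -e -t <table> \
    /var/lib/pgsql/20190425/pg_dump_data
```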
The hardware I am running on:
CPU: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz (2 cores)
Memory: 16GB
Disk: 300GB NVMe
FS: ext4
Postgres configuration that I have on this setup:
shared_buffers = 128MB
work_mem = 1MB
maintenance_work_mem = 16MB
dynamic_shared_memory_type = posix
huge_pages = try
vacuum_cost_page_hit = 1
vacuum_cost_page_miss = 10
vacuum_cost_page_dirty = 20
bgwriter_lru_maxpages = 100
full_page_writes = on
seq_page_cost = 1.0
random_page_cost = 4.0
wal_level = minimal
wal_sync_method = fsync
wal_log_hints = off
wal_buffers = -1
wal_writer_delay = 200ms
min_wal_size = 1GB
max_wal_size = 80GB
max_wal_senders = 0
wal_keep_segments = 0
wal_sender_timeout = 60s
wal_receiver_status_interval = 10s
wal_receiver_timeout = 60s
autovacuum = on
log_autovacuum_min_duration = -1
autovacuum_max_workers = 3
autovacuum_naptime = 1min
autovacuum_vacuum_threshold = 50
autovacuum_analyze_threshold = 50
autovacuum_vacuum_scale_factor = 0.2
autovacuum_analyze_scale_factor = 0.1
autovacuum_freeze_max_age = 200000000
autovacuum_multixact_freeze_max_age = 400000000
autovacuum_vacuum_cost_delay = 20ms
autovacuum_vacuum_cost_limit = -1
Some more things observed:
* Running strace on the Postgres process showed a continuous stream of
lseek/read/write calls, along with a bunch of futex wait and resume
calls. These futex calls are not easily seen on Postgres 9.5.
* Looking at pg_stat_activity showed WALWriteLock under the
wait_event column. The value kept changing, but WALWriteLock came up
very frequently.
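For reference, the WALWriteLock sightings came from sampling pg_stat_activity while the restore was running, roughly like this (placeholders as above; the wait_event columns exist on 9.6+/10, not on 9.5):

```
# Snapshot wait events once a second during the restore.
while sleep 1; do
    psql -U <user> -d <database> -Atc \
        "SELECT now(), pid, wait_event_type, wait_event
         FROM pg_stat_activity WHERE state <> 'idle'"
done
```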
Thanks,
Siddharth Karandikar
PS: I had posted this on Stack Overflow, but it didn't get much
attention there, so I am posting it here. I don't know if that's
considered cross-posting; if it is, I will keep that in mind and
avoid it in the future.