Re: Some observations on very slow pg_restore operations

MichaelDBA <MichaelDBA@xxxxxxxxxxx> · Thu, 3 Oct 2019 17:48:18 -0400

Hi Ogden,

You didn't mention any details about your postgresql.conf settings.  Why
 don't you set them optimally for your loads and try again and see if 
there is any difference.  Make sure you do a DB restart since some of 
these parameters require it.

======================================
parameter		before	after
----------------	------	-------
shared_buffers		Reduce this value to about 25% of total memory
temp_buffers		Decrease this value to 8MB since we are not using temporary tables or doing intermediate sorts
work_mem		Reduce significantly (1MB) since we are not doing memory sorts or hashes per SQL
maintenance_work_mem	Increase signficantly for DDL bulk loading, restore operations
fsync    		off (so that time is not being spent waiting for stuff to be written to disk). Note: you may not be able to recover your database after a crash when set to off.
checkpoint_segments	Increase this significantly for DML bulk loading, restore operations
max_wal_size            Increase significantly like you would to checkpoint_segments
min_wal_size            Increase significantly like you would to checkpoint_segments
checkpoint_timeout      Increase to at least 30min
archive_mode 		off
autovacuum 		off
synchronous_commit	off
wal_level		minimal
max_wal_senders         0
full_page_writes        off during DML bulk loading, restore operations
wal_buffers             16MB during DML bulk loading, restore operations 

Regards,

Michael Vitale

Ogden Brash wrote on 10/3/2019 4:30 PM:

  I recently 
performed a pg_dump (data-only) of a relatively large database where we 
store intermediate results of calculations. It is approximately 3 TB on 
disk and has about 20 billion rows.

We do the 
dump/restore about once a month and as the dataset has grown, the 
restores have gotten very slow. So, this time I decided to do it a 
different way and have some observations that puzzle me.

Background:

The data 
is extremely simple. The rows consist only of numbers and are all fixed 
length. There are no foreign keys, constraints, null values, or default 
values. There are no strings or arrays. There are 66 tables and the 
number of rows in each table forms a gaussian distribution; so there are
 3 tables which have about 3 billion rows each and the rest of the 
tables have significantly fewer rows. 

I used the
 directory format when doing the pg_dump. The compressed data of the 
dump is 550 GB.

I am using: (PostgreSQL) 11.5 (Ubuntu 
11.5-1.pgdg18.04+1)

The machine that I attempted to do a pg_restore to is a
 dedicated server just for one instance of posgresql. It has 32 GB of 
memory and is running Ubuntu 18.04 (headless). It physical hardware, not
 virtualized. Nothing else runs on the machine and the postgresql.conf 
settings have been tuned (to the best of my postgresql abilities which 
are suspect). While the operating system is installed on an SSD, there 
is one extra large, fast HDD that is dedicated to the posgresql server. 
It has been in use for this particular purpose for a while and has not 
had performance issues. (Just with pg_restore)

Autovacuum
 is off and all indexes have been deleted before the restore is started.
 There is nothing in the db except for the empty data tables.

Restoring
 over the net:

In the past we have always restored in a way where the
 dumped data is read over a gigabit connection while being restored to 
the local drive. But, the last time we did it it took 2 days and I was 
looking for something faster. So, I decided to copy the dumped directory
 to the local drive and restore from the dump locally. I knew that 
because the machine only had one drive that would fit the data, there 
would be some I/O contention, but I hoped that it might not be as bad as
 reading over the network.

The pg_restore went unbearably slowly... after many 
hours it had written less than 20GB to the database, so I started 
tracking it with iostat to see what was going on. The following is 
iostat output every 60 seconds. I tracked it for several hours and this 
is representative of what was happening consistently.

avg-cpu:  %user   %nice %system %iowait  %steal  
 %idle
           0.39    0.00   
 0.40   43.10    0.00   56.11

Device             tps    kB_read/s    kB_wrtn/s    
kB_read    kB_wrtn
loop0         
    0.00         0.00         0.00          0          0
loop1             0.00         0.00         0.00       
   0          0
loop2           
  0.00         0.00         0.00          0          0
sda             263.33       132.87      2990.93      
 7972     179456
sdb             
  0.17         0.00         0.73          0         44

avg-cpu: 
 %user   %nice %system %iowait  %steal   %idle
           0.34    0.00    0.41   44.43    0.00   54.82

Device   
          tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
loop0             0.00         0.00         0.00       
   0          0
loop1           
  0.00         0.00         0.00          0          0
loop2             0.00         0.00         0.00       
   0          0
sda            
 262.95       140.47      2983.00       8428     178980
sdb               0.08         0.00         0.40       
   0         24

While I was tracking this I started experimenting with
 the IO scheduler to see if it had a noticable impact. I had been using 
cfq (ubuntu 18.04 default). Changing to deadline did not have a 
noticable difference. Changing to noop made things much slower. I went 
back to cfq. I also experimented with turning fsync off; that did speed 
things up a bit but not enough for me to leave it off.

What 
puzzled me is that the OS was spending such a large percentage of time 
in iowait, yet there was so little IO going on.

So, I 
decided to go back to restoring over the net. While the slow pg_restore 
was still going on, and while I was still tracking iostat, I copied the 
550 GB dumps to an nfs drive. The copy happened pretty much at full 
speed (limit being the gigabit ethernet) and interestingly, it did not 
slow down kb_wrtn and kb_wrtn/s numbers in iostat (which was the 
postgresql server continuing with the restore). To me that seemed to 
indicate that it was not really a disk I/O limitation.

Restoring
 over the net:

After copying the dump files to an NFS drive, I 
stopped the restore, truncated the tables and started exactly the same 
command, but this time taking its input from the nfs drive. I did not 
reboot the machine or restart the postgresql server. I tracked iostate 
every 60 seconds and this is what it looks like:

avg-cpu:  
%user   %nice %system %iowait  %steal   %idle
           8.87    0.00    1.62   39.89    0.00   49.61

Device   
          tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
loop0             0.00         0.00         0.00       
   0          0
loop1           
  0.00         0.00         0.00          0          0
loop2             0.00         0.00         0.00       
   0          0
sda            
 252.77       527.87     37837.47      31672    2270248
sdb               0.22         0.00         1.00       
   0         60

avg-cpu:  %user   %nice %system %iowait  %steal  
 %idle
           8.57    0.00   
 2.21   35.26    0.00   53.97

Device             tps    kB_read/s    kB_wrtn/s    
kB_read    kB_wrtn
loop0         
    0.00         0.00         0.00          0          0
loop1             0.00         0.00         0.00       
   0          0
loop2           
  0.00         0.00         0.00          0          0
sda             236.10       465.27     54312.00      
27916    3258720
sdb             
  0.08         0.00         0.40          0         24

Notice that the database is writing approximately 15 
times as fast (and I have verified that by tracking the size of the 
posgresql data directory over time) while the number of i/o transactions
 per second has actually dropped a little bit. It has now been running 
about 24 hours and has maintained that speed.

My 
interpretation

At first sight this seems to me as being symptomatic 
of the pg_restore process doing a huge number of very small input 
operations when reading from the dump. If the proportion of input to 
output operations is the same now as it was when trying to restore from 
the local drive, that implies that the vast majority of i/o operations 
were inputs and not outputs.

However, I am not sure that even that would cause such
 a slowdown because the compressed data files in the directory format 
dump correspond to the tables and so there are 3 very large files that 
it starts with. So all of these stats were gathered in the first 24 
hours of the restore when it was just restoring the first 3 tables (I 
have verbose on, so I know). Because those files are gzipped, we know 
that they are being read sequentially and because the machine has lots 
of memory we know that the OS has allocated a lot of space to disk 
buffers and so even if postgresql was doing lots of small reads, 
bouncing around between the 3 files, it would not hit the disk that 
often.

Now that restore is happening 15 times faster when 
reading from an nfs drive, I looked at the nfsiostat output for a while 
and it does not show any indication of any untoward behavior:

192.168.17.146:/volume1/Backups mounted on 
/nas/Backups:

           ops/s       rpc bklog
          27.000           0.000

read:     
         ops/s            kB/s           kB/op         retrans    avg 
RTT (ms)    avg exe (ms)

           27.000        3464.332         128.309        0 (0.0%)       
   13.500          13.511
write: 
            ops/s            kB/s           kB/op         retrans    avg
 RTT (ms)    avg exe (ms)

            0.000           0.000           0.000        0 (0.0%)       
    0.000           0.000

192.168.17.146:/volume1/Backups mounted on 
/nas/Backups:

           ops/s       rpc bklog
          24.000           0.000

read:     
         ops/s            kB/s           kB/op         retrans    avg 
RTT (ms)    avg exe (ms)

           24.000        3079.406         128.309        0 (0.0%)       
   28.492          28.504
write: 
            ops/s            kB/s           kB/op         retrans    avg
 RTT (ms)    avg exe (ms)

            0.000           0.000           0.000        0 (0.0%)       
    0.000           0.000

The 
nubmer of operations per second (if those correspond to reads from 
postgresql, which I do not know for a fact) does not seem high at all. 

I 
actually do not have a great theory for what is going on but it might be
 more obvious to someone who knows the postgresql implementation well. I
 would love to hear any thoughts that would be helpful on how to get my 
restores even faster.