Recovery will take 10 hours

Brendan Duddridge <brendan@xxxxxxxxxxxxxx> · Thu, 20 Apr 2006 13:29:46 -0600

Hi,

We had a database issue today that caused us to have to restore to  
our most recent backup. We are using PITR so we have 3120 WAL files  
that need to be applied to the database.

After 45 minutes, it has restored only 230 WAL files. At this rate,  
it's going to take about 10 hours to restore our database.

Most of the time, the server is not using very much CPU time or I/O  
time. So I'm wondering what can be done to speed up the process?

The database is about 20 GB. The WAL files are compressed with gzip  
to about 4 MB. Expanded, the WAL files would take 48 GB.

We are using PostgreSQL 8.1.3 on OS X Server 10.4.6 connected to an  
XServe RAID. The pg_xlog is on its own separate RAID and so are the  
table spaces.

Here's a representative sample of doing iostat:

hulk1:/Library/PostgreSQL admin$ iostat 5
          disk1           disk2           disk0       cpu
  KB/t tps  MB/s   KB/t tps  MB/s   KB/t tps  MB/s  us sy id
19.31 101  1.91  14.39  51  0.71  37.37   4  0.13  15 10 76
  8.00  21  0.16   0.00   0  0.00  90.22   2  0.16   0  2 98
  8.00  32  0.25   0.00   0  0.00   0.00   0  0.00   0  1 98
  8.00  76  0.60   0.00   0  0.00   0.00   0  0.00   0  1 99
  8.00 587  4.59  1024.00   4  4.00   0.00   0  0.00   4  7 88
  8.00 675  5.27  956.27   6  5.60   0.00   0  0.00   6  6 88
11.32 1705 18.84   5.70   1  0.01  16.36   7  0.12   1  6 93
  8.00  79  0.62  1024.00   3  3.20   0.00   0  0.00   2  2 96
  8.00  68  0.53   0.00   0  0.00   0.00   0  0.00   0  2 98
  8.00  76  0.59   0.00   0  0.00   0.00   0  0.00   0  1 99
  8.02  89  0.69   0.00   0  0.00   0.00   0  0.00   1  1 98
  8.00 572  4.47  911.11   4  3.20   0.00   0  0.00   5  5 91
13.53 1227 16.21  781.55   4  3.21  12.14   2  0.03   3  6 90
  8.00  54  0.42   0.00   0  0.00  90.22   2  0.16   1  1 98
  8.00  68  0.53   0.00   0  0.00   0.00   0  0.00   0  1 99
  8.00 461  3.60  1024.00   3  3.20   0.00   0  0.00   3  6 91
  8.00 671  5.24  964.24   7  6.40   0.00   0  0.00   6  8 86
  7.99 248  1.94   0.00   0  0.00   0.00   0  0.00   1  3 96
15.06 1050 15.44  911.11   4  3.20  12.12   3  0.03   2  5 93
19.84 176  3.41   5.70   1  0.01   0.00   0  0.00   0  1 99

disk1 is the RAID volume that has the table spaces on it. disk2 is  
the pg_xlog and disk0 is the boot disk.

So you can see the CPU is idle much of the time and the IO only  
occurs in short bursts. Each line in the iostat results is 5 seconds  
apart.

If there were something we could do to speed up the process, would it  
be possible to kill the postgres process, tweak some parameter  
somewhere and then start it up again? Or would we have to restore our  
base backup again and start over?

How can I make this go faster?

Thanks,

____________________________________________________________________
Brendan Duddridge | CTO | 403-277-5591 x24 |  brendan@xxxxxxxxxxxxxx

ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB  T2G 0V9

http://www.clickspace.com