Re: Migration study, step 1: bulk write performance optimization

Dave Cramer <pg@xxxxxxxxxxxxx> · Mon, 20 Mar 2006 12:44:31 -0500

Others are reporting better performance on 8.1.x with very large  
shared buffers. You may want to try tweaking that possibly as high as  
20% of available memory

Dave
On 20-Mar-06, at 9:59 AM, Mikael Carneholm wrote:

Ok, here's the deal:

I am responisble for an exciting project of evaluating migration of  
a medium/large application for a well-known swedish car&truck  
manufacturer from a proprietary DB to Postgres. The size of the  
database is currently about 50Gb, annual growth depending on sales,  
but probably in the 30-50Gb range.

Migrating the schema was easily done, mostly involving a search/ 
replace of some vendor specific datatypes. The next step is to  
migrate the data itself, and for this we have written a Java app  
relying on JDBC metadata to map the tables in the source schema to  
the target schema. The goal right now is to find the set of  
parameters that gives as short bulk insert time as possible,  
minimizing downtime while the data itself is migrated.

The machine used for the study is a Dell PE2850, 6GB memory, 1xXEON  
3.0GHz/2MB cache, internal SCSI 0+1 raid (currently 4x36GB 10000rpm  
striped+mirrored, two more 146GB 15000rpm disks will arrive later).  
Not sure about the brand/model of the raid controller, so I'll  
leave that for now. File system is ext3(I know, maybe not the  
optimal choice but this is how it was when I got it) with a 8k  
block size. The OS currently installed is CentOS4.

Until the new disks arrive, both the OS itself, pg_xlog and the  
data reside on the same disks. When they arrive, I will probably  
move the data to the new disks (need two more to get raid 0+1,  
though) and leave the OS + pg_xlog on the 10000rpm disks. Mounting  
the 15000rpm data disks with the noatime option (this is safe,  
right?) and using a 16kb block size (for read performance) will  
probably be considered as well.

NOTE: this machine/configuration is NOT what we will be using in  
production if the study turns out OK, it's just supposed to work as  
a development machine in the first phase whose purpose more or less  
is to get familiar with configurating Postgres and see if we can  
get the application up and running (we will probably use a 64bit  
platform and either a FC SAN or internal raid with a battery backed  
cache for production use, if all goes well).

The first thing I did when I got the machine was to do a raw dd  
write test:

# time bash -c "(dd if=/dev/zero of=/opt/bigfile count=1310720  
bs=8k && sync)"
1310720+0 records in
1310720+0 records out

real    2m21.438s
user    0m0.998s
sys     0m51.347s

(10*1024)Mb/~141s => ~75.5Mb/s

As a simple benchmark, I created a simple table without PK/indexes  
with 1k wide rows:

create table iotest.one_kb_rows
(
        the_col char(1024) not null
);

To fill the table, I use this simple function:

create or replace function iotest.writestress(megs integer) returns  
void as $$
declare
    char_str char(1024) := repeat('x', 1024);
begin
     for i in 1..megs loop
          for j in 1..1024 loop
              insert into one_kb_rows(the_col) values (char_str);
          end loop;
     end loop;
end;
$$
language plpgsql;

Then, I tested how long it takes to write 10Gb of data to this table:

iotest=> \timing
Timing is on.

iotest=> select writestress((10*1024));
 writestress
-------------

(1 row)

Time: 379971.252 ms

This gives that 10Gb takes ~380s => ~27Mb/s (with fsync=off),  
compared to the raw dd result (~75.5Mb/s).

I assume this difference is due to:
- simultaneous WAL write activity (assumed: for each byte written  
to the table, at least one byte is also written to WAL, in effect:  
10Gb data inserted in the table equals 20Gb written to disk)
- lousy test method (it is done using a function => the transaction  
size is 10Gb, and 10Gb will *not* fit in wal_buffers :) )
- poor config
- something else?

I have tried to read up as much as possible on Postgres  
configuration (disk layout, buffer management, WAL sync methods,  
etc) and found this post regarding bgwriter tweaking: http:// 
archives.postgresql.org/pgsql-performance/2006-03/msg00218.php -  
which explains the bgwriter config below.

All params in postgresql.conf that are not commented out:
---------------------------------------------------------
max_connections = 100
superuser_reserved_connections = 2
shared_buffers = 16000
bgwriter_lru_percent = 20
bgwriter_lru_maxpages = 160
bgwriter_all_percent = 10
bgwriter_all_maxpages = 320
fsync = off
wal_sync_method = open_sync
wal_buffers = 128
checkpoint_segments = 3
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_rotation_age = 1440
log_line_prefix = '%m: (%u@%d) '
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'

fsync can safely be kept off during data migration as we are able  
to restart the procedure without losing data if something goes  
wrong. Increasing chekpoint_segments to 8/16/32 only increased the  
insert time, so I kept it at the default. I will increase  
shared_buffers and effective_cache_size as soon as it's time to  
tweak read performance, but for now I'm just focusing on write  
performance.

Postgres version used:

iotest=> select version();
                                              version
---------------------------------------------------------------------- 
-----------------------------
 PostgreSQL 8.1.3 on i686-pc-linux-gnu, compiled by GCC gcc (GCC)  
3.4.4 20050721 (Red Hat 3.4.4-2)
(1 row)

I want to make sure I have made correct assumptions before I carry  
on, so comments are welcome.

- Mikael

---------------------------(end of  
broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster