On 4/11/09 11:44 AM, "Mark Wong" <markwkm@xxxxxxxxx> wrote: > On Fri, Apr 10, 2009 at 11:01 AM, Greg Smith <gsmith@xxxxxxxxxxxxx> wrote: >> On Fri, 10 Apr 2009, Scott Carey wrote: >> >>> FIO with profiles such as the below samples are easy to set up >> >> There are some more sample FIO profiles with results from various >> filesystems at >> http://wiki.postgresql.org/wiki/HP_ProLiant_DL380_G5_Tuning_Guide > > There's a couple of potential flaws I'm trying to characterize this > weekend. I'm having second thoughts about how I did the sequential > read and write profiles. Using multiple processes doesn't let it > really do sequential i/o. I've done one comparison so far resulting > in about 50% more throughput using just one process to do sequential > writes. I just want to make sure there shouldn't be any concern for > being processor bound on one core. FWIW, my raid array will do 1200MB/sec, and no tool I've used can saturate it without at least two processes. 'dd' and fio can get close (1050MB/sec), if the block size is <= ~32k <=64k. With a postgres sized 8k block 'dd' can't top 900MB/sec or so. FIO can saturate it only with two+ readers. I optimized my configuration for 4 concurrent sequential readers with 4 concurrent random readers, and this helped the overall real world performance a lot. I would argue that on any system with concurrent queries, concurrency of all types is important to measure. Postgres isn't going to hold up one sequential scan to wait for another. Postgres on a 3.16Ghz CPU is CPU bound on a sequential scan at between 250MB/sec and 800MB/sec on the type of tables/queries I have. Concurrent sequential performance was affected by: Xfs -- the gain over ext3 was large Readahead tuning -- about 2MB per spindle was optimal (20MB for me, sw raid 0 on 2x[10 drive hw raid 10]). Deadline scheduler (big difference with concurrent sequential + random mixed). One reason your tests write so much faster than they read was the linux readahead value not being tuned as you later observed. This helps ext3 a lot, and xfs enough so that fio single threaded was faster than 'dd' to the raw device. > > The other flaw is having a minimum run time. The max of 1 hour seems > to be good to establishing steady system utilization, but letting some > tests finish in less than 15 minutes doesn't provide "good" data. > "Good" meaning looking at the time series of data and feeling > confident it's a reliable result. I think I'm describing that > correctly... It really depends on the specific test though. You can usually get random iops numbers that are realistic in a fairly short time, and 1 minute long tests for me vary by about 3% (which can be +-35MB/sec in my case). I ran my tests on a partition that was only 20% the size of the whole volume, and at the front of it. Sequential transfer varies by a factor of 2 across a SATA disk from start to end, so if you want to compare file systems fairly on sequential transfer rate you have to limit the partition to an area with relatively constant STR or else one file system might win just because it placed your file earlier on the drive. > > Regards, > Mark > -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance