Hi all:

I am adding pgiosim to our testing for new database hardware, and I am seeing
something I don't quite get. I think it's because I am using pgiosim incorrectly.

Specs:
  OS: CentOS 5.5
  kernel: 2.6.18-194.32.1.el5
  memory: 96GB
  cpu: 2x Intel(R) Xeon(R) X5690 @ 3.47GHz (6 core, HT enabled)
  disks: WD2003FYYS RE4
  raid: LSI 9260-4i with 8 disks in raid 10, 1MB stripe size,
        raid cache enabled w/ BBU, disk caches disabled
  filesystem: ext3 created with -E stride=256

I am seeing really poor iops (about 70) with pgiosim. According to
http://www.tomshardware.com/reviews/2tb-hdd-7200,2430-8.html, in their database
benchmark they are seeing ~170 iops on a single one of these drives. I would
expect an 8 disk raid 10 to do better than 3x the single disk rate (assuming
the data is randomly distributed).

To test I am using 5 100GB files with:

  sudo ~/pgiosim -c -b 100G -v file?

I am using 100GB files to make sure that the data read and the file sizes
exceed the memory size of the system.

However, if I use 5 1GB files (and still 100GB of data read) I see 200+ to
400+ iops by the time 50% of the 100GB has been read, which I assume means the
data is cached in the OS cache and I am not really measuring hard drive/raid
iops. On the other hand, IIUC postgres will never have a table or index file
greater than 1GB in size
(http://www.postgresql.org/docs/8.4/static/storage-file-layout.html) and will
just add 1GB segments, so 1GB files seem more realistic.

So do I want 100 (or probably 2 or 3 times more, say 300) 1GB files to feed
pgiosim? That way I will have enough data that not all of it can be cached in
memory, and the file sizes (and file operations: open/close) more closely
match what postgres is doing with its segment files. (Something like the
sketch at the end of this mail is what I have in mind.)

Also, in the output of pgiosim I see:

  25.17%, 2881 read, 0 written, 2304.56kB/sec 288.07 iops

which I interpret (left to right) as the % of the 100GB that has been read,
the number of reads and writes over some time period, the kB/sec read, and
the io operations/sec. Iops always seems to be about 1/10th of the read
count. Is this expected, and if so does anybody know why? (The numbers would
make sense if each line covers a ~10 second interval of 8kB reads, but I
haven't verified that.)

While this is running, if I also run "iostat -p /dev/sdc 5" I see:

  Device:    tps      Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
  sdc        166.40   2652.80     4.80        13264     24
  sdc1       2818.80  1.20        999.20      6         4996

which I am interpreting as 2818 io operations/sec to the partition
(corresponding more or less to the read count in the pgiosim output), of
which only about 166 actually go to the drive(?), with the rest handled from
the OS cache. However, the tps isn't increasing when I see pgiosim reporting:

  48.47%, 4610 read, 0 written, 3687.62kB/sec 460.95 iops

An iostat 5 output near the same time reports:

  Device:    tps      Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
  sdc        165.87   2647.50     4.79        13264     24
  sdc1       2812.97  0.60        995.41      3         4987

so I am not sure there is a correlation between pgiosim's read numbers and
iostat's tps. Also, I am assuming the blocks written are filesystem metadata,
although that seems like a lot of data. If I stop pgiosim, iostat drops to 0
reads and writes as expected.

So does anybody have any comments on how to test with pgiosim and how to
correlate the iostat and pgiosim outputs?

Thanks for your feedback.

--
                                -- rouilj
John Rouillard       System Administrator
Renesys Corporation  603-244-9084 (cell)  603-643-9300 x 111
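
P.S. Here is a rough, untested sketch of the 1GB file setup I am describing
above. The directory name and file count are just examples, and the pgiosim
flags are the same ones used for the runs quoted in this mail:

  #!/bin/sh
  # Build 300 1GB test files (~3x RAM) to mimic postgres' 1GB relation
  # segments, then run pgiosim against all of them.
  TESTDIR=/data/pgiosim-test   # assumed scratch dir on the raid 10 volume
  NFILES=300

  mkdir -p "$TESTDIR"
  i=1
  while [ $i -le $NFILES ]; do
      # one 1GB file per iteration, same size as a postgres segment
      dd if=/dev/zero of="$TESTDIR/seg$i" bs=1M count=1024 2>/dev/null
      i=`expr $i + 1`
  done

  # start cold: flush dirty pages and drop the OS page cache
  sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'

  # same flags as before: read 100GB total, now spread across many 1GB files
  sudo ~/pgiosim -c -b 100G -v "$TESTDIR"/seg*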