On 07/05/12 20:59, Martin Steigerwald wrote:
> On Monday, 7 May 2012, Daniel Pocock wrote:
>
>>> Possibly the older disk is lying about doing cache flushes.  The
>>> wonderful disk manufacturers do that with commodity drives to make
>>> their benchmark numbers look better.  If you run some random IOPS
>>> test against this disk, and it has performance much over 100 IOPS
>>> then it is definitely not doing real cache flushes.
>
> […]
>
> I think an IOPS benchmark would be better, i.e. something like:
>
> /usr/share/doc/fio/examples/ssd-test
>
> (from the flexible I/O tester Debian package, also included in the
> upstream tarball, of course)
>
> adapted to your needs.
>
> Maybe with a different iodepth or numjobs (to simulate several threads
> generating higher iodepths).  With iodepth=1 I have seen 54 IOPS on a
> Hitachi 5400 rpm harddisk connected via eSATA.
>
> The important thing is direct=1, to bypass the pagecache.

Thanks for suggesting this tool - I've run it against the USB disk and
against an LV on my AHCI/SATA/md array (a sketch of the job file I
adapted is below, after the results).

Incidentally, I upgraded the Seagate firmware (model 7200.12, from CC34
to CC49) and one of the disks went offline shortly after I brought the
system back up.  To avoid the risk that a bad drive might interfere
with the SATA performance, I removed it completely before running any
tests.  Tomorrow I'm off to buy some enterprise-grade drives; I'm
thinking about the Seagate Constellation, either SATA or SAS.

Anyway, on to the test results.

USB disk (Seagate 9SD2A3-500, 320GB):

rand-write: (groupid=3, jobs=1): err= 0: pid=22519
  write: io=46680KB, bw=796512B/s, iops=194, runt= 60012msec
    slat (usec): min=13, max=25264, avg=106.02, stdev=525.18
    clat (usec): min=993, max=103568, avg=20444.19, stdev=11622.11
    bw (KB/s) : min=521, max=1224, per=100.06%, avg=777.48, stdev=97.07
  cpu          : usr=0.73%, sys=2.33%, ctx=12024, majf=0, minf=20
  IO depths    : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w: total=0/11670, short=0/0
     lat (usec): 1000=0.01%
     lat (msec): 2=0.01%, 4=0.24%, 10=2.75%, 20=64.64%, 50=29.97%
     lat (msec): 100=2.31%, 250=0.08%

And from the SATA disk on the AHCI controller:
- Barracuda 7200.12 ST31000528AS, connected to an
- AMD RS785E/SB820M chipset (lspci reports SB700/SB800 in AHCI mode)

rand-write: (groupid=3, jobs=1): err= 0: pid=23038
  write: io=46512KB, bw=793566B/s, iops=193, runt= 60018msec
    slat (usec): min=13, max=35317, avg=97.09, stdev=541.14
    clat (msec): min=2, max=214, avg=20.53, stdev=18.56
    bw (KB/s) : min=0, max=882, per=98.54%, avg=762.72, stdev=114.51
  cpu          : usr=0.85%, sys=2.27%, ctx=11972, majf=0, minf=21
  IO depths    : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w: total=0/11628, short=0/0
     lat (msec): 4=1.81%, 10=32.65%, 20=31.30%, 50=26.82%, 100=6.71%
     lat (msec): 250=0.71%

The IOPS scores look almost identical, but I checked carefully and I'm
fairly certain the correct disks were mounted when the tests ran.

If I run this tool over NFS, will the results be meaningful?
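A job file along these lines should give a comparable rand-write run -
this is only the rand-write section of the ssd-test example, with a
fixed runtime bolted on, and the device path is a placeholder rather
than my real device (randwrite against a raw device is destructive, so
point it somewhere safe):

[global]
# asynchronous I/O, queue depth 4 - matches the "4=100.0%" IO depths above
ioengine=libaio
iodepth=4
# O_DIRECT, to bypass the pagecache as Martin suggested
direct=1
bs=4k
# both runs above lasted ~60 seconds
runtime=60
time_based

[rand-write]
rw=randwrite
# placeholder device - fio will overwrite data here
filename=/dev/sdX
size=10g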
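My assumption is that for NFS the same job can simply be pointed at a
file on the client-side mount instead of a raw device, e.g.:

[rand-write-nfs]
rw=randwrite
# hypothetical client-side mount point
directory=/mnt/nfs
size=1g

though whether direct=1 really bypasses all client-side caching depends
on the NFS client's O_DIRECT support, so I'd expect the numbers to
measure the whole NFS path rather than just the disk.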
Given the need to replace a drive anyway, I'm seriously considering one
of the following approaches:

- same controller, upgrade to enterprise SATA drives
- buy a dedicated SAS/SATA controller, upgrade to enterprise SATA drives
- buy a dedicated SAS/SATA controller, upgrade to SAS drives

My HP N36L is quite small (one PCIe x16 slot), and the internal drive
cage has an SFF-8087 (mini-SAS) plug, so I'm thinking I can grab
something small like the Adaptec 1405.  Will any of these solutions
offer a definite win for my NFS issues, though?
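Before spending the money, I suppose I could also test the "lying about
cache flushes" theory quoted above by switching off the write cache on
one of the existing drives and re-running the rand-write job - if the
IOPS barely move, the drive was honouring flushes all along.  Something
like this with hdparm (device path again a placeholder):

# query the current write-cache setting
hdparm -W /dev/sdX

# disable the volatile write cache, then re-run the fio job
hdparm -W0 /dev/sdX

# re-enable it afterwards
hdparm -W1 /dev/sdX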