Re: Raid 5 vs Raid 10 Benchmarks Using bonnie++

On Mon, 12 Sep 2011, Aidan Van Dyk wrote:

On Mon, Sep 12, 2011 at 6:57 PM,  <david@xxxxxxx> wrote:

The "barrier" is the linux fs/block way of saying "these writes need
to be on persistent media before I can depend on them".  On typical
spinning media disks, that means out of the disk cache (which is not
persistent) and on platters.  The way it assures that the writes are
on "persistent media" is with a "flush cache" type of command.  The
"flush cache" is a close approximation to "make sure it's persistent".

If your cache is battery backed, it is now persistent, and there is no
need to "flush cache", hence the nobarrier option if you believe your
cache is persistent.
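
A quick way to see which mounts are actually running with or without
barriers is to read the options column of /proc/mounts. The sketch below
is only an illustration: the function name and the sample data are made
up, and the option spellings (barrier=0/1, barrier, nobarrier) are the
ones ext3/ext4 and XFS used on kernels of this era, which newer kernels
may no longer accept or display.

```python
def barrier_settings(mounts_text):
    """Map mount point -> True (barriers on), False (off), None (not listed).

    mounts_text is text in /proc/mounts format:
    device mount_point fstype options dump pass
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        state = None  # None: option not listed, filesystem default applies
        for opt in fields[3].split(","):
            if opt in ("nobarrier", "barrier=0"):
                state = False
            elif opt in ("barrier", "barrier=1"):
                state = True
        result[mount_point] = state
    return result


# Hypothetical sample data, not taken from a real system.
SAMPLE = """\
/dev/sda1 / ext4 rw,relatime,barrier=1 0 0
/dev/sdb1 /var/lib/pgsql xfs rw,noatime,nobarrier 0 0
/dev/sdc1 /srv ext4 rw,relatime 0 0
"""

if __name__ == "__main__":
    for mount_point, state in barrier_settings(SAMPLE).items():
        print(mount_point, state)
```

On a live system you would pass in open("/proc/mounts").read() instead
of SAMPLE. A result of None only says the option is not shown, so the
filesystem default is in effect.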

Now, make sure that even though your raid cache is persistent, your
disks have their cache in write-through mode, because it would suck
for your raid cache to "work" but believe the data is safely on disk,
only to find out that it was in the disk's (small) cache, and your
raid is out of sync after an outage because of that...  I believe most
raid cards will handle that correctly for you automatically.

if you don't have barriers enabled, the data may not get written out of main
memory to the battery backed memory on the card as the OS has no reason to
do the write out of the OS buffers now rather than later.

It's not quite so simple.  The "sync" calls (pick your flavour) are
what tell the OS buffers they have to go out.  The syscall (on a
working FS) won't return until the write and its data have reached the
"device" safely, and are considered persistent.

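For readers who have not used the "sync" calls directly, here is a
minimal sketch of what an application does to push data out of the OS
buffers. The function name durable_write is made up for illustration;
os.fsync() is the real call. Whether the data is truly persistent when
it returns still depends on everything below the filesystem, which is
the point of this thread.

```python
import os
import tempfile


def durable_write(path, data):
    """Write data to path and return only after fsync() has completed."""
    with open(path, "wb") as f:
        f.write(data)          # data lands in Python's and the OS's buffers
        f.flush()              # push Python's buffer down to the OS
        os.fsync(f.fileno())   # ask the OS to push it to the "device"


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.bin")
    durable_write(demo_path, b"commit record")
    print("wrote", os.path.getsize(demo_path), "bytes to", demo_path)
```
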
But in linux, a barrier is actually a "synchronization" point, not
just a "flush cache"...  It's a "guarantee everything up to now is
persistent, I'm going to start counting on it".  But depending on your
card, drivers and yes, kernel version, that "barrier" is sometimes a
"drain/block I/O queue, issue cache flush, wait, write specific data,
flush, wait, open I/O queue".  The double flush is because it needs to
guarantee everything previous is good before it writes the "critical"
piece, and then needs to guarantee that too.
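
The sequence above can be illustrated with a toy model. This is not how
the kernel implements it; ToyDisk and barrier_write are invented names,
and the "wait" steps are implicit because every call here is
synchronous. It only shows why two flushes are needed around the
"critical" write.

```python
class ToyDisk:
    """Toy disk: writes sit in a volatile cache until flush() is called."""

    def __init__(self):
        self.cache = []     # lost on power failure
        self.platters = []  # survives power failure

    def write(self, block):
        self.cache.append(block)

    def flush(self):
        self.platters.extend(self.cache)
        self.cache.clear()


def barrier_write(disk, pending, critical):
    """Drain queued writes, flush, write the critical block, flush again."""
    steps = []
    for block in pending:       # drain the I/O queue
        disk.write(block)
    steps.append("drain queue")
    disk.flush()                # everything earlier is now persistent
    steps.append("flush")
    disk.write(critical)        # e.g. a journal commit record
    steps.append("write critical")
    disk.flush()                # the critical block is now persistent too
    steps.append("flush")
    return steps


if __name__ == "__main__":
    disk = ToyDisk()
    print(barrier_write(disk, ["data-1", "data-2"], "commit"))
    print(disk.platters)
```

If power is lost between the two flushes, the platters hold all the
earlier blocks but not the critical one, so the critical block never
appears without the data it depends on.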

Now, on good raid hardware it's not usually that bad.

And then, just to confuse people more, LVM up until 2.6.29 (so that
includes all those RHEL5/CentOS5 installs out there which default to
using LVM) didn't handle barriers, it just sort of threw them out as
it came across them, meaning that you got the performance of
nobarrier, even if you thought you were using barriers on poor raid
hardware.

this is part of the problem.

if you have a simple fs-on-hardware you may be able to get away with the barriers, but if you have a fs-on-x-on-y-on-hardware type of thing (specifically where LVM is one of the things in the middle), and those things in the middle do not honor barriers, the fsync becomes meaningless, because without propagating the barrier down the stack, the writes that the fsync triggers may not get to the disk.

Every raid card I have seen has ignored the 'flush cache' type of command if
it has a battery and that battery is good, so you leave the barriers enabled
and the card still gives you great performance.

The XFS FAQ goes over much of it, starting at Q24:
  http://xfs.org/index.php/XFS_FAQ#Q:_What_is_the_problem_with_the_write_cache_on_journaled_filesystems.3F

So, for pure performance, on a battery-backed controller, nobarrier is
the recommended *performance* setting.

But, to throw a wrench into the plan, what happens when during normal
battery tests, your raid controller decides the battery is failing...
of course, it's going to start screaming and send all your monitoring
alarms off (you're monitoring that, right?), but have you thought to
make sure that your FS is remounted with barriers at the first sign of
battery trouble?

yep.
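
The "remount with barriers at the first sign of battery trouble" idea
can be sketched roughly as below. Everything here is an assumption for
illustration: how you learn the battery state is vendor specific, so
battery_ok is simply passed in, the function names are made up, and the
mount options are the ones ext3/ext4 (barrier=1) and XFS (barrier) took
on kernels of this era.

```python
import subprocess

# Mount option that turns barriers back on, per filesystem type
# (assumed spellings for kernels of this era).
BARRIER_ON = {"ext3": "barrier=1", "ext4": "barrier=1", "xfs": "barrier"}


def remount_with_barriers_cmd(mount_point, fstype):
    """Build the mount command that re-enables barriers on mount_point."""
    return ["mount", "-o", "remount," + BARRIER_ON[fstype], mount_point]


def react_to_battery(battery_ok, mount_point, fstype, run=subprocess.check_call):
    """If the battery is not healthy, remount with barriers enabled.

    battery_ok must come from your controller's own tooling.
    Returns the command that was run, or None if nothing was done.
    """
    if battery_ok:
        return None
    cmd = remount_with_barriers_cmd(mount_point, fstype)
    run(cmd)
    return cmd


if __name__ == "__main__":
    # Dry run: print the command instead of executing it.
    react_to_battery(False, "/var/lib/pgsql", "xfs", run=print)
```

Passing run=print gives a dry run; the default actually calls mount and
needs root.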

on a good raid card with battery backed cache, the performance difference between barriers being on and barriers being off should be minimal. If it's not, I think that you have something else going on.

David Lang
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

