Re: Software RAID5 write issues

On 11/06/2009, at 6:57 AM, Doug Ledford wrote:

On Thu, 2009-06-11 at 03:17 +1000, Steven Haigh wrote:
Hi all,

After a week and a bit of googling, experimenting and frustration I'm
posting here and hoping I can get some clues on what could be wrong
with my 5 disk RAID5 SATA array.

The array in question is:
md1 : active raid5 sdg1[0] sdf1[1] sde1[3] sdd1[2] sdc1[4]
1172131840 blocks level 5, 1024k chunk, algorithm 2 [5/5] [UUUUU]

All 5 drives are connected to sata_sil controllers (a 3112 & a 3114)
set up as simple SATA controllers (ie no RAID here).
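
(To double-check which disk sits on which controller, the sysfs links
give it away - each one ends in the PCI address of the controller the
disk hangs off:)

    ls -l /sys/block/sd*/device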

Once the system buffer is full, write speeds to the array are usually
under 20MB/sec.
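
(For a rough sustained-write figure that the page cache can't flatter,
something like this does the job - /mnt/raid5 is just a stand-in for
wherever md1 is mounted:)

    # write ~4GB and include the final sync in the timing so the page
    # cache can't hide the real on-disk speed
    time sh -c 'dd if=/dev/zero of=/mnt/raid5/ddtest bs=1M count=4096; sync'
    rm -f /mnt/raid5/ddtest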

I am currently running CentOS 5.3 (kernel
2.6.18-128.1.10.el5.centos.plus).

I have lodged a bug report against RHEL 5.3, as I believe something is
not quite right here, but haven't been able to narrow down the exact
issue.
	https://bugzilla.redhat.com/show_bug.cgi?id=502499

Using bonnie++ to benchmark the array, it shows sequential block reads
at 90MB/sec but writes at 11MB/sec across the RAID5 array - a
difference I really didn't expect.
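
(A bonnie++ run along these lines should reproduce those numbers - the
directory and user are only examples, and the test size wants to be at
least twice RAM so caching doesn't skew the result:)

    # -s sets the test file size; -u is needed when running as root
    bonnie++ -d /mnt/raid5 -s 4g -u nobody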

Any pointers on how to try to tackle this one and figure out the root
cause of the problem would be VERY helpful!

OK, so I read the bug report.  There are two distinctly different
problems you are experiencing.  One is a slowdown specific to our
recent kernels.  The slowdown in your case takes your normally abysmal
raid and makes it even worse.  The original bug report was mainly about
the slowdown, so I'll address that in the bug report.  However, in
regards to your raid setup, I'll try to address why your array performs
so poorly regardless of kernel version and maybe that will help you
build up a better raid setup.

You have 4 motherboard SATA ports, and 4 SATA ports on a PCI card.
Right now you have your two OS drives on motherboard SATA ports, two of the five raid5 drives on motherboard SATA ports, and the three remaining raid5 drives on the PCI card SATA ports. You need to get as many of the
raid5 SATA disks on motherboard ports as possible.  I would decide if
you are more concerned about the raid5 array performing well (common, as
it's usually the data you access most often) or the base OS array
performing well (not so common, as it gets loaded largely into cache and
doesn't get hit nearly so often as the data drive).  If you can deal
with slowing down the OS drives, then I would move one of the OS drives to the PCI card and move one of the raid5 drives to the motherboard SATA
port (and whichever drive you just moved over to the PCI card, I would
mark its raid1 arrays as write-mostly so that you don't read from it
normally). If your BIOS will allow you to select drives on the PCI card
as boot drives, and you can tolerate the slow down, then I would move
both of the OS drives to the PCI card (and don't worry about using
write-mostly on the raid1 arrays any more) and get 4 of the 5 raid5
drives onto motherboard SATA ports.
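
(A sketch of the write-mostly step, assuming the OS raid1 is /dev/md0
and the relocated member ends up as /dev/sdh1 - note that re-adding the
member triggers a resync unless a write-intent bitmap is in place:)

    # drop the member that now lives on the PCI card, then re-add it
    # flagged write-mostly so normal reads come from the faster member
    mdadm /dev/md0 --fail /dev/sdh1 --remove /dev/sdh1
    mdadm /dev/md0 --add --write-mostly /dev/sdh1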


This would also be an interesting test - as from memory, now that I have updated the firmware on the 3114 card, the BIOS will see those drives and allow me to boot from them (hopefully). I will experiment here and post the results.

Your big problem is that with 3 out of 5 raid5 drives on that PCI card,
and sharing bandwidth, your total theoretical raid speed is abysmal.
When the three drives are sharing bandwidth on the card, they tend to
split it up fairly evenly.  That means each drive gets roughly 1/3 of
the PCI card's total available bandwidth over the PCI bus, which is
generally poor in the first place.  Understand that a slow drive drags
down *all* the drives in a raid5 array.  The faster drives just end up
idling while waiting on the slower drive to finish its work (the faster
drives will run ahead up to a point, then they eventually just get so
far ahead that there isn't anything else for them to do until the
slowest drive finishes up its stuff so old block requests can be
completed, etc). On the other hand, if you get 4 of the 5 drives on the
motherboard ports, then that 5th drive on the PCI card won't be
splitting bandwidth up and the overall array performance will shoot up
(assuming the OS drives aren't also heavily loaded).
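
(An easy way to see the sharing effect is to time one PCI-card drive on
its own and then all of them at once - the device names below are just
examples for whichever disks sit on the 3114:)

    # one drive alone
    hdparm -t /dev/sdc
    # then the card's drives in parallel; the per-drive figures should
    # drop sharply if the PCI bus is the choke point
    for d in sdc sdd sde; do hdparm -t /dev/$d & done; wait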

Isn't the PCI bus limited to around 133MB/sec? If so, even with 3 drives on the same controller, you would expect that, divided equally, each drive would get ~44MB/sec before overheads - not around 7MB/sec per drive. I know I'm not going to get phenomenal performance with my setup, but as most of the data is archival (and then copied to tape), I would like to get things at least up to a reasonable level instead of having a write speed of ~12% of the read speed.

If you move one OS drive to the PCI card, then that leaves two raid5
drives on the card.  In that case, I would seriously consider dropping
back to a 4 drive array if you can handle the space reduction. I would
also seriously consider using raid4 instead of raid5 depending on your
normal usage pattern.  If the data on the raid5 array is written once
and then read over and over again, a raid4 can be beneficial in that you can stick the parity drive off on the PCI card and it won't be read from unless there is a drive failure or on the rare occasions when you write
new data.  If, on the other hand, you write lots of new data, then
either don't use raid4, or put the parity drive on a motherboard port
where it won't hog so much bandwidth on the PCI card. Ideally, I would say get both OS drives on the PCI card, and if you need all 5 drives for
the data raid, then use raid4 with the parity on the PCI card if the
array is mostly static, use raid5 otherwise.  If you only move one OS
drive to the PCI card and still have two raid5 drives on the PCI card,
then again think about whether your data is static or not and possibly
use raid4 in an attempt to reduce the traffic on the PCI card.
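
(If you do go the raid4 route, md keeps all the parity on the last
device listed as far as I know, so the creation would look roughly like
this - device names are only examples, the drive destined for the PCI
card goes last, and rebuilding the array this way means restoring the
data from backup:)

    # 4 data disks on motherboard ports, dedicated parity disk (listed
    # last) on the 3114
    mdadm --create /dev/md1 --level=4 --raid-devices=5 --chunk=1024 \
        /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdc1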


Hmmm - a very interesting read - but I am a little confused when it comes to PCI bandwidth. I would assume (maybe wrongly) that if I can READ from the array at 95MB/sec (as measured by bonnie++), then I should be able to write to the same array at somewhat better than 11MB/sec - a read would usually come from 4 of the 5 drives, whereas a write would go to all 5. That said, I wouldn't expect one extra drive's worth of writes to leave me at 12% of the read speed!

The other thing I wonder is whether it has something to do with the sata_sil driver - as ALL the drives in the RAID5 are handled by that kernel module. The boot RAID1 is on the ICH5 SATA controller and suffers no performance issues at all - it shows a good 40MB/sec+ read AND write speed per drive.
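
(A quick way to separate the controller/driver from the md layer would
be to time a single drive on each controller directly - the device
names here are assumptions, sda on the ICH5 and sdc on the 3114:)

    # raw single-drive reads, one controller at a time
    hdparm -t /dev/sda    # ICH5 port (assumed)
    hdparm -t /dev/sdc    # SiI 3114 / sata_sil port (assumed)
    # and check what the sata_sil ports negotiated at boot
    dmesg | grep -i sata_sil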

--
Steven Haigh

Email: netwiz@xxxxxxxxx
Web: http://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
