On Sat, Aug 01, 2009 at 08:13:45AM -0700, David Rees wrote:
> 2009/8/1 Keld Jørn Simonsen <keld@xxxxxxxx>:
> > On Fri, Jul 31, 2009 at 01:10:37PM -0700, David Rees wrote:
> >> Let's use some data from a real disk, the Velociraptor and a 2-disk
> >> array and streaming reads/writes. At the beginning of the disk you
> >> can read about 120MB/s. At the end of the disk, you can read about
> >> 80MB/s.
> >
> > These are not actual figures from some benchmarking you did, true?
>
> Those are actual numbers from a Velociraptor, but the numbers are
> just estimates.
>
> >> Data on the "beginning" of array, RAID0 = 240MB/s
> >> Data on the "end" of array, RAID0 = 160MB/s.
> >> Data on the "beginning" of array, RAID10,n2 = 120MB/s
> >> Data on the "end" of array, RAID10,n2 = 80MB/s.
> >> Data on the "beginning" of array, RAID10,f2 = 200MB/s
> >
> > Should be:
> >
> > Data on the "beginning" of array, RAID10,f2 = 230MB/s
>
> No - you're getting 120 MB/s from one disk and 80MB/s from another.
> How that would add up to 230MB/s defies logic...

Why only 80 MB/s when reading? With raid10,f2 the reads on both disks
are done at the beginning of both disks, so you get about 115 MB/s
from each of them.

> >> With a f2 setup you'll read at something less than 120+80 = 200MB/s.
> >
> > When? At the beginning or the end?
>
> The whole thing, on average. But the whole point of f2 is to even out
> performance from the beginning of the array to the end and let you
> stripe reads.
>
> > Random read performance on a single disk, in one test, was 34 MB/s
> > while seq read on the same disk was 82 MB/s. In raid10,f2 with 2
> > disks random read was 79 MB/s. This is 235 % of the random read on
> > one disk (34 MB/s). This is a likely result, as you should expect a
> > doubling in speed from the 2 disks, plus some additional speed from
> > the faster outer sectors of the disks, plus the shorter access times
> > on the outer disk sectors. Geometry says that on average the
> > transfer rate is about 17 % higher on the outer half of the disk
> > than over the whole disk. So that gives some 235 % speed improvement
> > (2 * 1.17). The shorter head movements should also give a little,
> > but maybe the elevator algorithm of the file system eliminates most
> > of that factor.
>
> Sorry - I'm having a hard time wrapping my head around that you can
> simply ignore access to the slow half of the disk in a multi-threaded
> random IO test.

Reading in raid10,f2 is restricted to the faster half of each disk, by
design. It is different when writing: there both halves, fast and
slow, are used.

> The only way I might believe that you can get 235%
> improvement is in a single threaded test with a queue depth of 1 which
> lets the f2 setup only use the fast half of the disks.

The test was multi-threaded, with many processes running, say about
200. It was set up to mimic an FTP mirror.

> If that is your
> assumption, then, OK. But then getting 34MB/s out of a rotating disk
> isn't random IO, either. Random IO on a rotating disk is normally an
> order of magnitude slower.

Agreed. The 34 MB/s is random IO in a multi-threaded environment, with
an elevator algorithm in operation. If you only did the individual
random reads in a single thread, it would be much slower. However, the
same speedups apply to raid10,f2: there is a doubling from reading
from 2 disks at the same time, and using only the faster half of the
disks gives both a better overall transfer rate and shorter access
times.
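Here is a small Python sketch of that geometry argument, in case anyone
wants to check the arithmetic. It assumes transfer rate and per-track
capacity both scale with track radius, and it guesses an inner/outer
radius ratio of 0.5 rather than using real Velociraptor geometry; with
those assumptions the outer half of the capacity averages roughly
16-17 % above the whole-disk rate, and two disks reading only their
outer halves come out at about 2.3 times one whole disk:

# Rough model, not a measurement: transfer rate and per-track capacity
# are both taken to be proportional to track radius, and the disk's
# inner/outer radius ratio is guessed at 0.5 (not a Velociraptor spec).

def mean_rate(r_lo, r_hi):
    # Capacity-weighted average transfer rate over tracks in [r_lo, r_hi]:
    # rate(r) ~ r and capacity element ~ r dr, so this is
    # integral(r * r dr) / integral(r dr).
    return ((r_hi**3 - r_lo**3) / 3.0) / ((r_hi**2 - r_lo**2) / 2.0)

r_in, r_out = 0.5, 1.0
# Radius that splits the capacity in half; raid10,f2 reads stay outside it.
r_half = (r_out**2 - (r_out**2 - r_in**2) / 2.0) ** 0.5

whole = mean_rate(r_in, r_out)    # average rate over the whole disk
outer = mean_rate(r_half, r_out)  # average rate over the outer (faster) half

print("outer half vs whole disk: +%.0f %%" % (100.0 * (outer / whole - 1.0)))
print("two disks, outer halves only: %.2f x one whole disk" % (2.0 * outer / whole))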
> >> >> 1. The array mostly sees write activity, streaming reads aren't
> >> >> that common.
> >> >> 2. I can only get about 120 MB/s out of the external enclosure
> >> >> because of the PCIe card [1], so being able to stripe reads
> >> >> wouldn't help get any extra performance out of those disks.
> >> >> [1] http://ata.wiki.kernel.org/index.php/Hardware,_driver_status#Silicon_Image_3124
> >> >
> >> > Hmm, a pci-e x1 should be able to get 2.5 Gbit/s = about 300 MB/s.
> >> > Wikipedia says 250 MB/s. It is strange that you can only get
> >> > 120 MB/s. That is the speed of a 32-bit PCI bus. I looked at your
> >> > reference [1] for the 3132 model. Have you tried it out in
> >> > practice?
> >>
> >> Yes, in practice, IO reached exactly 120MB/s out of the controller.
> >> I ran dd read/write tests on individual disks and found that
> >> overall throughput peaked exactly at 120MB/s.
> >
> > Hmm, get another controller, then. A cheap PCIe controller should
> > be able to do about 300 MB/s on a x1 PCIe.
>
> Please read my reference again. It's a motherboard limitation. I
> already _have_ a good, cheap PCIe controller.

OK, I read:

[1] http://ata.wiki.kernel.org/index.php/Hardware,_driver_status#Silicon_Image_3124

as being the description of the PCIe controller, especially the SIL
3132, and as saying that it was the controller that was restricted to
120 MB/s - not the mobo.

Anyway, you could get a new mobo; they are cheap these days, and many
of them come with either 4 or 8 SATA interfaces. If you bought
Velociraptors, it must have been for the speed, and quite a cheap mobo
could enhance your performance considerably.

> >> But the primary reason I built it was to handle lots of small
> >> random writes/reads, so being limited to 120MB/s out of the
> >> enclosure isn't noticeable most of the time in practice as you say.
> >
> > Yes, for random read/write you only get something like 45 % of the
> > max transfer bandwidth. So 120 MB/s would be close to the max that
> > your 5 disks on the PCIe controller can deliver. With a faster PCIe
> > controller you should be able to get better performance on random
> > reads with raid10,f2. Anyway 180 MB/s may be fast enough for your
> > application.
>
> Again - your idea of "random" IO is completely different than mine.
> My random IO workloads can only get a couple MB/s out of a single
> disk.

Yes, it seems we have different usage scenarios. I am serving
reasonably big files, say 700 MB ISO images or .rpm packages of
several MB; you are probably doing some kind of database access.

> Here's a benchmark which tests SSDs and rotational disks. All the
> rotational disks are getting less than 1MB/s in the random IO test.
> http://www.anandtech.com/storage/showdoc.aspx?i=3531&p=25 It's a
> worst case scenario, but not far from my workloads which obviously
> read a bit more data on each read.

What are your average read or write block sizes? Is it some database
usage?

best regards
keld
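PS: the x1 figure I was quoting above is just this sum; the allowance
for packet overhead at the end is my own rough guess, not a measured
number:

# PCIe 1.x, one lane: 2.5 GT/s on the wire, 8b/10b encoded,
# so 10 line bits are needed per data byte.
line_rate = 2.5e9                   # transfers (line bits) per second
payload = line_rate / 10.0          # data bytes per second
print("x1 payload ceiling: %.0f MB/s" % (payload / 1e6))   # 250 MB/s
# TLP/DLLP protocol overhead takes a further slice, so something around
# 200 MB/s of real data is a more realistic ceiling - still well above
# the 120 MB/s you are seeing.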