Re: best case / worst case RAID 5,6 write speeds

On Fri, Dec 11, 2015 at 3:24 PM, Dallas Clement
<dallas.a.clement@xxxxxxxxx> wrote:
> On Fri, Dec 11, 2015 at 1:34 PM, John Stoffel <john@xxxxxxxxxxx> wrote:
>>>>>>> "Dallas" == Dallas Clement <dallas.a.clement@xxxxxxxxx> writes:
>>
>> Dallas> On Fri, Dec 11, 2015 at 10:32 AM, John Stoffel <john@xxxxxxxxxxx> wrote:
>>>>>>>>> "Dallas" == Dallas Clement <dallas.a.clement@xxxxxxxxx> writes:
>>>>
>> Dallas> Hi Mark.  I have three different controllers on this
>> Dallas> motherboard.  A Marvell 9485 controls 8 of the disks.  And an
>> Dallas> Intel Cougar Point controls the 4 remaining disks.
>>>>
>>>> What type of PCIe slots are the controllers in?  And how fast are the
>>>> controllers/drives?  Are they SATA1/2/3 drives?
>>>>
>>>>>> If you're spinning in IO loops then it could be a driver issue.
>>>>
>> Dallas> It sure is looking like that.  I will try to profile the
>> Dallas> kernel threads today and maybe use blktrace as Phil
>> Dallas> recommended to see what is going on there.
>>>>
>>>> What kernel are you running?
>>>>
>> Dallas> It is pretty sad that 12 single-threaded fio jobs can bring
>> Dallas> this system to its knees.
>>>>
>>>> I think it might be better to lower the queue depth; you might just
>>>> be blowing out the controller caches...  hard to know.
>>
>> Dallas> Hi John.
>>
>>>> What type of PCIe slots are the controllers in?  And how fast are the
>>>> controllers/drives?  Are they SATA1/2/3 drives?
>>
>> Dallas> The MV 9485 controller is attached to an Intel Sandy Bridge
>> Dallas> via PCIe GEN2 x 8.  This one controls 8 of the disks.  The
>> Dallas> Intel Cougar Point is connected to the Intel Sandy Bridge via
>> Dallas> DMI bus.
>>
>> So that should all be nice and fast.
>>
>> Dallas> All of the drives are SATA III, however I do have two of the
>> Dallas> drives connected to SATA II ports on the Cougar Point.  These
>> Dallas> two drives used to be connected to SATA III ports on a MV
>> Dallas> 9125/9120 controller.  But it had truly horrible write
>> Dallas> performance.  Moving to the SATA II ports on the Cougar Point
>> Dallas> boosted the performance close to the same as the other drives.
>> Dallas> The remaining 10 drives are all connected to SATA III ports.
>>
>>>> What kernel are you running?
>>
>> Dallas> Right now, I'm using 3.10.69.  But I have tried the 4.2 kernel
>> Dallas> in Fedora 23 with similar results.
>>
>> Hmm... maybe if you're feeling adventurous you could try v4.4-rc4 and
>> see how it works.  You don't want anything between 4.2.6 and that
>> because of problems with blk req management.  I'm hazy on the details.
>>
>>>> I think it might be better to lower the queue depth; you might just
>>>> be blowing out the controller caches...  hard to know.
>>
>> Dallas> Good idea.  I'll try lowering it and see what effect that has.
>>
>> It might also make sense to try your tests starting with just 1 disk,
>> and then adding one more disk, re-running the tests, then another
>> disk, re-running the tests, etc.
>>
>> Try with one on the MV, then one on the Cougar, then one on MV and one
>> on Cougar, etc.
>>
>> Try to see if you can spot where the performance falls off the cliff.
>>
>> Also, which disk scheduler are you using?  Instead of CFQ, you might
>> try deadline instead.
>>
>> As you can see, there are a TON of knobs to twiddle, and it's not a
>> simple thing to do at times.
>>
>> John
>
>> It might also make sense to try your tests starting with just 1 disk,
>> and then adding one more disk, re-running the tests, then another
>> disk, re-running the tests, etc
>
>> Try to see if you can spot where the performance falls off the cliff.
>
> Okay, did this.  Interestingly, things did not fall off the cliff
> until I added the 12th disk.  I started adding disks one at a time,
> beginning with the Cougar Point.  The %iowait jumped up right away
> on the Cougar Point as well.
>
>> Also, which disk scheduler are you using?  Instead of CFQ, you might
>> try deadline instead.
>
> I'm using deadline.  I have definitely observed better performance
> with this vs cfq.
>
> At this point I think I probably need to use a tool like blktrace to
> get more visibility than ps and iostat give me.
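
For what it's worth, the kind of blktrace run I have in mind is
roughly the following (untested; /dev/sdb is just an example, and I
would repeat it for each member disk):

  # sanity-check which scheduler the member disk is using
  cat /sys/block/sdb/queue/scheduler

  # capture ~30 seconds of block-layer events, then summarize them
  blktrace -d /dev/sdb -w 30 -o sdb_trace
  blkparse -i sdb_trace -o sdb_trace.parsed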

I have one more observation.  I tried varying the queue depth across
1, 4, 16, 32, 64, 128, and 256.  Surprisingly, all 12 disks are able
to handle this load at queue depths <= 128: each disk sits at 100%
utilization and writes 170-180 MB/s.  Things start to fall apart at
queue depth = 256 once the 12th disk is added.  The inflection point
for load average seems to be around queue depth = 32; the load
average on this 8-core system climbs to about 13 when I increase the
queue depth to 64.

So is my workload of 12 fio jobs writing sequential 2 MB blocks with
direct I/O just too abusive?  It seems so at high queue depths.
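
For reference, each of the 12 jobs is essentially the following, one
per member disk (the ioengine and runtime shown here are only
illustrative; the iodepth value is the one I swept from 1 up to 256):

  fio --name=seqwrite-sdb --filename=/dev/sdb \
      --rw=write --bs=2M --direct=1 \
      --ioengine=libaio --iodepth=32 \
      --runtime=60 --time_based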

I started this discussion because my RAID 5 and RAID 6 write
performance is really bad.  If my system can write to all 12 disks at
170 MB/s in JBOD mode, I would expect a single fio job against the
RAID 5 array to write at roughly (N - 1) * X = 11 * 170 MB/s = 1870
MB/s.  However, I am getting < 700 MB/s at queue depth = 32 and < 600
MB/s at queue depth = 256.  I get similarly disappointing results for
RAID 6 writes, where the expectation would be (N - 2) * X = 10 * 170
MB/s = 1700 MB/s.
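
For the record, the RAID numbers above come from the same shape of
job pointed at the array device rather than at the individual disks,
along these lines (/dev/md0 is just a placeholder name):

  # single sequential writer against the RAID 5/6 array
  fio --name=raid-seqwrite --filename=/dev/md0 --rw=write --bs=2M \
      --direct=1 --ioengine=libaio --iodepth=32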


