Re: What's the typical RAID10 setup?

Roberto Spadim <roberto@xxxxxxxxxxxxx> · Wed, 2 Feb 2011 23:57:17 -0200



i have updated the forum post again, some of the questions are now explained there
(https://bbs.archlinux.org/viewtopic.php?pid=887345)
note that this question (an optional i/o scheduler algorithm for mirrors) is
very old (about 1 1/2 years, Chris Worley [ Fri, 16 October 2009 21:07 ] [ ID
#2019215 ])

http://www.issociate.de/board/post/499463/Load-balancing_mirrors_w/_asymmetric_performance.html


2011/2/2 Roberto Spadim <roberto@xxxxxxxxxxxxx>:
> nice, i don't know if it's a single-thread problem
> i think it's a problem of async read commands being executed in parallel
> i posted again at https://bbs.archlinux.org/viewtopic.php?pid=887345
> please see the history at the end of the page
> i'm talking about a 5000rpm disk and a 7000rpm disk
> i think we can optimize the mirror read algorithm, and it's not very hard
> for hard disks of the same speed, nearest mirror (closest head) is good
> for solid state disks of the same speed, round robin is good
> for any mix of devices, time based is good
>
> differences?
> hard disk: time to position the head is high, time to read can be small
> solid state: time to position is small, time to read is small (some
> ssds are old and have a low read rate)
> nbd: time depends on the server's hard/solid state disk plus network time,
> but let's not think about nbd yet
>
> 2011/2/2 Keld Jørn Simonsen <keld@xxxxxxxxxx>:
>> Hmm, Roberto, I think we are close to theoretical maximum with
>> some of the raid1/raid10 stuff already. and my nose tells me
>> that we can gain more by minimizing CPU usage.
>> Or maybe using some threading for raid modules - they
>> all run single-threaded.
>>
>> Best regards
>> keld
>>
>>
>> On Wed, Feb 02, 2011 at 06:28:27PM -0200, Roberto Spadim wrote:
>>> i put the earlier part of this discussion on this page:
>>> https://bbs.archlinux.org/viewtopic.php?pid=887267
>>> to keep the number of emails on this mailing list down
>>>
>>> 2011/2/2 Keld Jørn Simonsen <keld@xxxxxxxxxx>:
>>> > Hmm, Roberto, where are the gains?
>>>
>>> it's difficult to say... NCQ and the linux scheduler don't help a mirror,
>>> they help a single device
>>> a new scheduler for mirrors can be written (round robin, closest head, others)
>>>
>>> > I think it is hard to make raid1 better than it is today.
>>> i don't think so. closest head only makes sense for hard disks (rotational),
>>> not for solid state disks. let's not talk about ssd, just hard disks: in a raid
>>> with a 5000rpm and a 10000rpm disk, will we get better read i/o from the
>>> 10000rpm disk? we don't know the i/o model of that device, but it will
>>> probably be faster, and when it's busy we could use the 5000rpm disk...
>>> that's the point: closest head alone doesn't help, we need to know
>>> the queue (the list of i/o being processed) and the time to read the
>>> current i/o
>>>
>>> > Normally the driver orders the reads to minimize head movement
>>> > and loss with rotation latency. Where can  we improve that?
>>>
>>> no way to improve it, it's very good! but it works per hard disk, not per mirror
>>> since we know a disk is busy, we can use another mirror (another disk
>>> with the same information), that's what i want
>>>
>>> > Also, what about conflicts with the elevator algorithm?
>>> elevators are based on a model of the disk. think of a disk as: linux elevator +
>>> NCQ + disk. the sum of the three gives us the time-based
>>> information to select the best device
>>> maybe with complex code (per elevator) we could know the time spent
>>> executing it, but that's a lot of work
>>> for the first model, let's think about the parameters of our model (linux
>>> elevator + ncq + disk)
>>> in a second version we could implement the elevator algorithm time
>>> calculation (a network block device, NBD, has an elevator at the server side
>>> + the tcp/ip stack at the client and server side, right?)
>>>
>>> > There are several scheduling algorithms available, and each has
>>> > its merits. Will your new scheme work against these?
>>> > Or is your new scheme just another scheduling algorithm?
>>>
>>> it's scheduling for mirrors
>>> round robin is an algorithm for mirrors
>>> closest head is an algorithm for mirrors
>>> my 'new' algorithm will be for mirrors (if anyone helps me code it for the
>>> linux kernel hehehe, i haven't coded for the linux kernel yet, just for
>>> user space)
>>>
>>> noop, deadline, cfq aren't for mirrors, they are for the raid0 problem
>>> (linear, or stripe if your hard disk has more than one head)
>>>
>>> > I think I learned that scheduling is per drive, not per file system.
>>> yes, you learned right! =)
>>> /dev/md0 (raid1) is a device with scheduling (closest head, round robin)
>>> /dev/sda is a device with scheduling (noop, deadline, cfq, others)
>>> /dev/sda1 is a device with scheduling (it sends all i/o directly to /dev/sda)
>>>
>>> the new algorithm is just for mirrors (raid1). i don't remember whether
>>> raid5,6 are mirror based too, if yes they could be optimized
>>> with this algorithm too
>>>
>>> raid0 doesn't have mirrors, but the information is striped across devices (not
>>> for linear), that's why it can be faster... it can do parallel reads
>>>
>>> with closest head we can't use the best disk. we may use a single disk all
>>> the time if its head is closer, and maybe it's not the fastest disk (that's why
>>> we implemented write-mostly: those disks aren't used for reads, just
>>> for writes or when a mirror fails. but it's not perfect for speed, a
>>> better algorithm can be made. for identical disks, round robin works
>>> well, better than closest head if they are solid state disks)
>>> ok, on a high load, maybe closest head is better than this algorithm?
>>> yes, if you just use hard disks. if you mix hard disk + solid
>>> state + network block device + floppy disks + any other device, you don't
>>> have the best algorithm for i/o over mirrors
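
to make the difference between the two existing policies concrete, here is a
small user-space sketch (python, not kernel code). the names Mirror,
make_closest_head, RoundRobin and dispatch are made up for this example, and
"head_pos" is just the last sector each mirror served, which is only an
assumption about how closest head keeps track of position:

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Mirror:
    name: str
    head_pos: int = 0  # last sector served by this mirror (hypothetical bookkeeping)


def make_closest_head(mirrors):
    """Return a picker that chooses the mirror whose head is nearest the sector."""
    def pick(sector):
        return min(mirrors, key=lambda m: abs(m.head_pos - sector))
    return pick


class RoundRobin:
    """Cycle through the mirrors, ignoring head position."""
    def __init__(self, mirrors):
        self.mirrors = mirrors
        self._counter = count()

    def pick(self, sector):
        return self.mirrors[next(self._counter) % len(self.mirrors)]


def dispatch(pick, sectors):
    """Send each read to the picked mirror and record which mirror served it."""
    chosen = []
    for sector in sectors:
        mirror = pick(sector)
        mirror.head_pos = sector  # the head ends up where the read was
        chosen.append(mirror.name)
    return chosen
```

with one mirror at sector 0 and another at sector 1000, a run of reads near
sector 0 all goes to the first mirror under closest head, even if that is the
slower disk, while round robin alternates between the two.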
>>>
>>>
>>> > and is it reading or writing or both? Normally we are dependent on the
>>> > reading, as we cannot process data before we have read them.
>>> > OTOH writing is less time critical, as nobody is waiting for it.
>>> it must be implemented on write and read: write just for time
>>> calculations, read to select the best mirror
>>> for write we must write to all mirrors (sync write is better, async
>>> isn't power-fail safe)
>>>
>>> > Or is it maximum throughput you want?
>>> > Or a mix, given some constraints?
>>> it's maximum performance = what's the best strategy to spend the least
>>> time executing the current i/o, based on the time to access the disk, the
>>> time to read the bytes, and the time to wait for other i/o being executed
>>>
>>> that's for mirror selection, not for disk i/o
>>> for disks we can use the noop, deadline, cfq schedulers
>>> and tcp/ip tweaks for network block devices
>>>
>>> a model identification must run to tell the mirror selection
>>> algorithm what the model of each device is
>>> model: time to read X bytes, time to move the head, time to start a read,
>>> time to write, i.e. time per byte, per kb, per unit
>>> calculate the time and select the device (mirror) with the minimal
>>> calculated value to execute our read
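
a rough user-space sketch of that time-based selection (python, not kernel
code). everything here is an assumption for illustration: the DeviceModel
fields, the numbers you feed it, and the very simple cost formula
(queued work + start cost + seek cost + size * per-kb cost). a real model
would have to be measured per device:

```python
from dataclasses import dataclass


@dataclass
class DeviceModel:
    name: str
    start_ms: float         # fixed time to start a read
    seek_ms: float          # time to move the head (near zero for solid state)
    ms_per_kb: float        # transfer time per kb
    queued_ms: float = 0.0  # estimated time of i/o already queued on this device
    head_pos: int = -1      # last sector read, -1 means unknown

    def estimate_ms(self, sector, size_kb):
        """Estimated completion time of a read if it were sent to this device."""
        seek = 0.0 if sector == self.head_pos else self.seek_ms
        return self.queued_ms + self.start_ms + seek + size_kb * self.ms_per_kb


def select_mirror(mirrors, sector, size_kb):
    """Pick the mirror with the smallest estimated time and queue the read on it."""
    best = min(mirrors, key=lambda m: m.estimate_ms(sector, size_kb))
    cost = best.estimate_ms(sector, size_kb)
    best.queued_ms = cost   # the new read finishes after everything queued before it
    best.head_pos = sector
    return best


def complete(mirror, elapsed_ms):
    """Account for elapsed time on a mirror so its queue estimate shrinks."""
    mirror.queued_ms = max(0.0, mirror.queued_ms - elapsed_ms)
```

the point of the sketch: the faster disk gets the first read, but once its
queue estimate grows past what the slower disk would need, the next read goes
to the slower disk instead of waiting.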
>>>
>>>
>>> >
>>> > best regards
>>> > keld
>>>
>>> thanks keld
>>>
>>> sorry for making this email thread so long
>>>
>>>
>>>
>>> --
>>> Roberto Spadim
>>> Spadim Technology / SPAEmpresarial
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html