Re: What's the typical RAID10 setup?

Roberto Spadim <roberto@xxxxxxxxxxxxx> · Wed, 2 Feb 2011 19:31:22 -0200



Sorry for my poor English: it's not "closest head", it's "nearest head".

2011/2/2 Roberto Spadim <roberto@xxxxxxxxxxxxx>:
> Before this thread, I posted this on this page:
> https://bbs.archlinux.org/viewtopic.php?pid=887267
> to keep the number of emails on this mailing list down.
>
> 2011/2/2 Keld Jørn Simonsen <keld@xxxxxxxxxx>:
>> Hmm, Roberto, where are the gains?
>
> It's difficult to explain... NCQ and the Linux scheduler don't help a mirror,
> they help a single device.
> A new scheduler for mirrors could be written (round robin, closest head, others).
>
>> I think it is hard to make raid1 better than it is today.
> I don't think so. Head position only matters for (rotational) hard disks,
> not for solid state disks. Leaving SSDs aside and talking only about hard
> disks: in a RAID with a 5000 rpm and a 10000 rpm disk, will we get better
> read I/O from the 10000 rpm one? We don't know the I/O model of that device,
> but it will probably be faster. But when it's busy we could use the
> 5000 rpm disk... That's the point: closest head alone doesn't help, we need
> to know the queue (the list of I/O being processed) and the time to read
> the current I/O.
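
To make the 5000 rpm versus 10000 rpm point concrete, here is a rough
user-space sketch in Python. The 8 ms average seek, the function names and
the rule "each queued request costs one seek plus half a rotation" are all
assumptions for illustration, not measurements and not md code.

```python
# Hypothetical sketch: pick the disk with the shortest estimated wait.
# All numbers and names here are illustrative assumptions.

def avg_rotational_latency_ms(rpm):
    # On average the platter has to turn half a revolution.
    return 0.5 * 60_000.0 / rpm

def estimated_wait_ms(rpm, queued_requests, avg_seek_ms=8.0):
    # Every request ahead of us, plus our own, costs one seek + half a turn.
    per_request = avg_seek_ms + avg_rotational_latency_ms(rpm)
    return (queued_requests + 1) * per_request

def pick_disk(disks):
    # disks: dict mapping name -> (rpm, queued_requests)
    return min(disks, key=lambda name: estimated_wait_ms(*disks[name]))
```

Under these assumptions, with both disks idle the 10000 rpm disk wins
(11 ms against 14 ms), but with two requests already queued on it (33 ms)
the idle 5000 rpm disk is the better choice.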
>
>> Normally the driver orders the reads to minimize head movement
>> and loss with rotation latency. Where can  we improve that?
>
> There is no way to improve it, it's very good! But that is per hard disk,
> not per mirror. Since we know a disk is busy, we can use another mirror
> (another disk with the same information). That's what I want.
>
>> Also, what about conflicts with the elevator algorithm?
> Elevators are based on a model of the disk. Think of a disk as: Linux
> elevator + NCQ + disk. The sum of the three gives us the time-based
> information needed to select the best device.
> Maybe with complex code (per elevator) we could know the time spent
> executing it, but that's a lot of work.
> For the first model, let's think about the parameters of our model (Linux
> elevator + NCQ + disks).
> In a second version we could implement the elevator time calculation
> (does a network block device, NBD, have an elevator? At the server side,
> plus the TCP/IP stack on the client and server side, right?)
>
>> There are several scheduling algorithms available, and each has
>> its merits. Will your new scheme work against these?
>> Or is your new scheme just another scheduling algorithm?
>
> It's a scheduler for mirrors.
> Round robin is an algorithm for mirrors.
> Closest head is an algorithm for mirrors.
> My 'new' algorithm will be for mirrors (if anyone helps me code it for the
> Linux kernel, hehehe; I haven't coded for the Linux kernel yet, just for
> user space).
>
> noop, deadline and cfq aren't for mirrors; they are for the raid0 kind of
> problem (linear, or striped if your hard disk has more than one head).
>
>> I think I learned that scheduling is per drive, not per file system.
> Yes, you learned right! =)
> /dev/md0 (raid1) is a device with scheduling (closest head, round robin)
> /dev/sda is a device with scheduling (noop, deadline, cfq, others)
> /dev/sda1 is a device with scheduling (it sends all I/O directly to /dev/sda)
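
On a running system the per-disk elevator can be checked through sysfs. A
small sketch follows; the helper names are my own, and it assumes the usual
format of /sys/block/<dev>/queue/scheduler, where the active scheduler is
shown in square brackets (for example "noop deadline [cfq]").

```python
import re

def active_scheduler(sysfs_text):
    # The active scheduler is the entry wrapped in square brackets.
    match = re.search(r"\[([^\]]+)\]", sysfs_text)
    return match.group(1) if match else None

def read_scheduler(dev):
    # dev is a whole-disk name such as "sda" (assumed to exist on the host).
    with open(f"/sys/block/{dev}/queue/scheduler") as f:
        return active_scheduler(f.read())
```

Partitions such as sda1 have no queue directory of their own, which matches
the point that their I/O goes straight to the parent disk.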
>
> The new algorithm is just for mirrors (raid1). I don't remember whether
> raid5/6 are mirror based too; if they are, they could be optimized
> with this algorithm too.
>
> raid0 doesn't have mirrors, but the data is striped across devices (not
> for linear); that's why it can be faster... it can do parallel reads.
>
> With closest head we can't pick the best disk: we may use a single disk
> all the time if its head is closer, and it may not be the fastest disk.
> (That's why we implemented write-mostly: those disks are not used for
> reads, just for writes or when a mirror fails. But it's not perfect for
> speed; a better algorithm can be made. For identical disks, round robin
> works well, and better than closest head if they are solid state disks.)
> OK, under high load, maybe closest mirror is better than this algorithm?
> Yes, if you only use hard disks. If you mix hard disks + solid state +
> network block devices + floppy disks + any other device, you don't
> have the best algorithm for I/O over mirrors.
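
The two existing policies can be compared with a simplified sketch. This is
not the md driver's code; the head positions, sector numbers and function
names are hypothetical and only illustrate how closest head can stick to one
disk.

```python
from itertools import count

def closest_head(head_positions, target_sector):
    # Pick the mirror whose head is nearest to the target sector.
    return min(range(len(head_positions)),
               key=lambda i: abs(head_positions[i] - target_sector))

def make_round_robin(n_mirrors):
    # Return a picker that cycles through mirrors 0, 1, ..., n-1, 0, ...
    counter = count()
    def pick():
        return next(counter) % n_mirrors
    return pick

def simulate_closest_head(head_positions, reads):
    heads = list(head_positions)
    picks = []
    for sector in reads:
        i = closest_head(heads, sector)
        heads[i] = sector  # the chosen disk's head moves to that sector
        picks.append(i)
    return picks
```

For example, simulate_closest_head([0, 100000], [10, 20, 30]) returns
[0, 0, 0]: the second mirror is never read, whatever its speed, while a
round robin picker over two mirrors alternates 0, 1, 0, 1.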
>
>
>> and is it reading or writing or both? Normally we are dependant on the
>> reading, as we cannot process data before we have read them.
>> OTOH writing is less time critical, as nobody is waiting for it.
> It must be implemented for both writes and reads: writes just for the
> time calculations, reads to select the best mirror.
> For writes we must write to all mirrors (sync write is better; async
> isn't power-fail safe).
>
>> Or is it maximum thruput you want?
>> Or a mix, given some restraints?
> It's maximum performance = what is the best strategy to spend the least
> time executing the current I/O, based on time to access the disk, time to
> read the bytes, and time waiting for other I/O being executed.
>
> That's for mirror selection, not for disk I/O.
> For disks we can use the noop, deadline or cfq schedulers,
> and TCP/IP tweaks for network block devices.
>
> A model identification step must run to tell the mirror selection
> algorithm what the model of each device is.
> Model: time to read X bytes, time to move the head, time to start a read,
> time to write; time, time, time: per byte, per KB, per unit.
> Calculate the time for each device and select the one (mirror) with the
> minimal calculated value to execute our read.
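
A minimal sketch of such a model, in user-space Python. The class, its field
names and the linear cost formula (pending work + start cost + seek distance
+ transfer time) are assumptions made for illustration; a real model would
have to be identified by measuring each device.

```python
from dataclasses import dataclass

@dataclass
class MirrorModel:
    name: str
    start_ms: float          # fixed cost to start a read
    seek_ms_per_unit: float  # head movement cost per unit distance (0 for SSD)
    ms_per_kb: float         # transfer cost per KB
    head_pos: int = 0        # last known head position
    pending_ms: float = 0.0  # estimated time of I/O already queued

    def read_cost_ms(self, pos, size_kb):
        seek = self.seek_ms_per_unit * abs(pos - self.head_pos)
        return self.pending_ms + self.start_ms + seek + self.ms_per_kb * size_kb

def select_mirror(mirrors, pos, size_kb):
    # Choose the mirror with the smallest estimated time for this read.
    return min(mirrors, key=lambda m: m.read_cost_ms(pos, size_kb))
```

With an idle SSD-like mirror (no seek cost) and a hard-disk-like mirror, the
SSD is selected; once the SSD's pending_ms grows past the hard disk's total
estimate, the same call returns the hard disk instead.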
>
>
>>
>> best regards
>> keld
>
> Thanks, Keld.
>
> Sorry if I made the email list very big.
>
>
>
> --
> Roberto Spadim
> Spadim Technology / SPAEmpresarial
>


-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial