I have updated the post again; some of the questions are now explained there:
https://bbs.archlinux.org/viewtopic.php?pid=887345

Note that this question (an optional I/O scheduling algorithm for mirrors)
is quite old, about a year and a half (Chris Worley, Fri, 16 October 2009
21:07, ID #2019215):
http://www.issociate.de/board/post/499463/Load-balancing_mirrors_w/_asymmetric_performance.html

2011/2/2 Roberto Spadim <roberto@xxxxxxxxxxxxx>:
> Nice. I don't know if it's a single-thread problem;
> I think it's a problem of async read commands being executed in parallel.
> I posted again at https://bbs.archlinux.org/viewtopic.php?pid=887345
> Please see the history at the end of the page.
> I'm talking about a 5000 rpm disk and a 7000 rpm disk.
> I think we can optimize the mirror read algorithm, and it's not very hard:
> for hard disks of the same speed, closest head is good
> for solid state disks of the same speed, round robin is good
> for any mix, time-based is good
>
> Differences?
> hard disk: time to position the head is high, time to read can be small
> solid state: time to position is small, time to read is small (some
> SSDs are old and have a low read rate)
> nbd: time depends on the server's hard/solid state disk plus network
> time, but let's not think about nbd yet
>
> 2011/2/2 Keld Jørn Simonsen <keld@xxxxxxxxxx>:
>> Hmm, Roberto, I think we are close to the theoretical maximum with
>> some of the raid1/raid10 stuff already, and my nose tells me
>> that we can gain more by minimizing CPU usage.
>> Or maybe using some threading for raid modules - they
>> all run single-threaded.
>>
>> Best regards
>> keld
>>
>>
>> On Wed, Feb 02, 2011 at 06:28:27PM -0200, Roberto Spadim wrote:
>>> I put the earlier part of this thread on this page:
>>> https://bbs.archlinux.org/viewtopic.php?pid=887267
>>> to keep the number of emails on this mailing list down.
>>>
>>> 2011/2/2 Keld Jørn Simonsen <keld@xxxxxxxxxx>:
>>> > Hmm, Roberto, where are the gains?
>>>
>>> It's hard to explain...
>>> NCQ and the Linux scheduler don't help a mirror;
>>> they help a single device.
>>> A new scheduler for mirrors can be written (round robin, closest head,
>>> others).
>>>
>>> > I think it is hard to make raid1 better than it is today.
>>> I don't think so. Head position only matters for (rotational) hard
>>> disks, not for solid state disks. But let's not talk about SSDs, just
>>> hard disks: in a raid with a 5000 rpm and a 10000 rpm disk, will we get
>>> better read I/O from the 10000 rpm disk? We don't know the I/O model of
>>> that device, but it will probably be faster. When it's busy, though, we
>>> could use the 5000 rpm disk...
>>> That's the point: closest head alone doesn't help. We need to know the
>>> queue (the list of I/Os being processed) and the time to read the
>>> current I/O.
>>>
>>> > Normally the driver orders the reads to minimize head movement
>>> > and loss with rotation latency. Where can we improve that?
>>>
>>> There's no way to improve it, it's very good! But that is per hard
>>> disk, not per mirror. Since we know a disk is busy, we can use another
>>> mirror (another disk with the same information). That's what I want.
>>>
>>> > Also, what about conflicts with the elevator algorithm?
>>> Elevators are based on a model of the disk. Think of a disk as: Linux
>>> elevator + NCQ + disk. The sum of the three gives us the time-based
>>> information to select the best device.
>>> Maybe with complex code (per elevator) we could know the time spent
>>> executing a request, but that's a lot of work.
>>> For the first model, let's think about the parameters of our model
>>> (Linux elevator + NCQ + disk) as a whole.
>>> In a second version we could implement a per-elevator time
>>> calculation (a network block device, NBD, has an elevator at the server
>>> side plus the TCP/IP stack at the client and server side, right?).
>>>
>>> > There are several scheduling algorithms available, and each has
>>> > its merits. Will your new scheme work against these?
>>> > Or is your new scheme just another scheduling algorithm?
>>>
>>> It's scheduling for mirrors.
>>> Round robin is an algorithm for mirrors.
>>> Closest head is an algorithm for mirrors.
>>> My 'new' algorithm will be for mirrors too (if anyone helps me code it
>>> for the Linux kernel, hehehe; I haven't written Linux kernel code yet,
>>> only user space).
>>>
>>> noop, deadline and cfq aren't for mirrors; they are for the raid0
>>> problem (linear, or striped if your hard disk has more than one head).
>>>
>>> > I think I learned that scheduling is per drive, not per file system.
>>> Yes, you learned right! =)
>>> /dev/md0 (raid1) is a device with scheduling (closest head, round robin)
>>> /dev/sda is a device with scheduling (noop, deadline, cfq, others)
>>> /dev/sda1 is a device with scheduling (it sends all I/O directly to /dev/sda)
>>>
>>> The new algorithm is just for mirrors (raid1). I don't remember whether
>>> raid5/6 are mirror-based too; if they are, they could be optimized
>>> with this algorithm as well.
>>>
>>> raid0 doesn't have mirrors, but the data is striped across devices (not
>>> for linear); that's why it can be faster... it can do parallel reads.
>>>
>>> With closest head we can't always use the best disk. We can end up
>>> using a single disk all the time if its head is closer, and it may not
>>> be the fastest disk. (That's why write-mostly was implemented: those
>>> disks are not used for reads, only for writes or when a mirror fails.
>>> But it's not perfect for speed; a better algorithm can be made. For
>>> identical disks, round robin works well, better than closest head if
>>> they are solid state disks.)
>>> OK, on a high load, maybe closest head is better than this algorithm?
>>> Yes, if you only use hard disks. If you mix hard disks + solid
>>> state + network block devices + floppy disks + any other device, you
>>> don't have the best algorithm for I/O over mirrors.
>>>
>>>
>>> > and is it reading or writing or both? Normally we are dependent on the
>>> > reading, as we cannot process data before we have read them.
>>> > OTOH writing is less time critical, as nobody is waiting for it.
>>> It must be implemented for both writes and reads: writes only for the
>>> time calculations, reads to select the best mirror.
>>> For writes we must write to all mirrors (sync write is better; async
>>> isn't power-fail safe).
>>>
>>> > Or is it maximum throughput you want?
>>> > Or a mix, given some constraints?
>>> It's maximum performance: what is the best strategy to spend the least
>>> time executing the current I/O, based on the time to access the disk,
>>> the time to read the bytes, and the time to wait for other I/Os being
>>> executed?
>>>
>>> That's for mirror selection, not for disk I/O.
>>> For disks we can use the noop, deadline or cfq scheduler,
>>> and TCP/IP tweaks for network block devices.
>>>
>>> A model identification step must run to tell the mirror selection
>>> algorithm what the model of each device is.
>>> Model: time to read X bytes, time to move the head, time to start a
>>> read, time to write; time per byte, per KB, per unit.
>>> Calculate the time for each device and select the one (mirror) with
>>> the minimal value to execute our read.
>>>
>>>
>>> >
>>> > best regards
>>> > keld
>>>
>>> Thanks, Keld.
>>>
>>> Sorry if I am making the email list very big.
>>>
>>>
>>>
>>> --
>>> Roberto Spadim
>>> Spadim Technology / SPAEmpresarial
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>

--
Roberto Spadim
Spadim Technology / SPAEmpresarial
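
A minimal sketch of the time-based mirror selection discussed in this thread, next to closest head for comparison. It is user-space Python, not md kernel code. The `Mirror` class, its fields (`seek_ms_per_unit`, `start_ms`, `read_ms_per_kb`, `queued_ms`) and every number below are hypothetical stand-ins for the model parameters named above (time to move the head, time to start a read, time per KB, time waiting for I/O already queued); none of them are measured values or existing kernel interfaces.

```python
from dataclasses import dataclass


@dataclass
class Mirror:
    """Hypothetical per-device model; all fields are illustrative assumptions."""
    name: str
    seek_ms_per_unit: float   # cost of moving the head one sector unit (0 for SSD)
    start_ms: float           # fixed cost to start a read
    read_ms_per_kb: float     # transfer cost per KB
    head_pos: int = 0         # last known head position
    queued_ms: float = 0.0    # estimated time of I/O already queued on the device

    def estimate_ms(self, sector: int, size_kb: int) -> float:
        """Estimated completion time of a read: queue wait + start + seek + transfer."""
        seek = abs(sector - self.head_pos) * self.seek_ms_per_unit
        return self.queued_ms + self.start_ms + seek + size_kb * self.read_ms_per_kb


def closest_head(mirrors, sector):
    """Pick the mirror whose head is nearest to the requested sector."""
    return min(mirrors, key=lambda m: abs(sector - m.head_pos))


def time_based(mirrors, sector, size_kb):
    """Pick the mirror with the smallest estimated completion time."""
    return min(mirrors, key=lambda m: m.estimate_ms(sector, size_kb))


def submit_read(mirror, sector, size_kb):
    """Account for a read sent to `mirror`: grow its queue and move its head."""
    mirror.queued_ms = mirror.estimate_ms(sector, size_kb)
    mirror.head_pos = sector


# A slow disk that is close to the data but busy, and a fast idle disk far away.
slow = Mirror("slow", seek_ms_per_unit=0.01, start_ms=6.0,
              read_ms_per_kb=0.02, head_pos=1000, queued_ms=30.0)
fast = Mirror("fast", seek_ms_per_unit=0.005, start_ms=3.0,
              read_ms_per_kb=0.01, head_pos=5000)

by_head = closest_head([slow, fast], sector=1200)
by_time = time_based([slow, fast], sector=1200, size_kb=64)
print("closest head picks:", by_head.name)
print("time based picks:  ", by_time.name)
```

With these made-up numbers, closest head keeps sending reads to the busy slow disk because its head is nearer, while the time-based choice moves the read to the idle fast disk. A real implementation would have to obtain the model parameters from the model identification step and keep `queued_ms` up to date as requests complete, which this sketch does not attempt.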