Before anything else: I put this thread on this page, to keep the number of emails on this mailing list down:
https://bbs.archlinux.org/viewtopic.php?pid=887267

2011/2/2 Keld Jørn Simonsen <keld@xxxxxxxxxx>:
> Hmm, Roberto, where are the gains?

It's difficult to say... NCQ and the Linux scheduler don't help a mirror, they help a single device. A new scheduler for mirrors can be written (round robin, closest head, others).

> I think it is hard to make raid1 better than it is today.

I don't think so. Closest head only makes sense for hard disks (rotational), not for solid state disks. Let's leave SSDs aside and talk just about hard disks: in a raid with a 5000rpm and a 10000rpm disk, will we get better read I/O from the 10000rpm disk? We don't know the I/O model for that device, but it will probably be faster. When it's busy, though, we could use the 5000rpm disk... That's the point: closest head alone doesn't help, we need to know the queue (the list of I/O being processed) and the time to read the current I/O.

> Normally the driver orders the reads to minimize head movement
> and loss with rotation latency. Where can we improve that?

There's no way to improve that, it's very good! But it works per hard disk, not per mirror. Since we know a disk is busy, we can use another mirror (another disk with the same information). That's what I want.

> Also, what about conflicts with the elevator algorithm?

Elevators are based on a model of the disk. Think of a disk as: Linux elevator + NCQ + disk. The sum of the three gives us time-based information to select the best device. Maybe with complex code (per elevator) we could know the time spent executing it, but that's a lot of work. For the first model, let's think about the parameters of our model (Linux elevator + NCQ + disk). In a second version we could implement the elevator algorithm time calculation. (Does a network block device, NBD, have an elevator? On the server side, plus the TCP/IP stack on the client and server side, right?)

> There are several scheduling algorithms available, and each has
> its merits. Will your new scheme work against these?
> Or is your new scheme just another scheduling algorithm?

It's scheduling for mirrors:
round robin is an algorithm for mirrors
closest head is an algorithm for mirrors
my 'new' algorithm will be for mirrors (if anyone helps me code it for the Linux kernel, hehehe; I haven't coded for the Linux kernel yet, just for user space)
noop, deadline and cfq aren't for mirrors; they are for the raid0 kind of problem (linear, or striped if your hard disk has more than one head)

> I think I learned that scheduling is per drive, not per file system.

Yes, you learned right! =)
/dev/md0 (raid1) is a device with scheduling (closest head, round robin)
/dev/sda is a device with scheduling (noop, deadline, cfq, others)
/dev/sda1 is a device with scheduling (it sends all I/O directly to /dev/sda)

The new algorithm is just for mirrors (raid1). I don't remember whether raid5 and raid6 are mirror based too; if they are, they could be optimized with this algorithm too.
raid0 doesn't have mirrors, but the information is striped per device (not for linear), and that's why it can be faster... it can make parallel reads.

With closest head we can't choose the best disk: we may use a single disk all the time if its head is closer, and maybe it's not the fastest disk. (That's why write-mostly was implemented: such disks aren't used for reads, just for writes or when a mirror fails. But it's not perfect for speed; a better algorithm can be made. For identical disks, round robin works well, better than closest head if they are solid state disks.)

OK, under high load, maybe closest head is better than this algorithm? Yes, if you use only hard disks. If you mix hard disks + solid state + network block devices + floppy disks + any other device, you don't have the best algorithm for I/O over mirrors.

> and is it reading or writing or both? Normally we are dependant on the
> reading, as we cannot process data before we have read them.
> OTOH writing is less time critical, as nobody is waiting for it.

It must be implemented on both write and read: on write just for the time calculations, on read to select the best mirror. For writes we must write to all mirrors (sync write is better, async isn't power-fail safe).

> Or is it maximum thruput you want?
> Or a mix, given some restraints?

It's maximum performance = which strategy spends the least time executing the current I/O, based on the time to access the disk, the time to read the bytes, and the time to wait for other I/O being executed.

That's for mirror selection, not for disk I/O. For disk I/O we can use the noop, deadline or cfq scheduler (for disks), or TCP/IP tweaks for a network block device.

A model identification step must run to tell the mirror selection algorithm what the model of each device is.
model: time to read X bytes, time to move the head, time to start a read, time to write; time, time, time: per byte, per KB, per unit.
Calculate the time for each device and select the one (mirror) with the minimum calculated value to execute our read.

>
> best regards
> keld

Thanks Keld, and sorry if I'm making the mailing list very big.

--
Roberto Spadim
Spadim Technology / SPAEmpresarial
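
To make the selection rule above concrete (calculate a time per mirror, pick the minimum), here is a rough user-space sketch in Python. It is not kernel code and not an existing md policy. The names MirrorModel, estimate_ms, pick_mirror and submit_read are hypothetical, and the model is reduced to three assumed parameters per device: time to start a read, time per KB, and time already queued.

```python
from dataclasses import dataclass


@dataclass
class MirrorModel:
    """Hypothetical per-device model, as produced by a model identification step."""
    name: str
    access_ms: float        # assumed time to start a read (head move, network round trip, ...)
    ms_per_kb: float        # assumed transfer time per KB
    queued_ms: float = 0.0  # estimated time of I/O already waiting on this device

    def estimate_ms(self, size_kb: float) -> float:
        # time to wait for other I/O + time to access + time to read the bytes
        return self.queued_ms + self.access_ms + self.ms_per_kb * size_kb


def pick_mirror(mirrors, size_kb):
    """Return the mirror with the smallest estimated completion time."""
    return min(mirrors, key=lambda m: m.estimate_ms(size_kb))


def submit_read(mirrors, size_kb):
    """Pick a mirror for one read and account for the read in that mirror's queue."""
    best = pick_mirror(mirrors, size_kb)
    best.queued_ms = best.estimate_ms(size_kb)
    return best
```

With made-up numbers for the 10000rpm + 5000rpm example (say access_ms 5.0 and 9.0, ms_per_kb 0.01 and 0.02), the first 64 KB read goes to the faster disk, and a second read submitted while the first is still queued goes to the slower one, because the busy disk's estimate now includes its queue. A real version would also have to subtract time from queued_ms as I/O completes; this sketch leaves that out.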