Re: [patch 2/3 v3] raid1: read balance chooses idlest disk for SSD


 



Hmm, well, that's true... there is a queue inside the disk hardware that we
can't measure... but if you want I can run tests for you :)
Some time ago I used a slightly different configuration: instead of an SSD
and a hard disk, I used a 7200 rpm disk together with a 15000 rpm disk. The
"time based" algorithm ran nicely in that case too; maybe it could give just
a little more 'performance' (maybe none). As I said, the mean improvement I
measured was about 1% (I ran tests with disks of different speeds and
SSD+disk mixes; I had an OCZ Vertex 2, a 500 GB SATA disk at 7200 rpm, and a
142 GB SAS disk at 15000 rpm). Some other people here on the kernel list
tested too, but they didn't confirm whether the gain was a real mean
improvement or just measurement error.

When I did this I derived some 'empirical' values to 'tune' the
algorithm. I don't remember all the 'theory', but I did something like
this:


1) seek term: distance * (time per distance unit)

    The distance unit... I don't remember exactly, I think it's 1 block =
512 bytes, right? Well, just check the idea...
    for disks, the time per block of distance is:
        1 revolution time / total disk capacity in distance units
        1 revolution time = 1/rps, for example:
              7200 rpm => 120 Hz => 8.333 ms = 8333 us (near the ~10 ms
random access time given in the disk spec)
              15000 rpm => 250 Hz => 4 ms = 4000 us (near the ~5 ms in
the disk spec)
    for the SSD: 0 seconds
	7200 rpm => 500 * (1024*1024*1024/512) = 1048576000 blocks;
8333 us / 1048576000 blocks = 0.000'007'946'968'078 us/block
	15000 rpm => 142 * (1024*1024*1024/512) = 297795584 blocks;
4000 us / 297795584 blocks = 0.000'013'432'032'625 us/block
	SSD => 0 us/block
		0.000007946 us/block for the 7200 rpm disk,
		0.000013432 us/block for the 15000 rpm disk,
		0 for the SSD
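The per-block seek coefficient above is just one revolution time spread over the whole capacity. A minimal sketch of the arithmetic (the function name is mine; the capacities and revolution times are the ones from my tests):

```python
# Crude per-block seek cost: one revolution time spread over the whole
# capacity, so a seek across N blocks costs N * coefficient microseconds.
def seek_us_per_block(capacity_gib, rev_time_us):
    blocks = capacity_gib * (1024 * 1024 * 1024 // 512)  # 512-byte blocks
    return rev_time_us / blocks

hd7200  = seek_us_per_block(500, 8333)  # ~0.000007946 us/block
hd15000 = seek_us_per_block(142, 4000)  # ~0.000013432 us/block
ssd     = 0.0                           # no mechanical seek at all
```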



2) transfer term: (blocks to read/write) * (time to read/write 1 block)
 For this part I put dd to work:
    dd if=/dev/sda of=/dev/null (there were some flags to bypass the cache
too, but I don't remember them now...)
 and used iostat -d 1 -k to get the mean read throughput.
 I don't remember the exact numbers, but they were something near this:
    SSD: 230 MB/s = 230*(1024*1024)/512 => 471040 blocks/second =>
0.000'002'122 s/block => 2.122 us/block
    7200 rpm disk: 120 MB/s => 245760 blocks/second => 4.069 us/block
    15000 rpm disk: 170 MB/s => 348160 blocks/second => 2.872 us/block
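The per-block transfer times can be checked the same way (a sketch, using the throughputs I measured; the helper name is just for illustration):

```python
# Time to move one 512-byte block at a given sustained throughput.
def transfer_us_per_block(mib_per_s):
    blocks_per_s = mib_per_s * 1024 * 1024 // 512
    return 1_000_000 / blocks_per_s  # microseconds per block

ssd_xfer     = transfer_us_per_block(230)  # ~2.122 us/block (471040 blocks/s)
hd7200_xfer  = transfer_us_per_block(120)  # ~4.069 us/block (245760 blocks/s)
hd15000_xfer = transfer_us_per_block(170)  # ~2.872 us/block (348160 blocks/s)
```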

3) non-sequential penalty time
Here I ran two dd processes at once (starting the second a few seconds
after the first) and recorded the new MB/s values:
    SSD dropped a bit, but not much: 230 -> 200
    7200 rpm disk: 120 -> 90
    15000 rpm disk: 170 -> 150

From these losses I derived a 'penalty' value:
    (230-200)/230 = 13.043% for the SSD
    (120-90)/120 = 25% for the 7200 rpm disk
    (170-150)/170 = 11.76% for the 15000 rpm disk
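The penalty factors fall straight out of the two throughput measurements (a sketch):

```python
# Relative throughput loss when a second concurrent reader breaks
# sequential access; used as a multiplicative penalty on non-zero seeks.
def penalty(seq_mb_s, concurrent_mb_s):
    return (seq_mb_s - concurrent_mb_s) / seq_mb_s

ssd_pen     = penalty(230, 200)  # ~0.13043
hd7200_pen  = penalty(120, 90)   # 0.25
hd15000_pen = penalty(170, 150)  # ~0.11765
```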

I don't remember whether I applied the penalty when distance = 0, or
whether I did it like today's implementation, which keeps selecting the
previous disk when sequentially reading the full md device.

======
With these numbers, here are the selections the algorithm would make...
sda = SSD, sdb = 15000 rpm, sdc = 7200 rpm

sda|sdb|sdc
disk positions: 0 | 0 | 0
read 100 blocks at position 20000...
sda => distance = 20000, estimated time = 20000*0 + 2.122*100 + 13.043%
		in other words...
			(        0 + 212.2) * 1.13043 = 239.877246
sdb => distance = 20000, estimated time = 20000*0.000013432 + 2.872*100
+ 11.76% =
			(0.26864 + 287.2) * 1.1176 = 321.274952064
sdc => distance = 20000, estimated time = 20000*0.000007946 + 4.069*100 + 25% =
			(0.15892 + 406.9) * 1.25 = 508.82365
	HERE WE SELECT sda (239.877)
	
disk positions: 200 | 0 | 0
read 100 blocks at position 0...
sda => distance = 200, estimated time = 200*0 + 2.122*100 + 13.043%
			(        0 + 212.2) * 1.13043 = 239.877246
sdb => distance = 0, estimated time = 0*0.000013432 + 2.872*100 + 0% =
	(no penalty here since the head is already in the right place)
			(        0 + 287.2) * 1 = 287.2
sdc => distance = 0, estimated time = 0*0.000007946 + 4.069*100 + 0% =
			(        0 + 406.9) * 1 = 406.9
	sda again...
	Notice that it will always select sda, since its seek cost is zero
and it has the highest transfer rate.
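Both hand-worked selections can be reproduced by putting the three terms into one estimator. A sketch with my rough constants (the function and table are only illustrative):

```python
# Estimated service time (us) for reading `blocks` blocks on a disk whose
# head sits `distance` blocks away from the request.
def estimate_us(distance, blocks, seek_pb, xfer_pb, pen):
    base = distance * seek_pb + blocks * xfer_pb
    return base if distance == 0 else base * (1 + pen)  # sequential: no penalty

# disk -> (seek us/block, transfer us/block, penalty)
DISKS = {
    "sda": (0.0,         2.122, 0.13043),  # SSD
    "sdb": (0.000013432, 2.872, 0.1176),   # 15000 rpm SAS
    "sdc": (0.000007946, 4.069, 0.25),     # 7200 rpm SATA
}

def pick(distances, blocks):
    times = {name: estimate_us(distances[name], blocks, *p)
             for name, p in DISKS.items()}
    return min(times, key=times.get), times

best, times = pick({"sda": 20000, "sdb": 20000, "sdc": 20000}, 100)
# best == "sda"; times are ~239.877 (sda), ~321.275 (sdb), ~508.824 (sdc)
```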
	
That's where my algorithm didn't work well... (it knows nothing about
the past or about the queue, only about the current read.)

But now, with someone who knows the kernel code, we have this
information about pending requests =D

I think we can walk the queue and compute the total estimated time =), or not?
	For each pending request we would compute these times, and sum
them to select the 'best' disk.
	I never coded that part, since I don't know how to get this
information from the queue inside the kernel =( and my hobby time ended ='(
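The queue idea could look something like this — purely hypothetical, since I never found how to read the pending requests from the kernel side; `pending` is an assumed per-disk list of (distance, blocks) pairs:

```python
# Pick the disk with the smallest total backlog: the estimated time of all
# pending requests plus the new one.
def estimate_us(distance, blocks, seek_pb, xfer_pb, pen):
    base = distance * seek_pb + blocks * xfer_pb
    return base if distance == 0 else base * (1 + pen)

# disk -> (seek us/block, transfer us/block, penalty, pending requests)
disks = {
    "sda": (0.0,         2.122, 0.13043, [(0, 100), (500, 100)]),  # busy SSD
    "sdb": (0.000013432, 2.872, 0.1176,  []),                      # idle 15k disk
}

def best_disk(new_distance, new_blocks):
    totals = {}
    for name, (seek_pb, xfer_pb, pen, pending) in disks.items():
        reqs = pending + [(new_distance, new_blocks)]
        totals[name] = sum(estimate_us(d, b, seek_pb, xfer_pb, pen)
                           for d, b in reqs)
    return min(totals, key=totals.get), totals

best, totals = best_disk(0, 100)
# best == "sdb": the idle 15000 rpm disk beats the SSD once the SSD
# has a backlog, which is exactly what my stateless version missed.
```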

thanks for reading...
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

