Chuck Ebbert wrote:
Nick Piggin wrote:
OK right. As far as I can see, the algorithm in the RAID1 code
is used to select the best drive to read from? If that is the
case then I don't think it could make better decisions given
more knowledge.
How about if it just asks the elevator whether or not a given read
is a good fit with its current workload? I saw in 2.5 where the balance
code is looking at the number of pending requests and if it's zero then
it sends it to that device. Somehow I think something better than
that could be done, anyway.
That balance code is probably the IDE or SCSI channel balancing?
In that case, the driver simply wants to know which device it
should service next, which is an appropriate fit. (Is that what
you were talking about? I don't have the source here, sorry.)
We could ask the elevator if a given read is a good fit. It
would probably help.
It seems to me that a better way to layer it would be to have
the complex (i.e. deadline/AS/CFQ/etc.) scheduler handling all
requests into the RAID block device, then having a RAID
scheduler distributing to the disks, and having the disks
run no scheduler (FIFO).
That only works if RAID1 is working at the physical disk level (which
it should be AFAIC but people want flexibility to mirror partitions.)
How so? Basically you want your high level scheduler to run first.
You want it to act on the stream of requests from the system, not
on the stream of requests to the device. If you know what I mean.
I might be wrong here. I haven't done any testing, and only a
little bit of thinking.
In practice the current scheme probably works OK, though I
wouldn't know due to lack of resources here :P
I've been playing with the 2.4 read balance code and have some
improvements, but real gains need a new approach.
The problem I see is that the higher level schedulers (deadline,
for example, as opposed to the RAID scheduler) will find it difficult
to tell if a request will be "good" for them or not. For example,
say we have 2 devices with 100 requests in each scheduler queue.
Device A's head is at sector x and next request is at x+100,
Device B's head is at sector x+10 and next request is at x+200.
RAID wants to know which queue should take a request at sector
x+1000. What do you do?
The way you would do a good "goodness" function, I guess,
would be to search through all requests on the device, and return
the minimum distance from the request you are running the query
on. Do this for both queues, and insert the request into the
queue with the smallest delta. I don't see much else doing any
good.
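
To make that concrete, here is a rough sketch of such a "goodness"
query in Python. It is only an illustration, not kernel code: the
names min_distance and pick_queue are made up, each queue is modelled
as a plain list of pending sector numbers, and the extra queue entries
beyond the x+100 / x+200 ones are invented for the example.

```python
def min_distance(pending, sector):
    """Smallest absolute distance from `sector` to any pending request.

    Assumes `pending` is a non-empty list of sector numbers.
    """
    return min(abs(p - sector) for p in pending)


def pick_queue(queues, sector):
    """Return the name of the queue holding the request closest to `sector`.

    Ties go to whichever queue comes first in the dict.
    """
    return min(queues, key=lambda name: min_distance(queues[name], sector))


x = 5000  # arbitrary base sector, for illustration only
queues = {
    "A": [x + 100, x + 400, x + 900],    # hypothetical pending requests
    "B": [x + 200, x + 1500, x + 3000],  # hypothetical pending requests
}

# RAID asks: which queue should take a read at sector x+1000?
choice = pick_queue(queues, x + 1000)
print(choice)
```

Note that every query scans every pending request on every device,
and the answer depends on what happens to be queued, not on where the
head will actually be when the new request gets serviced.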
On the other hand, if you simply have a FIFO after the RAID
scheduler, the RAID scheduler itself knows where each disk's
head will end up simply by tracking the value of the last
sector it has submitted to the device. It also has the advantage
that it doesn't have "high level" scheduling stuff below it,
i.e. request deadline handling, elevator scheme, etc.
This gives the RAID scheduler more information, without
taking any away from the high level scheduler AFAIKS.
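
A minimal sketch of that idea, again in Python and again with made-up
names (RaidReadBalancer, submit_read): the RAID layer keeps one
"last submitted sector" per disk and sends each read to the disk whose
tracked position is nearest. It assumes the disks really are plain
FIFOs, so the last submitted sector is where the head ends up, and
that all heads start at sector 0.

```python
class RaidReadBalancer:
    """Pick a mirror for each read by tracking the last sector sent to each disk."""

    def __init__(self, disks):
        # Assumption: every head starts at sector 0.
        self.last_sector = {disk: 0 for disk in disks}

    def submit_read(self, sector, nr_sectors=1):
        """Choose the disk whose tracked head position is closest to `sector`."""
        disk = min(
            self.last_sector,
            key=lambda d: abs(self.last_sector[d] - sector),
        )
        # With a FIFO below us, the head ends up at the end of this request.
        self.last_sector[disk] = sector + nr_sectors - 1
        return disk


balancer = RaidReadBalancer(["A", "B"])
# Two interleaved streams, one around sector 1000 and one around sector 50.
picks = [balancer.submit_read(s) for s in (1000, 50, 1010, 60)]
print(picks)
```

Each decision is a constant-time lookup per disk, with no need to
search any queue.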