Re: Improving real world performance by moving files closer to their target workloads

"Luke McGregor" <luke@xxxxxxxxxxxxxxx> · Sat, 17 May 2008 00:15:19 +1200

OK, so basically a file lookup works by broadcasting to the network, anyone
who has the file replies, and the location is then cached locally, so the next
time the file is needed no broadcast is required because the location is in the
local cache? Correct? The reason we don't really want to settle on caching as
a final solution is that it won't converge toward an optimal placement over
time. Ideally, files that are commonly used should no longer live on nodes
that don't use them; instead they should be located locally. Well, that's the
theory anyway. If it isn't the case, I think it may still be useful to do the
work, to prove the point that migration doesn't provide any great benefit.
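To check my understanding, here is roughly how I picture the broadcast lookup plus local location cache, with a naive migration rule bolted on. This is only a sketch under my own assumptions: one copy per file, and a per-node access count that triggers migration once it reaches a threshold. `Node`, `Network`, `open`, `migrate` and `migrate_threshold` are all hypothetical names, not the real system's API.

```python
from collections import Counter


class Node:
    def __init__(self, name):
        self.name = name
        self.files = set()            # files physically stored on this node
        self.location_cache = {}      # file name -> node name, learned from broadcasts
        self.access_counts = Counter()


class Network:
    def __init__(self, nodes, migrate_threshold=3):
        self.nodes = {n.name: n for n in nodes}
        self.migrate_threshold = migrate_threshold  # hypothetical tuning knob
        self.broadcasts = 0

    def broadcast_lookup(self, filename):
        # Ask every node; whoever holds the file replies with its name.
        self.broadcasts += 1
        for node in self.nodes.values():
            if filename in node.files:
                return node.name
        raise FileNotFoundError(filename)

    def open(self, requester_name, filename):
        requester = self.nodes[requester_name]
        if filename in requester.files:
            return requester.name  # already local, nothing to do
        location = requester.location_cache.get(filename)
        if location is None or filename not in self.nodes[location].files:
            # Cache miss or stale entry: fall back to a broadcast.
            location = self.broadcast_lookup(filename)
            requester.location_cache[filename] = location
        requester.access_counts[filename] += 1
        if requester.access_counts[filename] >= self.migrate_threshold:
            # Assumed migration rule: pull the file to a node that keeps using it.
            self.migrate(filename, location, requester_name)
            location = requester_name
        return location

    def migrate(self, filename, src, dst):
        # Move the single copy and invalidate every cached location for it.
        self.nodes[src].files.discard(filename)
        self.nodes[dst].files.add(filename)
        for node in self.nodes.values():
            node.location_cache.pop(filename, None)
```

With caching alone, node `a` keeps reading `report.doc` from node `b` forever after the first broadcast; with the migration rule, the third access moves the file to `a`, which is the "converge toward local placement" behaviour I mean.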

I'm a little worried, however, that there might be a sticking point with the
current lookup scheme if there are multiple copies. I'm not quite sure how to
keep things consistent if you want to guarantee that every write goes to the
most recent copy only. I can see this being a serious problem. I'm not sure
how to get around it yet, but I will have a think about it.
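One standard way to at least detect the problem is to stamp every copy with a version number and reject any write made against an out-of-date copy (optimistic concurrency control). The sketch below is my own illustration, not something the current scheme does, and it cheats by assuming a single authority holds the latest version number per file, which is exactly the hard part in a broadcast-based system. `ReplicatedFile`, `Replica` and `StaleCopyError` are hypothetical names.

```python
from dataclasses import dataclass


class StaleCopyError(Exception):
    """Raised when a write is attempted against an out-of-date copy."""


@dataclass
class Replica:
    node: str
    data: bytes
    version: int


class ReplicatedFile:
    # Assumption: one authoritative version counter per file.
    def __init__(self, data, nodes):
        self.version = 0
        self.replicas = {n: Replica(n, data, 0) for n in nodes}

    def write(self, node, new_data, based_on_version):
        # Only accept writes made against the most recent version.
        if based_on_version != self.version:
            raise StaleCopyError(
                f"{node} wrote against v{based_on_version}, latest is v{self.version}"
            )
        self.version += 1
        replica = self.replicas[node]
        replica.data, replica.version = new_data, self.version
        # Every other copy is now stale until it is refreshed on read.

    def read(self, node):
        replica = self.replicas[node]
        if replica.version != self.version:
            # Refresh this copy from any up-to-date one before returning it.
            fresh = next(r for r in self.replicas.values() if r.version == self.version)
            replica.data, replica.version = fresh.data, fresh.version
        return replica.data, replica.version
```

A write based on a stale copy fails loudly instead of silently overwriting newer data; the open question is still who keeps that version counter once copies are spread around the network.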

I personally think they would get a bigger performance benefit by breaking
each file into small pieces and spreading them over the network, giving better
read performance because more hosts each do a small amount of disk I/O. I
suppose this is similar to your stripe? However, the academics in the
department all seem very sold on the migration idea. Personally, I come from a
RAID/SAN background.
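For comparison, this is the kind of thing I mean: plain RAID-0-style round-robin striping, where a file is cut into fixed-size chunks, the chunks are dealt out across hosts, and a read fetches from all hosts at once and reassembles by chunk index. It is a swapped-in technique for discussion, not what the department's system does; `stripe`, `read_striped`, `chunk_size` and `fetch` are hypothetical names, and `fetch` stands in for whatever network call would retrieve a host's chunks.

```python
from concurrent.futures import ThreadPoolExecutor


def stripe(data: bytes, hosts: list[str], chunk_size: int) -> dict[str, list[tuple[int, bytes]]]:
    """Deal fixed-size chunks of `data` round-robin across `hosts`."""
    placement = {h: [] for h in hosts}
    for index, offset in enumerate(range(0, len(data), chunk_size)):
        host = hosts[index % len(hosts)]
        placement[host].append((index, data[offset:offset + chunk_size]))
    return placement


def read_striped(placement, fetch) -> bytes:
    """Query every host concurrently, then reassemble the chunks in index order."""
    with ThreadPoolExecutor(max_workers=len(placement)) as pool:
        per_host = list(pool.map(fetch, placement))  # iterating the dict yields host names
    chunks = sorted((c for host_chunks in per_host for c in host_chunks), key=lambda c: c[0])
    return b"".join(piece for _, piece in chunks)


if __name__ == "__main__":
    layout = stripe(b"abcdefghij", ["h1", "h2", "h3"], chunk_size=3)
    # Local stand-in for a network fetch: just look the host's chunks up.
    print(read_striped(layout, layout.__getitem__))
```

Each host only serves about 1/N of the file, which is where the read-throughput win would come from; the trade-off is that every read now touches every host holding a stripe.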

Thanks
Luke