Your application appears to be a queue processing system, i.e. each
machine pulls a file down, processes it, fetches the next file, and so
on. Is that right?
Can you share some information on:
- the size of the files you pull down (I saw something in another post)
- how long each machine takes to process each file
- whether there is any dependency between the processing machines, e.g.
can each machine operate completely independently of the others and
start its job when it wishes, or does it need to sync?
Given the tentative assumptions that
- processing each file takes many multiples of the time needed to
download the file, and
- files are processed independently,
it would appear that you can use a much lower powered system and simply
push jobs out to the processing machines in advance. That way your
bandwidth only needs to be roughly:
size_of_job * num_machines / time_to_process_jobs
So if the time to process a job is significant, you have quite some
time to push the next job out to local storage, ready for when the
machine finishes its current one.
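
To put a number on the formula above, here is a minimal Python sketch.
The function name and the figures in the example are made up for
illustration and are not taken from your setup; it also assumes every
job is the same size and takes the same time to process.

```python
def required_bandwidth(size_of_job, num_machines, time_to_process_jobs):
    """Average bandwidth needed to keep every machine supplied with work.

    size_of_job is in MB and time_to_process_jobs in seconds, so the
    result is in MB/s.
    """
    if time_to_process_jobs <= 0:
        raise ValueError("time_to_process_jobs must be positive")
    return size_of_job * num_machines / time_to_process_jobs


# Hypothetical numbers: 1000 MB jobs, 20 machines, 10 minutes per job.
bandwidth = required_bandwidth(1000, 20, 600)
print(f"{bandwidth:.1f} MB/s")  # prints "33.3 MB/s"
```

This is only the average rate. If all machines finish at about the same
moment, the peak demand will be higher unless the next job has already
been staged.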
First, is this architecture workable? If so, you have some new
performance parameters to target for the storage architecture.
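
As a rough illustration of the idea (not a description of your actual
code), a worker could stage the next file in the background while it
processes the current one. Everything here is hypothetical: fetch() and
process() are stand-ins for your download and processing steps, and the
sleeps only simulate work.

```python
import queue
import threading
import time


def fetch(job):
    # Hypothetical stand-in for copying a file to local storage.
    time.sleep(0.01)
    return f"local:{job}"


def process(local_file):
    # Hypothetical stand-in for the long-running processing step.
    time.sleep(0.05)
    return local_file.upper()


def run_worker(jobs, depth=1):
    """Process jobs in order while a background thread fetches ahead."""
    # Bounded queue: the fetcher blocks once `depth` files are waiting.
    ready = queue.Queue(maxsize=depth)

    def prefetch():
        for job in jobs:
            ready.put(fetch(job))
        ready.put(None)  # sentinel: no more jobs

    threading.Thread(target=prefetch, daemon=True).start()

    results = []
    while (item := ready.get()) is not None:
        results.append(process(item))
    return results


results = run_worker(["a", "b", "c"])
```

Because the download overlaps with processing, the machine only waits on
the network for its very first file, provided the assumptions above hold.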
Good luck
Ed W