On 02/27/2011 04:30 PM, Ed W wrote:
[...]
> It would appear that you can use a much lower powered system to
> basically push jobs out to the processing machines in advance, this way
> your bandwidth basically only needs to be:
>     size_of_job * num_machines / time_to_process_jobs
This would be good. Matt's original argument suggested he needed this
as his sustained bandwidth given the way the analysis proceeded.
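To put a rough number on the quoted formula, here is a minimal Python
sketch. The function name and the figures (3000 MB per job, 50 machines,
600 s per job) are hypothetical, chosen only for illustration; they are
not Matt's actual workload.

```python
def prestage_bandwidth(size_of_job_mb, num_machines, time_to_process_s):
    """Sustained bandwidth (MB/s) needed to push one job to every machine
    within the time it takes the machines to process their current job."""
    return size_of_job_mb * num_machines / time_to_process_s


# Hypothetical workload: 3000 MB per job, 50 machines, 600 s per job.
needed = prestage_bandwidth(3000, 50, 600)
print(f"sustained bandwidth needed: {needed:.1f} MB/s")
```

The longer each job runs, the less bandwidth the staging system has to
sustain, which is the point of pushing jobs out in advance.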
If we assume that the processing time is T_p and the communication time
is T_c, then, ignoring other factors, the total time for one job is
T_j = T_p + T_c. If T_c << T_p, you can effectively ignore
bandwidth-related issues (and use a much lower bandwidth system). For
example, let's (for laughs) say T_c = 0.1 x T_p, i.e. communication time
is 1/10th the processing time. Then T_j = 1.1 x T_p. If you halve your
bandwidth, T_c doubles and T_j becomes 1.2 x T_p, which is only about a
9% increase in your total execution time for a job.
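A quick way to see how this changes as T_c grows relative to T_p is to
tabulate it. This is only a sketch of the simple T_j = T_p + T_c model
above; the function names and the list of ratios are my own choices for
illustration.

```python
def job_time(t_p, t_c):
    """Total time for one job under the simple model T_j = T_p + T_c."""
    return t_p + t_c


def slowdown_from_bandwidth_cut(t_p, t_c, factor=2.0):
    """Fractional increase in T_j when bandwidth is divided by `factor`,
    assuming T_c scales inversely with bandwidth and T_p is unchanged."""
    before = job_time(t_p, t_c)
    after = job_time(t_p, t_c * factor)
    return (after - before) / before


# T_p is normalised to 1.0, so each ratio below is T_c / T_p.
for ratio in (0.01, 0.1, 0.5, 1.0):
    increase = slowdown_from_bandwidth_cut(1.0, ratio)
    print(f"T_c/T_p = {ratio:4.2f}: halving bandwidth adds {increase:.1%}")
```

Once T_c is comparable to T_p, halving the bandwidth costs far more than
a few percent, so the "ignore bandwidth" shortcut only holds while
T_c << T_p.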
With Nmachines each with Ncores, you have Nmachines x Ncores jobs going
on all at once. If T_c << T_p (as in the above example), then most of
the time, on average, the machines will not be communicating. In fact,
if we do a very rough first pass approximation to an answer (there are
more accurate statistical models) for this, one would expect the network
to be used roughly a T_c/T_p fraction of the time by each process (more
precisely, T_c/(T_c + T_p)). Then the total consumption of data for a
run (assuming all runs are *approximately* of equal duration) is

    D = B x T_c

D being the amount of data in MB or GB, and B being the bandwidth
expressed in MB/s or GB/s. Your effective bandwidth per run, Beff, is
then defined by

    D = Beff x T = Beff x (T_c + T_p)

For Nmachines x Ncores jobs, Dtotal is the total data transferred:

    Dtotal = Nmachines x Ncores x D
           = Nmachines x Ncores x Beff x (T_c + T_p)

You know Dtotal (aggregate data needed for the run). You know Nmachines
and Ncores. You know T_c and T_p (approximately). From this, solve for
Beff. That's what you have to sustain per job (approximately); the
storage as a whole has to sustain Nmachines x Ncores x Beff.
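Solving that relation for Beff is a one-liner; the sketch below just
makes the arithmetic explicit. It assumes, as the formula does, that all
Nmachines x Ncores jobs run concurrently and take about T_c + T_p each.
The function names and the numbers (50 machines, 8 cores, 440000 MB
total, T_c = 10 s, T_p = 100 s) are made up for illustration.

```python
def per_job_beff(d_total, n_machines, n_cores, t_c, t_p):
    """Solve Dtotal = Nmachines x Ncores x Beff x (T_c + T_p) for Beff."""
    return d_total / (n_machines * n_cores * (t_c + t_p))


def aggregate_beff(d_total, t_c, t_p):
    """Bandwidth the storage must sustain across all jobs together,
    i.e. Nmachines x Ncores x Beff = Dtotal / (T_c + T_p)."""
    return d_total / (t_c + t_p)


# Hypothetical run: 50 machines x 8 cores, 440000 MB in total,
# T_c = 10 s and T_p = 100 s per job.
beff = per_job_beff(440_000, 50, 8, 10, 100)
total = aggregate_beff(440_000, 10, 100)
print(f"per-job Beff:  {beff:.1f} MB/s")
print(f"aggregate:     {total:.1f} MB/s")
```

Note that the aggregate figure depends only on Dtotal and T_c + T_p, so
adding cores without adding data just spreads the same load thinner.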
> So if the time to process jobs is significant then you have quite some
> time to push out the next job to local storage ready?
>
> Firstly is this architecture workable? If so then you have some new
> performance parameters to target for the storage architecture?
>
> Good luck
>
> Ed W
--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman@xxxxxxxxxxxxxxxxxxxxxxx
web : http://scalableinformatics.com
http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax : +1 866 888 3112
cell : +1 734 612 4615