Re: high throughput storage server?

Joe Landman <joe.landman@xxxxxxxxx> · Mon, 28 Feb 2011 10:46:06 -0500

On 02/27/2011 04:30 PM, Ed W wrote:

[...]

> It would appear that you can use a much lower powered system to
> basically push jobs out to the processing machines in advance, this way
> your bandwidth basically only needs to be:
> size_of_job * num_machines / time_to_process_jobs

This would be good.  Matt's original argument suggested he needed this 
as his sustained bandwidth given the way the analysis proceeded.

If we assume that the processing time is T_p and the communication time
is T_c then, ignoring other factors, the total time for 1 job is
T_j = T_p + T_c.  If T_c << T_p, then you can effectively ignore
bandwidth related issues (and use a much smaller bandwidth system).
Let's (for laughs) say T_c = 0.1 x T_p (i.e. communication time is
1/10th the processing time), so T_j = 1.1 x T_p.  Then even if you
halved your bandwidth, and so doubled T_c, T_j would only grow to
1.2 x T_p: an increase of about 10% in your total execution time for
a job.
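As a quick check of that arithmetic, here is a minimal Python sketch.
The function names and the sample numbers are made up for illustration;
only T_j = T_p + T_c comes from the argument above, plus the assumption
that T_c scales inversely with bandwidth.

```python
def job_time(t_p, t_c):
    """Total time for one job: T_j = T_p + T_c."""
    return t_p + t_c


def relative_slowdown(t_p, t_c, bandwidth_factor):
    """Fractional increase in T_j when bandwidth is scaled by bandwidth_factor.

    Assumes (illustrative simplification) that communication time is
    inversely proportional to bandwidth, so halving bandwidth doubles T_c.
    """
    before = job_time(t_p, t_c)
    after = job_time(t_p, t_c / bandwidth_factor)
    return (after - before) / before


if __name__ == "__main__":
    t_p = 100.0        # hypothetical processing time, seconds
    t_c = 0.1 * t_p    # communication is 1/10th of processing
    slowdown = relative_slowdown(t_p, t_c, bandwidth_factor=0.5)
    print(f"halving bandwidth lengthens each job by {slowdown:.1%}")
```

The closer T_c gets to T_p, the larger this number becomes, so it is
worth plugging in measured values before sizing the storage.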

With Nmachines each with Ncores, you have Nmachines x Ncores jobs going
on all at once.  If T_c << T_p (as in the above example), then most of
the time, on average, the machines will not be communicating.  In fact,
if we do a very rough first pass approximation to an answer (there are
more accurate statistical models) for this, one would expect the network
to be used roughly a T_c/T_p fraction of the time by each process
(strictly T_c/(T_c + T_p), which is nearly the same when T_c << T_p).
Then the total consumption of data for a run (assuming all runs are
*approximately* of equal duration) is

	D = B x T_c

D being the amount of data in MB or GB, and B being the bandwidth
expressed in MB/s or GB/s.  Your effective bandwidth per run, Beff, is
that same data averaged over the whole job time T:

	D = Beff x T = Beff x (T_c + T_p)

For Nmachines x Ncores jobs, Dtotal is the total data transferred

	Dtotal	= Nmachines x Ncores x D
		= Nmachines x Ncores x Beff x (T_c + T_p)

You know Dtotal (aggregate data needed for run).  You know Nmachines and 
Ncores.  You know T_c and T_p (approximately).  From this, solve for 
Beff.  That's what you have to sustain (approximately).
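
Solving the Dtotal equation for Beff can be sketched in Python as
below.  The function name and all of the sample numbers are
hypothetical, and the aggregate figure (Beff summed over all concurrent
jobs) is my own extra step, not something stated above.

```python
def effective_bandwidth(d_total, n_machines, n_cores, t_c, t_p):
    """Solve Dtotal = Nmachines x Ncores x Beff x (T_c + T_p) for Beff.

    Returns (beff_per_job, beff_aggregate).  The aggregate is simply
    Beff summed over all concurrent jobs, i.e. Dtotal / (T_c + T_p).
    Units follow the inputs: GB and seconds in gives GB/s out.
    """
    jobs = n_machines * n_cores
    beff_per_job = d_total / (jobs * (t_c + t_p))
    beff_aggregate = jobs * beff_per_job
    return beff_per_job, beff_aggregate


if __name__ == "__main__":
    # Hypothetical cluster and workload, for illustration only.
    per_job, aggregate = effective_bandwidth(
        d_total=1920.0,   # GB needed across all jobs in one round
        n_machines=24,
        n_cores=8,
        t_c=10.0,         # seconds spent communicating per job
        t_p=90.0,         # seconds spent processing per job
    )
    print(f"Beff per job:  {per_job:.3f} GB/s")
    print(f"Beff in total: {aggregate:.3f} GB/s")
```

Note that this is an average.  If all jobs start at the same moment and
pull their data at once, the peak demand on the storage will be higher
than the sustained figure.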

> So if the time to process jobs is significant then you have quite some
> time to push out the next job to local storage ready?
>
> Firstly is this architecture workable? If so then you have some new
> performance parameters to target for the storage architecture?
>
> Good luck
>
> Ed W

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman@xxxxxxxxxxxxxxxxxxxxxxx
web  : http://scalableinformatics.com
       http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615