On 02/24/2011 03:58 PM, Matt Garman wrote:
> These are definitely large files; maybe "huge" is a better word. All are over 100 MB in size, some are upwards of 5 GB, and most are probably a few hundred megs.
Heh ... the "huge" storage I alluded to above is also quite ... er ... context sensitive.
> The word "streaming" may be accurate, but to me it is misleading. I associate streaming with media, i.e. it is generally consumed much more slowly than it can be sent (e.g. even high-def 1080p video won't saturate a 100 Mbps link). But in our case, these files are basically read into memory, and then computations are done from there.

Actually, not at all. We have quite a few customers that consume files by slurping them into RAM before processing. So the file system streams — i.e. it sends data as fast as the remote process can consume it, modulo network and other inefficiencies.
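The slurp-then-compute pattern described here can be sketched in a few lines of Python; the SHA-256 step is just a stand-in for whatever the real analysis does, and the temp file stands in for a real input:

```python
import hashlib
import os
import tempfile

def slurp_and_process(path):
    """Read the whole file into RAM in one streaming pass, then compute."""
    with open(path, "rb") as f:
        data = f.read()  # the file system streams; we consume as fast as it sends
    # Stand-in for the real analysis, which runs entirely from memory.
    return hashlib.sha256(data).hexdigest()

# Demo against a small temporary file (the real inputs are 100 MB - 5 GB).
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.write(b"x" * (1 << 20))  # 1 MiB of dummy data
tmp.close()
digest = slurp_and_process(tmp.name)
os.unlink(tmp.name)
print(digest)
```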
Same use case. dd is an example of a "trivial" streaming app, though we prefer to generate load with fio.
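A dd-style "trivial" streaming reader — sequential access, large block size, throughput reported at the end — can be sketched like this (block size and paths are arbitrary choices here, and fio remains the better tool for serious load generation):

```python
import os
import tempfile
import time

def stream_read(path, bs=1 << 20):
    """Sequentially read `path` in bs-sized chunks, much like `dd bs=1M`."""
    total = 0
    t0 = time.monotonic()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(bs)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.monotonic() - t0
    return total, total / elapsed / 1e6  # bytes read, MB/s

# Demo: the file is freshly written, so this reads from page cache --
# it is not a disk benchmark, just an illustration of the access pattern.
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.write(os.urandom(1 << 22))  # 4 MiB
tmp.close()
nbytes, rate = stream_read(tmp.name)
os.unlink(tmp.name)
print(nbytes)
```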
> So, for an upper bound on the notion of "fast", I'll illustrate the worst-case scenario: there are 50 analysis machines, each of which can run up to 10 processes, making 500 total processes. Every single process requests a different file at the exact same time, and every requested file is over 100 MB in size. Ideally, each process would be able to access the file as though it were local and it were the only process on the machine. In reality, it's "good enough" if each of the 50 machines' gigabit network connections is saturated. So from the network perspective, that's 50 Gbps.
Ok, so if we divide these 50 Gbps across, say, 10 storage nodes, then we need only sustain, on average, 5 Gbps per storage node. This makes a number of assumptions, some of which are valid (e.g. file distribution across nodes is effectively random, which a parallel file system can accomplish). 5 Gbps per storage node sounds like a node with 6x GbE ports, or 1x 10GbE port. Run one of the parallel file systems across it and make sure the interior RAID can handle this sort of bandwidth: you'd need at least 700 MB/s on the interior RAID, which eliminates many/most of the units on the market, and you'd need pretty high efficiencies in the stack, which also tends to reduce your choices. Better to build the interior RAIDs as fast as possible, deal with the network efficiency losses, and call it a day.
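Spelling out the per-node arithmetic (rough figures from this thread, decimal units throughout):

```python
# 50 analysis machines, each saturating one 1 GbE link.
aggregate_gbps = 50 * 1.0

# Spread across 10 storage nodes via a parallel file system, assuming
# requests land roughly uniformly across the nodes.
storage_nodes = 10
per_node_gbps = aggregate_gbps / storage_nodes   # 5 Gbps per node
per_node_mb_s = per_node_gbps * 1e9 / 8 / 1e6    # 625 MB/s on the wire

print(per_node_gbps, per_node_mb_s)
```

625 MB/s is the wire-rate floor per node; the 700 MB/s interior-RAID figure leaves headroom for stack and protocol inefficiencies.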
All this said, it's better to express your IO bandwidth needs in MB/s, preferably as sustained bandwidth, since this is the language you'd be talking to vendors in. So on 50 machines, assuming each machine can saturate its 1 GbE port (these aren't Broadcom NICs, right?), that gets you 50x 117 MB/s, or about 5.9 GB/s of sustained IO bandwidth. Ten storage nodes, each delivering a sustainable 600 MB/s over the network, with a parallel file system atop them, solves this problem.
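The same sizing in sustained-bandwidth terms (117 MB/s being the realistic payload rate of a saturated GbE port after protocol overhead):

```python
per_client_mb_s = 117      # realistic payload of a saturated 1 GbE port
clients = 50
required_mb_s = clients * per_client_mb_s   # 5850 MB/s ~= 5.9 GB/s

nodes, per_node_mb_s = 10, 600              # proposed storage configuration
supplied_mb_s = nodes * per_node_mb_s       # 6000 MB/s

print(required_mb_s, supplied_mb_s, supplied_mb_s >= required_mb_s)
```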
Single centralized resources (FC heads, filers, etc.) won't scale to this. Then again, this isn't their use case.
Regards,

Joe

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman@xxxxxxxxxxxxxxxxxxxxxxx
web  : http://scalableinformatics.com
       http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html