Re: high throughput storage server?

On Tue, Feb 15, 2011 at 9:16 AM, Joe Landman <joe.landman@xxxxxxxxx> wrote:
> [disclosure: vendor posting, ignore if you wish, vendor html link at bottom
> of message]
>
>> The whole system needs to be "fast".
>
> Define what you mean by "fast".  Seriously ... we've had people tell us
> about their "huge" storage needs that we can easily fit onto a single small
> unit, no storage cluster needed.  We've had people say "fast" when they mean
> "able to keep 1 GbE port busy".
>
> Fast needs to be articulated really in terms of what you will do with it.
>  As you noted in this and other messages, you are scaling up from 10 compute
> nodes to 40 compute nodes.  4x change in demand, and I am guessing bandwidth
> (if these are large files you are streaming) or IOPs (if these are many
> small files you are reading).  Small and large here would mean less than
> 64kB for small, and greater than 4MB for large.

These are definitely large files; maybe "huge" is a better word.  All
are over 100 MB, some are upwards of 5 GB, and most are probably a
few hundred megabytes.

The word "streaming" may be accurate, but to me it is misleading. I
associate streaming with media, i.e. it is generally consumed much
more slowly than it can be sent (e.g. even high-def 1080p video won't
saturate a 100 mbps link).  But in our case, these files are basically
read into memory, and then computations are done from there.

So, for an upper bound on the notion of "fast", I'll illustrate the
worst-case scenario: there are 50 analysis machines, each of which can
run up to 10 processes, for 500 processes total.  Every single process
requests a different file at the exact same time, and every requested
file is over 100 MB in size.  Ideally, each process would be able to
access the file as though it were local and it were the only process
on the machine.  In reality, it's "good enough" if each of the 50
machines' gigabit network connections is saturated.  So from the
network perspective, that's 50 Gbps in aggregate.
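
To make that concrete, here's a quick back-of-envelope sketch in
Python.  The only assumption is that each node's single GbE link is
its per-machine ceiling:

    # Back-of-envelope for the worst case above.
    machines = 50
    procs_per_machine = 10
    nic_gbps = 1.0                # one GbE link per analysis node

    total_procs = machines * procs_per_machine    # 500 concurrent readers
    aggregate_gbps = machines * nic_gbps          # 50 Gbps at the clients
    aggregate_mbytes = aggregate_gbps * 1000 / 8  # ~6250 MB/s to source

    print(total_procs, aggregate_gbps, aggregate_mbytes)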

From the storage perspective, it's less clear to me.  That's 500
simultaneous reads of huge files, and I don't know what it would take
to satisfy that.
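
To put a rough number on it, here is a naive sizing sketch.  The
per-disk figure is an assumption (roughly what a 7200 RPM SATA drive
sustains on pure sequential reads), the derate is a guess at how badly
500 interleaved streams degrade that, and it ignores RAID parity,
filesystem overhead, and caching entirely:

    # Naive spindle-count estimate; every figure here is an assumption.
    aggregate_mbytes = 50 * 1000 / 8  # ~6250 MB/s to keep 50 GbE links full
    disk_seq_mbytes = 100.0           # assumed rate of one 7200 RPM SATA disk
    derate = 0.5                      # guess: 500 interleaved streams look
                                      # semi-random to the disks

    disks_needed = aggregate_mbytes / (disk_seq_mbytes * derate)
    print(round(disks_needed))        # ~125 spindles, before parity and spares

The derate factor is obviously the big unknown in that sketch.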

> Your choice is simple.  Build or buy.  Many folks have made suggestions, and
> some are pretty reasonable, though a pure SSD or Flash based machine, while
> doable (and we sell these), is quite unlikely to be close to the realities
> of your budget.  There are use cases for which this does make sense, but the
> costs are quite prohibitive for all but a few users.

Well, I haven't decided whether to build or buy, but the thought
experiment of planning a buy is very instructive.  Thanks to everyone
who has contributed to this thread; I've got more information than
I've been able to digest so far!
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

