Mattias Wadenstein put forth on 2/21/2011 4:25 AM:

> On Fri, 18 Feb 2011, Stan Hoeppner wrote:
>
>> Mattias Wadenstein put forth on 2/18/2011 7:49 AM:
>>
>>> Here you would either maintain a large list of nfs mounts for the read
>>> load, or start looking at a distributed filesystem. Sticking them all
>>> into one big fileserver is easier on the administration part, but
>>> quickly gets really expensive when you look to put multiple 10GE
>>> interfaces on it.
>>
>> This really depends on one's definition of "really expensive". Taking
>> the total cost of such a system/infrastructure into account, these two
>> Intel dual port 10 GbE NICs seem rather cheap at $650-$750 USD:
>>
>> http://www.newegg.com/Product/Product.aspx?Item=N82E16833106037
>> http://www.newegg.com/Product/Product.aspx?Item=N82E16833106075
>>
>> 20 Gb/s (40 both ways) raw/peak throughput at this price seems like a
>> bargain to me (plus the switch module cost obviously, if required,
>> usually not for RJ-45 or CX4, thus my motivation for mentioning these).
>>
>> The storage infrastructure on the back end required to keep these pipes
>> full will be the "really expensive" piece.
>
> Exactly my point, a storage server that can sustain 20-200MB/s is rather
> cheap, but one that can sustain 2GB/s is really expensive. Possibly to
> the point where 10-100 smaller file servers are much cheaper. The worst
> case here is very small random reads, and then you're screwed cost-wise
> whatever you choose, if you want to get the 2GB/s number.

"Screwed" may be a bit harsh, but I agree that one big fast storage
server will usually cost more than many smaller ones with equal
aggregate performance.

Looking at it from a TCO standpoint, though, the administrative burden
is higher in the many-small case, and file layout can be problematic,
specifically when all the analysis nodes need to share a file or group
of files, which creates bottlenecks at individual storage servers.
Acquisition cost must therefore be weighed against operational cost.
And if any of the data is persistent, backing up a single server is
straightforward; backing up multiple servers, and restoring them if
necessary, is more complicated.

>> RAID 5/6 need not apply due the abysmal RMW partial stripe write
>> penalty, unless of course you're doing almost no writes. But in that
>> case, how did the data get there in the first place? :)
>
> Actually, that's probably the common case for data analysis load. Lots
> of random reads, but only occasional sequential writes when you add a
> new file/fileset. So raid 5/6 performance-wise works out pretty much as
> a stripe of n-[12] disks.

RAID 5/6 have decent single-stream sequential read performance, but
sub-optimal random read, mediocre streaming write, and abysmal random
write performance.  Compared to RAID 0 or RAID 10, their random read
performance falls off as the client count climbs.

Additionally, with an analysis "cluster" designed for high overall
utilization (no idle nodes), one node will be uploading data sets while
the others are doing analysis, so the server sees a mix of simultaneous
random reads and streaming writes.  RAID 10 will give many times the
throughput of RAID 5/6 under that workload; RAID 5/6 bogs down rapidly
because every partial stripe write triggers a read-modify-write cycle
on the parity block(s).  A rough sketch of that penalty is at the end
of this mail.

-- 
Stan
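
P.S. Here's a minimal back-of-the-envelope sketch of the RMW write
penalty, plain Python.  The disk count, per-spindle IOPS and the 4/6/2
I/Os-per-write cost model are illustrative assumptions (the usual
textbook partial-stripe-write accounting), not measurements from any
particular array, and it ignores stripe cache, full-stripe writes and
controller NVRAM:

#!/usr/bin/env python
# Rough comparison of small random write capacity on RAID 5/6 vs RAID 10.
#
# For a write smaller than a full stripe:
#   RAID 5:  read old data + old parity, write new data + new parity
#            -> 4 disk I/Os per application write
#   RAID 6:  same, but two parity blocks -> 6 disk I/Os per write
#   RAID 10: write the block to both mirror halves -> 2 disk I/Os per write

DISKS = 12           # assumed spindle count
DISK_IOPS = 150      # assumed small random IOPS per 7.2k spindle

ARRAY_IOPS = DISKS * DISK_IOPS   # aggregate raw IOPS available

# disk I/Os consumed per small (partial stripe) application write
RMW_COST = [("raid10", 2), ("raid5", 4), ("raid6", 6)]

for level, cost in RMW_COST:
    app_write_iops = ARRAY_IOPS // cost
    print("%-7s ~%4d small random writes/sec from %d disks"
          % (level, app_write_iops, DISKS))

# With the assumptions above this prints roughly 900 for raid10, 450 for
# raid5 and 300 for raid6 -- i.e. the mirror sustains 2-3x the small
# random write rate before you even count parity computation overhead.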