Howdy Brian, thanks for the feedback. I knew I forgot something.

All of these servers are connected via a single 10Gb Ethernet link. The clients are mostly going to attach via 1Gb.

Thanks for the tip on the RAM, I'll put that in our configuration. The systems were provisioned to serve as both storage nodes and Condor compute nodes. Testing will tell us whether the storage servers can also run compute jobs without affecting the performance of the filesystem.

I'll share my configuration notes here in case anyone in the future stumbles on this and is interested in provisioning the storage via the command line and Dell's omconfig tool.

The R720xd's came preconfigured with a single large RAID6 virtual disk for the data drives. The following will remove that configuration.

1. First identify the virtual disk ID

   # omreport storage vdisk controller=0
   ID                        : 1
   Status                    : Ok
   Name                      : Virtual Disk 1
   State                     : Ready
   Hot Spare Policy violated : Not Assigned
   Encrypted                 : No
   Layout                    : RAID-6
   Size                      : 27,940.00 GB (30000346562560 bytes)
   ...

2. Delete the vdisk

   # omconfig storage vdisk action=deletevdisk controller=0 vdisk=1

3. List the physical disks on controller 0

   # omreport storage pdisk controller=0 | grep ^ID
   ID : 0:1:0
   ID : 0:1:1
   ID : 0:1:2
   ...
   ID : 0:1:11
   ID : 0:1:12
   ID : 0:1:13

4. Now list the physical disks that are assigned to vdisk 0 (the operating system mirror), so they can be left out of the next step

   # omreport storage pdisk controller=0 vdisk=0 | grep ^ID
   ID : 0:1:12
   ID : 0:1:13

5. Create 10 virtual disks across the 12 pdisks 0:1:0 thru 0:1:11

   # for n in {1..10}; do omconfig storage controller action=createvdisk controller=0 \
       raid=r6 \
       size=2794g \
       pdisk=0:1:0,0:1:1,0:1:2,0:1:3,0:1:4,0:1:5,0:1:6,0:1:7,0:1:8,0:1:9,0:1:10,0:1:11 \
       stripesize=128kb \
       diskcachepolicy=disabled \
       readpolicy=ara \
       writepolicy=wb \
       name=rcs_data${n}; done

6. The virtual disks can be listed using:

   # omreport storage vdisk controller=0

7. The OS disk is vdisk=0; the new vdisks are 1 thru 10.
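As a sanity check, the size=2794g value in step 5 appears to be the original RAID-6 capacity reported in step 1 divided evenly across the 10 new virtual disks. A minimal sketch of that arithmetic follows; the variable names are only for illustration and are not part of omconfig.

```shell
#!/bin/sh
# Capacity reported by omreport for the original RAID-6 vdisk (step 1), in GB.
total_gb=27940
# Number of virtual disks created in step 5.
vdisk_count=10

# Assumption: the capacity is split evenly, with no space held back.
per_vdisk_gb=$(( total_gb / vdisk_count ))

echo "size=${per_vdisk_gb}g"
```

If a different number of virtual disks is wanted, changing vdisk_count gives the matching size= value, rounded down to a whole GB.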
Format the new disks using XFS:

   # unset devs && for n in {1..10}; do devs="$devs $(omreport storage vdisk controller=0 vdisk=$n | grep ^Device | awk '{print $4}')"; done
   # m=1 && for dev in $devs; do mkfs.xfs -i size=512 -L brick${m} ${dev}; let m=$m+1; done

-----Original Message-----
From: Brian Candler [mailto:B.Candler at pobox.com]
Sent: Sunday, December 02, 2012 1:03 PM
To: Mike Hanby
Cc: Gluster-users at gluster.org
Subject: Re: New GlusterFS Config with 6 x Dell R720xd's and 12x3TB storage

On Fri, Nov 30, 2012 at 07:21:54PM +0000, Mike Hanby wrote:
> We have the following hardware that we are going to use for a GlusterFS
> cluster.
>
> 6 x Dell R720xd's (16 cores, 96G)

Heavily over-specified, especially the RAM. Having such large amounts of RAM can even cause problems if you're not careful. You probably want to use sysctl and /etc/sysctl.conf to set

   vm.dirty_background_ratio=1
   vm.dirty_ratio=5 (or less)

so that dirty disk blocks are written to disk sooner, otherwise you may find the system locks up for several minutes at a time as it flushes the enormous disk cache.

I use 4 cores + 8GB for bricks with 24 disks (and they are never CPU-bound)

> I now need to decide how to configure the 12 x 3TB disks in each
> server, followed by partitioning / formatting them in the OS.
>
> The PERC H710 supports RAID 0,1,5,6,10,50,60. Ideally we'd like to get
> good performance, maximize storage capacity and still have parity :-)

For performance: RAID10
For maximum storage capacity: RAID5 or RAID6

> * Stripe Element Size: 64, 128, 256, 512KB, 1MB

Depends on workload. With RAID10 and lots of concurrent clients, I'd tend to use a 1MB stripe size. Then R/W by one client is likely to be on a different disk to R/W by another client, and although throughput to individual clients will be similar to a single disk, the total throughput is maximised.
If your accesses are mostly by a single client, then you may not get enough readahead to saturate the disks with such a large stripe size; with RAID5/6 your writes may be slow if you can't write a stripe at a time (which may be possible if you have a battery-backed card). So for these scenarios something like 256K may work better. But you do need to test it.

Finally: you don't mention your network setup.

With 12 SATA disks, you can expect to get 25-100MB/sec *per disk* depending on how sequential and how large the transfers are. So your total disk throughput is potentially 12 times that, i.e. 300-1200MB/sec. The bottom end of this range is easily achievable, and is already 2.5 times a 1G link. At the top end you could saturate a 10G link.

So if you have only 1G networking it's very likely going to be the bottleneck.

Regards,

Brian.
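The throughput estimate in Brian's reply can be reproduced with a few lines of shell arithmetic. This is only a sketch: the disk count and per-disk range come from the message, while the link rates are an assumption (raw line rate in bits divided by 8, ignoring protocol overhead), and the variable names are illustrative.

```shell
#!/bin/sh
# Figures taken from the message above.
disks=12
per_disk_min=25     # MB/sec per disk, small or non-sequential transfers
per_disk_max=100    # MB/sec per disk, large sequential transfers

total_min=$(( disks * per_disk_min ))
total_max=$(( disks * per_disk_max ))

# Assumption: raw line rate in MB/sec (bits / 8), ignoring protocol overhead.
link_1g=125
link_10g=1250

echo "aggregate disk throughput: ${total_min}-${total_max} MB/sec"
echo "1G link:  ${link_1g} MB/sec"
echo "10G link: ${link_10g} MB/sec"

# Integer arithmetic only: the ratio is scaled by 10 to keep one decimal place.
ratio_x10=$(( total_min * 10 / link_1g ))
echo "low end vs 1G link: $(( ratio_x10 / 10 )).$(( ratio_x10 % 10 ))x"
```

Against the raw line rate the low end works out slightly below the 2.5 times quoted in the message; usable throughput on a 1G link is lower than the line rate, so the two figures are in the same range.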