Re: How to best use 32 15k.7 300GB drives?

david@xxxxxxx · Thu, 27 Jan 2011 15:19:32 -0800 (PST)

On Thu, 27 Jan 2011, Robert Schnabel wrote:

HI,

I use PostgreSQL basically as a data warehouse to store all the genetic data 
that our lab generates.  The only person that accesses the database is myself 
and therefore I've had it housed on my workstation in my office up till now. 
However, it's getting time to move it to bigger hardware.  I currently have a 
server that is basically only storing backup images of all our other 
workstations so I'm going to move my database onto it.  The server looks like 
this: Windows Server Enterprise 2008 R2 64-bit, AMD 2350 quad-core x2, 32GB 
RAM.  For my purposes the CPUs and RAM are fine.  I currently have an Adaptec 
52445+BBU controller that has the OS (4 drive RAID5), FTP (4 drive RAID5) and 
two backup arrays (8 drive each RAID0).  The backup arrays are in a 16 drive 
external enclosure through an expander so I actually have 16 ports free on 
the 52445 card.  I plan to remove 3 of the drives from my backup arrays to 
make room for 3 - 73GB 15k.5 drives (re-purposed from my workstation).  Two 
16 drive enclosures with SAS2 expanders just arrived as well as 36 Seagate 
15k.7 300GB drives (ST3300657SS).  I also intend on getting an Adaptec 6445 
controller with the flash module when it becomes available in about a month 
or two.  I already have several Adaptec cards so I'd prefer to stick with 
them.

Here's the way I was planning using the new hardware:
xlog & wal: 3 - 73G 15k.5 RAID1+hot spare in enclosure A on 52445 controller
data: 22 - 300G 15k.7 RAID10 enclosure B&C on 6445 controller
indexes: 8 - 300G 15k.7 RAID10 enclosure C on 6445 controller
2 - 300G 15k.7 as hot spares enclosure C
4 spare 15k.7 for on the shelf

With this configuration I figure I'll have ~3TB for my main data tables and 
1TB for indexes.  Right now my database is 500GB total.  The 3:1 split 
reflects my current table structure and what I foresee coming down the road 
in terms of new data.

So my questions are 1) am I'm crazy for doing this, 2) would you change 
anything and 3) is it acceptable to put the xlog & wal (and perhaps tmp 
filespace) on a different controller than everything else?  Please keep in 
mind I'm a geneticist who happens to know a little bit about bioinformatics 
and not the reverse. :-)

a number of questions spring to mind

how much of the time are you expecting to spend inserting data into this 
system vs querying data from the system?

is data arriving continuously, or is it a matter of receiving a bunch of 
data, inserting it, then querying it?

which do you need to optimize for, insert speed or query speed?

do you expect your queries to be searching for a subset of the data 
scattered randomly throughlut the input data, or do you expect it to be 
'grab this (relativly) sequential chunk of input data and manipulate it to 
generate a report' type of thing

what is your connectvity to the raid enclosures? (does 
putting 22 drives on one cable mean that you will be limited due to the 
bandwidth of this cable rather than the performance of the drives)

can you do other forms of raid on these drives or only raid 10?

how critical is the data in this database? if it were to die would it just 
be a matter of recreating it and reloading the data? or would you loose 
irreplaceable data?

David Lang

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance