Ben Suffolk wrote:
Hello all,
I am currently working out the best type of machine for a high volume
pgsql database that I going to need for a project. I will be purchasing
a new server specifically for the database, and it won't be running any
other applications. I will be using FreeBSD 6.1 Stable.
I think it may be beneficial if I give a brief overview of the types of
database access. There are several groups of tables and associated
accesses to them.
The first can be thought of as users details and configuration tables.
They will have low read and write access (say around 10 - 20 a min).
SIzed at around 1/2 Million rows.
The second part is logging, this will be used occasionally for reads
when reports are run, but I will probably back that off to more
aggregated data tables, so can probably think of this as a write only
tables. Several table will each have around 200-300 inserts a second.
The can be archived on a regular basis to keep the size down, may be
once a day, or once a week. Not sure yet.
> The third part will be transactional and will have around 50
> transaction a second. A transaction is made up of a query followed by
> an update, followed by approx 3 inserts. In addition some of these
> tables will be read out of the transactions at approx once per second.
>
There will be around 50 simultaneous connections.
>
> I hope that overview is a) enough and b) useful background to this
> discussion.
Sounds like you have a very good idea of what to expect. Are these solid
stats or certain estimates? Estimates can vary when it comes time to start.
Processor : I understand that pgsql is not CPU intensive, but that each
connection uses its own process. The HW has an option of upto 4 dual
core xeon processors. My thoughts would be that more lower spec
processors would be better than fewer higher spec ones. But the question
is 4 (8 cores) wasted because there will be so much blocking on I/O. Is
2 (4 cores) processors enough. I was thinking 2 x 2.6G dual core Xeons
would be enough.
I would think 2 will cope with what you describe but what about in 12
months time? Can you be sure your needs won't increase? And will the
cost of 4 CPU's cut your other options? If all 50 users may be running
the 3rd part at the same time (or is that your 50 trans. a second?) then
I'd consider the 4.
Memory : I know this is very important for pgsql, and the more you have
the more of the tables can reside in memory. I was thinking of around 8
- 12G, but the machine can hold a lot more. Thing is memory is still
quite expensive, and so I don't to over spec it if its not going to get
used.
8GB is a good starting point for a busy server but a few hundred $ on
the extra ram can make more difference than extra disks (more for the
reading part than writing).
What you describe plans several times 300 inserts to logging plus 150
inserts and 50 updates and 1 read a second plus occasional reads to the
logging and user data.
Will it be raw data fed in and saved or will the server be calculating a
majority of the inserted data? If so go for the 4 cpu's.
Again allow room for expansion.
Disk : Ok so this is the main bottleneck of the system. And the thing I
know least about, so need the most help with. I understand you get good
improvements if you keep the transaction log on a different disk from
the database, and that raid 5 is not as good as people think unless you
have lots of disks.
My option in disks is either 5 x 15K rpm disks or 8 x 10K rpm disks (all
SAS), or if I pick a different server I can have 6 x 15K rpm or 8 x 10K
rpm (again SAS). In each case controlled by a PERC 5/i (which I think is
an LSI Mega Raid SAS 8408E card).
So the question here is will more disks at a slower speed be better than
fewer disks as a higher speed?
Generally more disks at slower speed - 2 10K disks in raid 0 is faster
than 1 15K disk. More disks also allow more options.
Choosing the best RAID controller can make a lot of difference too.
Assuming I was going to have a mirrored pair for the O/S and transaction
logs that would leave me with 3 or 4 15K rpm for the database, 3 would
mean raid 5 (not great at 3 disks), 4 would give me raid 10 option if I
wanted it. Or I could have raid 5 across all 5/6 disks and not separate
the transaction and database onto different disks. Better performance
from raid 5 with more disks, but does having the transaction logs and
database on the same disks counteract / worsen the performance?
If I had the 8 10K disks, I could have 2 as a mirrored pair for O/S
Transaction, and still have 6 for raid 5. But the disks are slower.
I might consider RAID 5 with 8 disks but would lean more for 2 RAID 10
setups. This can give you the reliability and speed with system and xlog
on one and data on the other.
Sounds to me like you have it worked out even if you are a little
indecisive on a couple of finer points.
--
Shane Ambler
Postgres@xxxxxxxxxxxxxxxx
Get Sheeky @ http://Sheeky.Biz