[Note: This is a repost of a message to the performance list yesterday.
I'm not sure if it didn't go through, or if no one had any
suggestions. In any event, I'll try here. :) ]
Hello List,
Not sure to which list I should post (gray lines, and all that), so
point me in the right direction if'n it's a problem.
I am in the process of learning some of the art/science of benchmarking.
With novnov's recent post about the comparison of MS SQL vs
PostgreSQL, I felt it time to do a benchmark comparison of sorts for
myself . . . more for me and the benchmark learning process than the
DB's, but I'm interested in DB's in general, so it's a good fit. (If I
find anything interesting/new, I will of course share the results.)
Given that, I don't know what I'm doing. :| It seems initially that to
do it properly, I have to pick some sort of focus. In other words,
shall I benchmark from a standpoint of ACID compliance? Shall I
benchmark with functionality in mind? Ease of use/setup? Speed? The
latter seems to be done most widely/often, so I suspect it's the easiest
standpoint from which to work. Thus, for my initial foray into
benchmarking, I'll probably start there. (Unless of course, in any of
your wisdom, you can point me in a better direction.)
From my less-than-one-month-of-Postgres-list-lurking, I think I need to
be aware of at /least/ these items for my benchmarks (in no particular
order):
* overall speed (obvious)
* mitigating factors
  - DB fits entirely in memory or not (page faults)
  - DB size
  - DB versions
* DB non-SELECT performance. A common point I see in comparisons of
  MySQL and PostgreSQL is that MySQL is much faster. However, I rarely
  see anything other than comparisons of SELECT.
* Query complexity (e.g. criteria, {,inner,outer}-joins, sub-selects)
  ex. SELECT * FROM aTable; vs
      SELECT
        FUNC( var ),
        ...
      FROM
        tables
      WHERE
        x IN (<list>)
        OR y BETWEEN
          a
          AND b
        ...
* Queries against tables/columns of varying data types. (BOOLEAN,
  SMALLINT, TEXT, VARCHAR, etc.)
* Queries against tables with/out constraints
* Queries against tables with/out triggers {post,pre}-{non,}SELECT
* Transactions
* Individual and common functions (common use, not necessarily common
  name, e.g. SUBSTRING/SUBSTR, MAX, COUNT, ORDER BY w/{,o} LIMIT).
* Performance under load (e.g. 1, 10, 100 concurrent users),
  - need to delineate how DB's handle concurrent queries against the
    same tuples AND against different tuples/tables.
* Access method (e.g. Thru C libs, via PHP/Postgres libs, apache/web,
  command line and stdin scripts)
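To make the SELECT vs non-SELECT and query-complexity items above a bit more concrete, here is a rough sketch of a timing harness. It is only an illustration under loud assumptions: it uses Python's bundled sqlite3 module as a stand-in so it runs anywhere, and the table a_table, its columns, and the three statements are hypothetical. For PostgreSQL or MySQL the connection would come from that database's own driver and the parameter placeholder style would differ.

```python
import sqlite3
import statistics
import time


def time_statement(conn, sql, params=(), repeats=5):
    """Run one statement `repeats` times; return elapsed seconds per run."""
    timings = []
    for _ in range(repeats):
        cur = conn.cursor()
        start = time.perf_counter()
        cur.execute(sql, params)
        if cur.description is not None:
            cur.fetchall()  # read the whole result set, not just the first row
        timings.append(time.perf_counter() - start)
        # Undo any write so every run starts from the same data.
        # Note: this means commit cost is NOT part of the measurement.
        conn.rollback()
    return timings


def build_demo_db(rows=1000):
    """Create a small in-memory table (hypothetical schema, stand-in only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE a_table (id INTEGER PRIMARY KEY, x INTEGER, y INTEGER)"
    )
    conn.executemany(
        "INSERT INTO a_table (x, y) VALUES (?, ?)",
        [(i % 10, i) for i in range(rows)],
    )
    conn.commit()
    return conn


# Hypothetical statements covering a trivial SELECT, a filtered SELECT,
# and one non-SELECT statement.
STATEMENTS = {
    "simple select": ("SELECT * FROM a_table", ()),
    "filtered select": (
        "SELECT COUNT(*) FROM a_table "
        "WHERE x IN (1, 2, 3) OR y BETWEEN ? AND ?",
        (100, 200),
    ),
    "update": (
        "UPDATE a_table SET x = x + 1 WHERE y BETWEEN ? AND ?",
        (100, 200),
    ),
}


def run_suite(conn, statements, repeats=5):
    """Return the median elapsed time in seconds for each named statement."""
    return {
        name: statistics.median(time_statement(conn, sql, params, repeats))
        for name, (sql, params) in statements.items()
    }


if __name__ == "__main__":
    conn = build_demo_db()
    for name, median in run_suite(conn, STATEMENTS).items():
        print(f"{name}: {median * 1000:.3f} ms (median)")
```

Reporting the median of several runs rather than a single run is a deliberate choice here, since the first execution often includes cache warm-up and would otherwise skew the comparison.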
# I don't currently have access to a RAID setup, so this will all have
to be on a single hard drive for now. Perhaps later I can procure more
hardware/situations with which to test.
Clearly, this is only a small portion of what I should be aware of when I'm
benchmarking different DB's in terms of speed/performance, and already
it's feeling daunting. Feel free to add any/all items about which I'm
not thinking.
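For the "performance under load" item in the list, a simple way to simulate 1, 10, or 100 concurrent users might look like the sketch below. Again, this is an assumption-heavy illustration rather than a recommended method: it uses Python threads and a file-based sqlite3 database as a stand-in, the table a_table is hypothetical, and every simulated client runs the same read-only query. With many threads the client program itself can become the bottleneck, so the numbers say as much about the harness as about the DB.

```python
import os
import sqlite3
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def make_demo_db(path, rows=1000):
    """Create a small file-based table (hypothetical schema, stand-in only)."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE a_table (id INTEGER PRIMARY KEY, y INTEGER)")
    conn.executemany(
        "INSERT INTO a_table (y) VALUES (?)", [(i,) for i in range(rows)]
    )
    conn.commit()
    conn.close()


def run_client(db_path, sql, queries_per_client):
    """One simulated user: open its own connection, run the query repeatedly."""
    conn = sqlite3.connect(db_path)
    try:
        for _ in range(queries_per_client):
            conn.execute(sql).fetchall()
    finally:
        conn.close()
    return queries_per_client


def measure_throughput(db_path, sql, clients, queries_per_client=20):
    """Return (total queries run, queries per second) for `clients` users."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=clients) as pool:
        futures = [
            pool.submit(run_client, db_path, sql, queries_per_client)
            for _ in range(clients)
        ]
        total = sum(f.result() for f in futures)
    elapsed = time.perf_counter() - start
    return total, total / elapsed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "bench.db")
        make_demo_db(path)
        for clients in (1, 10, 100):
            total, qps = measure_throughput(
                path, "SELECT COUNT(*) FROM a_table WHERE y > 500", clients
            )
            print(f"{clients:>3} clients: {total} queries, {qps:.0f} queries/s")
```

To look at contention on the same tuples versus different tuples, the same structure could be reused with each client given a different statement, which this sketch does not attempt.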
The other thing: as I'm still a bit of a noob, my use of the Postgres DB
has been almost entirely with the stock configuration. Since I'm
planning to run these tests on the same hardware, I can pseudo-rule out
hardware-related differences in the results. However, I'm hoping that I
can give my stats/assumptions to the list and someone could give me a
configuration file that would /most likely/ be best. I can search the
documentation/archives, but I'm hoping to get a head start and tweak
from there.
Any and all advice would be /much/ appreciated!
Kevin
Stats for the (first) machines on which I'll be running my tests:
Dell Inspiron Laptop        Dell Workstation
1 x Pentium M @ 1.5GHz      1 x Pentium 4 @ 2.0GHz
512MB RAM                   512MB RAM
30GB IDE HD                 80GB IDE HD
Ubuntu Edgy OS              dual Redhat 9.0 (Shrike) +
                            WinXP SP2
I think that's all that's pertinent for now.