On 18 November 2011, 10:55, CSS wrote:
> Hello,
>
> I'm going to be testing some new hardware (see
> http://archives.postgresql.org/pgsql-performance/2011-11/msg00230.php)
> and while I've done some very rudimentary before/after tests with
> pgbench, I'm looking to pull more info than I have in the past, and
> I'd really like to automate things further.
>
> I'll be starting with basic disk benchmarks (bonnie++ and iozone) and
> then moving on to pgbench.
>
> I'm running FreeBSD and I'm interested in getting some baseline info
> on UFS2 single disk (SATA 7200/WD RE4), gmirror, zfs mirror, zfs
> raidz1, and a zfs set of two mirrors (i.e. two mirrored vdevs in a
> stripe). Then I'm repeating that with the 4 Intel 320 SSDs, and just
> to satisfy my curiosity, a zfs mirror with two of the SSDs mirrored
> as the ZIL.
>
> Once that's narrowed down to a few practical choices, I'm moving on
> to pgbench. I've found some good info here regarding pgbench that is
> unfortunately a bit dated:
>
> http://www.westnet.com/~gsmith/content/postgresql/
>
> A few questions:
>
> -Any favorite automation or graphing tools beyond what's on Greg's site?

There are talks not listed on that westnet page - for example a recent
"Bottom-up Database Benchmarking" talk, available for example here:

  http://pgbr.postgresql.org.br/2011/palestras.php?id=60

It probably contains more recent info about benchmarking tools and
about testing new hardware.

> -Any detailed information on creating "custom" pgbench tests?

The technical info at

  http://www.postgresql.org/docs/9.1/interactive/pgbench.html

should be sufficient, I guess - the format is fairly simple. The most
difficult part is deciding what the script should do - what queries to
execute etc. - and that depends on the application.
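Just to illustrate, a trivial custom script might look like this (a
minimal sketch only - it assumes the standard tables created by
"pgbench -i", and "testdb" is just a placeholder; with your own schema
you'd obviously query your own tables):

  $ cat custom.sql
  \setrandom aid 1 100000
  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;

  $ pgbench -n -f custom.sql -c 8 -T 60 testdb

The \setrandom line picks a new random aid for each transaction, so
each client hits different rows. You can also pass several -f scripts
at once and pgbench will randomly choose one of them for each
transaction, which is a cheap way to mix different kinds of queries.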
> -Any other postgres benchmarking tools?

Not really. pgbench is a nice stress-testing tool and the scripting is
quite flexible. I've done some TPC-H-like testing recently, but that's
rather a bunch of scripts executed manually.

> I'm also curious about benchmarking using my own data. I tried
> something long ago that at least gave the illusion of working, but
> didn't seem quite right to me. I enabled basic query logging on one
> of our busier servers, dumped the db, and let it run for 24 hours.
> That gave me the normal random data from users throughout the day as
> well as our batch jobs that run overnight. I had to grep out and
> reformat the actual queries from the logfile, but that was not
> difficult. I then loaded the dump into the test server and basically
> fed the saved queries into it and timed the result. I also hacked
> together a script to sample cpu and disk stats every 2s and had that
> feeding into an rrd database so I could see how "busy" things were.
>
> In theory, this sounded good (to me), but I'm not sure I trust the
> results. Any suggestions on the general concept? Is it sound? Is
> there a better way to do it?

I really like the idea of using (your) real data. It's definitely a
step in the right direction. An application-specific benchmark is
usually much more useful than a generic stress test - it simply tells
you more about your workload, and you can use it to assess capacity
more precisely.

There are some issues, though - mostly related to transactions and
locking. For example, if a client starts a transaction, locks a bunch
of records and then performs a time-consuming processing task outside
the database, the other clients may be blocked. You won't see this
when replaying the logged queries, because in reality it looks like
this:

  1) A: BEGIN
  2) A: LOCK (table, row, ...)
  3) A: perform something expensive
  4) B: attempt to LOCK the same resource (blocks)
  5) A: release the LOCK
  6) B: obtains the LOCK and continues

but when replaying the workload, you'll see this:

  1) A: BEGIN
  2) A: LOCK (table, row, ...)
  3) B: attempt to LOCK the same resource (blocks)
  4) A: release the LOCK
  5) B: obtains the LOCK and continues

so B waits for a very short period of time (or not at all), because
the expensive step outside the database is missing from the replay.
To identify this problem you'd have to actually behave like the
client. For example with a web application you could use apache bench

  https://httpd.apache.org/docs/2.0/programs/ab.html

or something like that.

> Lastly, any general suggestions on tools to collect system data
> during tests and graph it are more than welcome. I can homebrew,
> but I'm sure I'd be reinventing the wheel.

System stats or database stats? There are plenty of tools for system
stats (e.g. sar). For database stats it's a bit more difficult - there
are pgwatch, pgstatspack and maybe some other tools (I've written
pg_monitor).

Tomas
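PS: Regarding the "sample cpu and disk stats every 2s" part - you
don't really need a homebrew sampler, the standard tools can do the
collection for you. A rough sketch (interval, sample count and file
names are just placeholders; sar comes from the Linux sysstat package,
so on FreeBSD you'd use the vmstat/iostat variants):

  # Linux (sysstat): CPU and disk activity, every 2 seconds, 1 hour
  sar -u 2 1800 > cpu.log &
  sar -d 2 1800 > disk.log &

  # FreeBSD equivalents
  vmstat -w 2 -c 1800 > cpu.log &
  iostat -x -w 2 -c 1800 > disk.log &

The resulting logs are plain text, so feeding them into rrdtool or
gnuplot afterwards is straightforward.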