Re: two disks - best way to use them?

Ron wrote:

For accuracy's sake, which exact config did you finally use?

How did you choose the config you finally used? Did you test the three options or just pick one?

(Note: I'm not the original poster.)

I just picked the option of putting the data/pg_xlog directory (WAL) on a 2nd set of spindles. That was the easiest thing for me to change on this test box.

The test server is simply a Gentoo box running software RAID and LVM2. The primary disk set is 2x7200RPM 300GB drives and the secondary disk set is 2x5400RPM 300GB drives. It's a brand-new install of PGSQL 8.1 with mostly default settings (I raised max_fsm_pages to 150000). PGSQL was given its own 32GB ext3 LVM volume on the primary disk set (2x7200RPM). Originally, all files were on the primary disks.
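For anyone wanting to repeat this: as far as I know, 8.1's initdb has no option for putting WAL somewhere else, so moving pg_xlog is the usual stop-move-symlink dance. A minimal sketch of that step, assuming a cleanly stopped server; the paths are hypothetical, not my actual mount points:

```python
import os
import shutil

def relocate_pg_xlog(data_dir: str, new_wal_dir: str) -> None:
    """Move data/pg_xlog to another volume and leave a symlink behind.

    The server must be stopped cleanly first. new_wal_dir must not
    already exist (its parent should), so shutil.move renames the
    directory instead of nesting it.
    """
    old_xlog = os.path.join(data_dir, "pg_xlog")
    # Move the WAL directory onto the second disk set...
    shutil.move(old_xlog, new_wal_dir)
    # ...and point the old location at it so the server still finds it.
    os.symlink(new_wal_dir, old_xlog)

# Hypothetical example:
# relocate_pg_xlog("/var/lib/postgresql/data", "/mnt/disk2/pg_xlog")
```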

The task at hand was inserting large quantities of ~45-byte rows (according to "vacuum verbose"), on the order of millions of records per table. There was a unique key and a unique index. Test clients were accessing the database via ODBC / ADO and doing the inserts in fairly brute-force fashion (attempt the insert, calling .CancelUpdate if it fails).
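The client logic is literally "try the insert, eat the failure". A rough Python stand-in for the ODBC/ADO loop, using sqlite3 so it's self-contained (the table and column names are made up, and catching the duplicate-key error plays the role of .CancelUpdate):

```python
import sqlite3

# A table with a unique key, like the real test tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (key TEXT PRIMARY KEY, payload TEXT)")

def brute_force_insert(rows):
    """Attempt every insert; silently drop rows whose key already exists."""
    inserted = 0
    for key, payload in rows:
        try:
            conn.execute("INSERT INTO samples VALUES (?, ?)", (key, payload))
            inserted += 1
        except sqlite3.IntegrityError:
            pass  # duplicate key: give up on this row, a la .CancelUpdate
    conn.commit()
    return inserted

# Duplicate "a" is simply discarded:
# brute_force_insert([("a", "1"), ("a", "2"), ("b", "3")]) -> 2
```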

When the tables were under 2 million rows, performance was okay. At one point, I had a 1.8GHz P4, a dual Opteron 246, and an Opteron 148 all running at nearly 100% CPU generating row data and doing inserts into the database. So I had 4 clients running, performing inserts into 4 separate tables in the same database. The P4 ran at about half the throughput of the Opterons (client-bound due to the code that generated row data prior to the insert), so I'd score my throughput as roughly 3.3-3.4, where 1.0 would be full utilization of the Opteron 148 box.
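To make the scoring explicit: the per-client numbers below are illustrative assumptions; only the "P4 at about half an Opteron" ratio and the ~3.3-3.4 total are from the measurements above.

```python
# 1.0 = one fully loaded Opteron 148 box.
clients = {
    "Opteron 148":      1.0,   # reference box
    "Opteron 246 (#1)": 0.95,  # assumed roughly comparable to the 148
    "Opteron 246 (#2)": 0.95,  # second CPU of the dual-246 box
    "P4 1.8GHz":        0.5,   # about half an Opteron (client-bound)
}
total = sum(clients.values())
print(round(total, 2))  # ~3.4, in line with the rough score above
```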

However, once the tables grew above ~2 million rows, performance took a nosedive. CPU utilization on the 4 client machines dropped into the basement (5-20% CPU) and I had to back off on the number of clients, so throughput had dropped to around 0.25 or so. The Linux box was spending nearly all of its time waiting on the primary disks.

Moving data/pg_xlog (WAL) to the 2nd set of disks (2x5400RPM) in the test server made a dramatic difference for this mass insert. I'm running the P4 (100% CPU) and the Opteron 148 (~80% CPU) at the moment. While it's not up to full speed, a throughput of ~1.3 is a lot better than the ~0.25 I was getting before. (The two tables currently being written have over 5 million rows each; one table has ~16 million rows.) Wait percentage in "top" is only running 20-30% (dipping as low as 10%). I haven't pushed this new setup hard enough to determine where the upper limit for throughput is.

It's very much a niche test (millions of inserts of narrow rows into multiple tables using fairly brain-dead code). But it gives me data points on which to base purchasing of the production box. The original plan was a simple RAID1 setup (2 spindles), but this tells me it's better to order 4 spindles and set it up as a pair of RAID1 sets.

Whether 4 spindles is better as two separate RAID1 arrays, or configured as a single RAID1+0 array... dunno. Our application is typically more limited by insert speed than read speed (so I'm leaning towards separate RAID arrays).

I'm sure there's also more tuning that could be done in the PGSQL configuration file. And the client code is throwaway code that isn't the most elegant.

