I have looked at the ioDrive, and it could increase our DB throughput significantly over a RAID array.
Ideally, I would put a few key tables and the WAL, etc. on it, and I'd also want all the sort or hash overflow from work_mem to go to this device. Some of our tables / indexes are heavily written to for short periods of time and then more infrequently later -- these are partitioned by date. I would put the fresh partitions on such a device, then move them to the hard drives later.
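Something along these lines would cover what is possible today (the mount point and table names here are just placeholders):

    -- create a tablespace on the solid state device
    CREATE TABLESPACE fast_ssd LOCATION '/mnt/iodrive/pgdata';

    -- send work_mem sort/hash spill files there (temp_tablespaces is new in 8.3)
    SET temp_tablespaces = 'fast_ssd';

    -- create the fresh date partition on the fast device
    CREATE TABLE events_2008_07 () INHERITS (events) TABLESPACE fast_ssd;

The WAL itself has no tablespace setting; it would have to be moved by symlinking pg_xlog onto the device.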
To take full advantage of this, we would then need a few changes in Postgres:
#1 Per-Tablespace optimizer tuning parameters. Arguably, this is already needed. The tablespaces on such a solid state device would have random and sequential access at equal (low) cost. Any one-size-fits-all set of optimizer variables is bound to cause performance issues when two tablespaces have dramatically different performance profiles.
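Something like this hypothetical syntax -- it does not exist today, and the option names are invented purely for illustration -- is what I have in mind:

    -- per-tablespace planner cost overrides: on flash, random reads
    -- cost roughly the same as sequential reads
    ALTER TABLESPACE fast_ssd SET (seq_page_cost = 1.0, random_page_cost = 1.0);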
#2 Optimally, work_mem could be shrunk, and the optimizer would have to stop preferring a sort + GroupAggregate whenever it suspected that a hash_agg would not fit in work_mem. A disk-based hash_agg on such a device will pretty much beat a sort (in memory or not) every time once the number of rows to aggregate goes above a moderate threshold of a couple hundred thousand or so.
In fact, I have several examples with 8.3.3 and a standard RAID array where a hash_agg that spilled to disk (poor -- or purposely distorted -- statistics cause this) was a lot faster than the sort the optimizer wants to do instead. Whatever mechanism calculates the cost of doing sorts or hashes on disk will need to be tunable per tablespace.
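A rough way to compare the two plans on any box (table and column names below are assumed) is to raise work_mem so the planner is willing to hash, and look at both EXPLAIN ANALYZE outputs:

    -- default: the planner refuses a hash_agg it estimates won't fit
    -- in work_mem, and sorts instead
    EXPLAIN ANALYZE SELECT user_id, count(*) FROM events GROUP BY user_id;

    -- raise work_mem so the planner chooses HashAggregate for the same query
    SET work_mem = '512MB';
    EXPLAIN ANALYZE SELECT user_id, count(*) FROM events GROUP BY user_id;
    RESET work_mem;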
I suppose both of the above may be one task -- I don't know enough about the Postgres internals.
#3 Being able to move tables / indexes from one tablespace to another as efficiently as possible.
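ALTER ... SET TABLESPACE already does the move by rewriting the relation (the names below are placeholders); the question is making that copy as cheap as possible:

    -- migrate an old partition and its index off the fast device
    ALTER TABLE events_2008_06 SET TABLESPACE raid_hdd;
    ALTER INDEX events_2008_06_pkey SET TABLESPACE raid_hdd;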
There are probably other enhancements that will help such a setup. These were the first that came to mind.
On Tue, Jul 8, 2008 at 2:49 AM, Markus Wanner <markus@xxxxxxxxxx> wrote:
Hi,

Jonah H. Harris wrote:
> I'm not sure how those cards work, but my guess is that the CPU will
> go 100% busy (with a near-zero I/O wait) on any sizable workload. In
> this case, the current pgbench configuration being used is quite small
> and probably won't resemble this.

I'm not sure how they work either, but why should they require more CPU cycles than any other PCIe SAS controller?

I think they are doing a clever step by directly attaching the NAND chips to PCIe, instead of piping all the data through SAS or (S)ATA (and then through PCIe as well). And if the controller chip on the card isn't absolutely bogus, that certainly has the potential to reduce latency and improve throughput - compared to other SSDs.
Or am I missing something?
Regards
Markus