On 11/03/2010 04:52 PM, Nick Matheson wrote:
> We have an application that needs to do bulk reads of ENTIRE Postgres tables very quickly (i.e. select * from table). We have observed that such sequential scans run roughly 20x slower than observed raw disk reads (5 MB/s versus 100 MB/s). Part of this is due to the storage overhead we have observed in Postgres: in the example below, it takes 1 GB on disk to store 350 MB of nominal data. That overhead alone, however, would still leave us expecting bulk read rates of about 35 MB/s (100 MB/s x 350 MB / 1 GB).
> Our business logic does operations on the resulting data such that the output is several orders of magnitude smaller than the input. So we had hoped that by putting our business logic into stored procedures (and thus drastically reducing the amount of data flowing to the client) our throughput would go way up. This did not happen.
Can you disclose what kinds of manipulations you want to do on the data? I am asking because maybe there is a fancy query (possibly using window functions and/or aggregate functions) that gets you the speed you need without transferring the whole data set to the client.
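Just to illustrate what I mean (table and column names are invented here, since I do not know your schema), both of these push the reduction into the server so only a small result set crosses the wire:

  -- plain aggregation: one summary row per station and day
  SELECT station_id,
         date_trunc('day', observed_at) AS obs_day,
         avg(reading) AS avg_reading,
         count(*)     AS n_readings
  FROM   measurements
  GROUP  BY station_id, date_trunc('day', observed_at);

  -- window function: latest reading per station
  SELECT station_id, observed_at, reading
  FROM  (SELECT station_id, observed_at, reading,
                row_number() OVER (PARTITION BY station_id
                                   ORDER BY observed_at DESC) AS rn
         FROM measurements) t
  WHERE t.rn = 1;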
> So our questions are as follows: Is there any way, using stored procedures (maybe C code that calls SPI directly) or some other approach, to get close to the expected 35 MB/s doing these bulk reads? Or is this the price we have to pay for using SQL instead of some NoSQL solution? (We actually tried Tokyo Cabinet and found it to perform quite well; however, it does not measure up to Postgres in terms of replication, data interrogation, community support, acceptance, etc.)
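For what it's worth, when you say stored procedures I assume something along these lines, i.e. a function that does the reduction server-side and only returns the small result (names are invented again, and I am assuming the "reading" column is double precision):

  -- hypothetical server-side reduction: one summary row per station,
  -- so only the reduced result travels to the client
  CREATE FUNCTION station_summary()
  RETURNS TABLE (sid integer, avg_reading double precision, n_readings bigint)
  LANGUAGE sql STABLE AS $$
      SELECT m.station_id, avg(m.reading), count(*)
      FROM   measurements m
      GROUP  BY m.station_id;
  $$;

  SELECT * FROM station_summary();

If that is roughly what you already tried, then the bottleneck is the scan itself rather than the data transfer, which brings us back to what the manipulations actually look like.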
Kind regards

robert