On 11/03/2010 04:52 PM, Nick Matheson wrote:
> We have an application that needs to do bulk reads of ENTIRE Postgres tables very quickly (i.e. select * from table). We have observed that such sequential scans run roughly 20x slower than observed raw disk reads (5 MB/s versus 100 MB/s). Part of this is due to the storage overhead we have observed in Postgres: in the example below, it takes 1 GB on disk to store 350 MB of nominal data. That overhead alone, however, would still leave us expecting bulk read rates of about 35 MB/s (100 MB/s x 350 MB / 1 GB).
> Our business logic does operations on the resulting data such that the output is several orders of magnitude smaller than the input. So we had hoped that by putting our business logic into stored procedures (and thus drastically reducing the amount of data flowing to the client) our throughput would go way up. This did not happen.
Can you disclose what kinds of manipulations you want to do on the data? I am asking because maybe there is a fancy query (possibly using window functions and/or aggregate functions) that gets you the speed you need without transferring the whole data set to the client.
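Just to illustrate what I mean (table and column names are invented here, since I do not know your schema), both of these push the reduction into the server so only a small result set crosses the wire:

  -- plain aggregation: one summary row per station and day
  SELECT station_id,
         date_trunc('day', observed_at) AS obs_day,
         avg(reading) AS avg_reading,
         count(*)     AS n_readings
  FROM   measurements
  GROUP  BY station_id, date_trunc('day', observed_at);

  -- window function: latest reading per station
  SELECT station_id, observed_at, reading
  FROM  (SELECT station_id, observed_at, reading,
                row_number() OVER (PARTITION BY station_id
                                   ORDER BY observed_at DESC) AS rn
         FROM measurements) t
  WHERE t.rn = 1;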
> So our questions are as follows: Is there any way, using stored procedures (maybe C code that calls SPI directly) or some other approach, to get close to the expected 35 MB/s doing these bulk reads? Or is this the price we have to pay for using SQL instead of some NoSQL solution? (We actually tried Tokyo Cabinet and found it to perform quite well; however, it does not measure up to Postgres in terms of replication, data interrogation, community support, acceptance, etc.)
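For what it's worth, when you say stored procedures I assume something along these lines, i.e. a function that does the reduction server-side and only returns the small result (names are invented again, and I am assuming the "reading" column is double precision):

  -- hypothetical server-side reduction: one summary row per station,
  -- so only the reduced result travels to the client
  CREATE FUNCTION station_summary()
  RETURNS TABLE (sid integer, avg_reading double precision, n_readings bigint)
  LANGUAGE sql STABLE AS $$
      SELECT m.station_id, avg(m.reading), count(*)
      FROM   measurements m
      GROUP  BY m.station_id;
  $$;

  SELECT * FROM station_summary();

If that is roughly what you already tried, then the bottleneck is the scan itself rather than the data transfer, which brings us back to what the manipulations actually look like.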
Kind regards

robert