If you don't need that level of consistency for your 8MB blobs, write them to plain files named with some kind of id, and put the id in the database instead of the blob. This will reduce the amount of disk I/O for storing each blob by nearly 50%, and will reduce marshaling overheads by a larger margin.
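A minimal sketch of that approach with libpq, assuming a hypothetical "blobs" table and file layout (illustrative only, not a drop-in solution):

    /* Sketch: write the 8MB payload to a plain file named by id, then
     * store only the small id in the database. */
    #include <stdio.h>
    #include <libpq-fe.h>

    int store_blob(PGconn *conn, long id, const char *buf, size_t len)
    {
        char path[256];
        snprintf(path, sizeof path, "/var/blobstore/%ld.bin", id);  /* assumed layout */

        FILE *f = fopen(path, "wb");
        if (!f)
            return -1;
        fwrite(buf, 1, len, f);   /* one sequential write; the database
                                     never touches these bytes */
        fclose(f);

        char idstr[32];
        snprintf(idstr, sizeof idstr, "%ld", id);
        const char *values[1] = { idstr };

        /* only the id goes through the database now */
        PGresult *res = PQexecParams(conn,
            "INSERT INTO blobs (id) VALUES ($1)",
            1, NULL, values, NULL, NULL, 0);
        PQclear(res);
        return 0;
    }

If you do need crash consistency between the file and the row, you would also have to fsync the file yourself and clean up orphaned files, which is part of the consistency trade-off mentioned above.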
From your account, it sounds like the database is performing nicely on that hardware ... 16MB/sec to a raw disk or filesystem is rather slow by modern standards, but 16MB/sec of database updates is pretty good for having everything on one slow-ish spindle.
On Fri, Jan 15, 2010 at 3:15 PM, Merlin Moncure <mmoncure@xxxxxxxxx> wrote:
On Thu, Jan 14, 2010 at 9:29 AM, fkater@xxxxxxxxxxxxxx
<fkater@xxxxxxxxxxxxxx> wrote:
> Hello together,
>
> I need to increase the write performance when inserting
> bytea of 8MB. I am using PostgreSQL 8.2.4 on Windows with libpq.
>
> The test setting is simple:
>
> I write a byte array (bytea) of 8 MB of random data 100
> times into a table that has a binary column (plus oids and 3
> other int columns; the oids are indexed). I noticed that
> writing 8 MB of 0-bytes is optimized away; with random data,
> the disk fills by 800 MB per run, as expected. I wrap the
> insert command in a transaction.
>
> This takes about 50 s, so 800 MB / 50 s = 16 MB/s.
>
> However, the hard disk (SATA) can write 43 MB/s even in the
> worst case! Why is write performance limited to 16 MB/s?
>
>
> Some more hints what I do:
>
> I use PQexecParams() and the INSERT ... $001 notation so
> that I do NOT additionally build an escaped string from the
> data, but instead pass a pointer to the 8MB data buffer.
>
> I altered the binary column to STORAGE EXTERNAL.
>
> Some experiments with postgresql.conf (fsync off,
> shared_buffers=1000MB, checkpoint_segments=256) did not
> change the ~50 s much (sometimes 60 s, sometimes a little less).
>
> 4-core CPU at 3 GHz, WinXP, 1 TB SATA disk.
>
>
> Do you have any further idea why 16MB/s seems to be the
> limit here?
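For reference, the binary-parameter insert described in the quoted message looks roughly like this with libpq; the schema, table name, and column name here are assumptions, not the poster's actual code:

    /* Assumed schema: CREATE TABLE images (id serial, data bytea);
     * Passing the 8MB buffer as a binary parameter means libpq sends
     * the bytes as-is, with no escaped string being built. */
    #include <libpq-fe.h>

    void insert_blob(PGconn *conn, const char *buf, int len)
    {
        const char *values[1] = { buf };  /* pointer straight into the buffer */
        int lengths[1] = { len };         /* byte count, required for binary */
        int formats[1] = { 1 };           /* 1 = binary parameter format */

        PGresult *res = PQexecParams(conn,
            "INSERT INTO images (data) VALUES ($1)",
            1,       /* one parameter */
            NULL,    /* let the server infer the type from the column */
            values, lengths, formats,
            0);      /* text-format result */
        PQclear(res);
    }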
postgres is simply not geared towards this type of workload. 16 MB/s
isn't too bad actually, and I bet you could significantly beat that
with better disks and multiple clients sending data, maybe even close
to saturating a gigabit line. However, there are other ways to do this
(outside the db) that are more appropriate if efficiency is a big
concern.
merlin