A quick test:
-----
1. create table x1(x int, y bytea);
2. Load some data say with python:
cp /opt/java/src.zip ~/tmp/a.dat (19MB)
##
import psycopg2
conn = psycopg2.connect("dbname='test' user='*****' password='****' host='127.0.0.1'");
conn.cursor().execute("INSERT INTO x1 VALUES (1, %s)", (psycopg2.Binary(open("a.dat").read()),))
conn.commit()
##
3. create table x2(x int, y bytea);
4. Copy table x1 100 times to x2 (1.9GB) and monitor/measure IO:
insert into x2 select a x, y from generate_series(1,100) a, x1;
Results:
-----------
On Linux 2.6.32 with an ext3 file system on one 15K rpm disk, we saw with SystemTap that the source 1.9GB (19MB x 100) resulted in 5GB of actual disk IO and took 61 seconds (52 CPU + 9 sleep/wait for IO)
Deletion (delete from x2) took 32 seconds with 12 seconds CPU and 20 sec sleep + wait for IO. Actual disk IO was about 4GB.
Since Pg does not use the concept of rollback segments, it is unclear why deletion produces so much disk IO (4GB).
VJ
-----
1. create table x1(x int, y bytea);
2. Load some data say with python:
cp /opt/java/src.zip ~/tmp/a.dat (19MB)
##
import psycopg2
conn = psycopg2.connect("dbname='test' user='*****' password='****' host='127.0.0.1'");
conn.cursor().execute("INSERT INTO x1 VALUES (1, %s)", (psycopg2.Binary(open("a.dat").read()),))
conn.commit()
##
3. create table x2(x int, y bytea);
4. Copy table x1 100 times to x2 (1.9GB) and monitor/measure IO:
insert into x2 select a x, y from generate_series(1,100) a, x1;
Results:
-----------
On Linux 2.6.32 with an ext3 file system on one 15K rpm disk, we saw with SystemTap that the source 1.9GB (19MB x 100) resulted in 5GB of actual disk IO and took 61 seconds (52 CPU + 9 sleep/wait for IO)
Deletion (delete from x2) took 32 seconds with 12 seconds CPU and 20 sec sleep + wait for IO. Actual disk IO was about 4GB.
Since Pg does not use the concept of rollback segments, it is unclear why deletion produces so much disk IO (4GB).
VJ
Hi all,
my posting on 2010-01-14 about the performance when writing
bytea to disk caused a longer discussion. While the fact
still holds that the overall postgresql write performance is
roughly 25% of the serial I/O disk performance this was
compensated for my special use case here by doing some other
non-postgresql related things in parallel.
Now I cannot optimize my processes any further, however, now
I am facing another quite unexpected performance issue:
Deleting rows from my simple table (with the bytea column)
having 16 MB data each, takes roughly as long as writing
them!
Little more detail:
* The table just has 5 unused int columns, a timestamp,
OIDs, and the bytea column, no indices; the bytea storage
type is 'extended', the 16 MB are compressed to approx. the
half.
* All the usual optimizations are done to reach better
write through (pg_xlog on another disk, much tweaks to the
server conf etc), however, this does not matter here, since
not the absolute performance is of interest here but the
fact that deleting roughly takes 100% of the writing time.
* I need to write 15 rows of 16 MB each to disk in a maximum
time of 15 s, which is performed here in roughly 10 seconds,
however, now I am facing the problem that keeping my
database tidy (deleting rows) takes another 5-15 s (10s on
average), so my process exceeds the maximum time of 15s for
about 5s.
* Right now I am deleting like this:
DELETE FROM table WHERE (CURRENT_TIMESTAMP -
my_timestamp_column) > interval '2 minutes';
while it is planned to have the interval set to 6 hours in
the final version (thus creating a FIFO buffer for the
latest 6 hours of inserted data; so the FIFO will keep
approx. 10.000 rows spanning 160-200 GB data).
* This deletion SQL command was simply repeatedly executed
by pgAdmin while my app kept adding the 16 MB rows.
* Autovacuum is on; I believe I need to keep it on,
otherwise I do not free the disk space, right? If I switch
it off, the deletion time reduces from the average 10s down
to 4s.
* I am using server + libpq version 8.2.4, currently on
WinXP. Will an upgrade to 8.4 help here?
Do you have any other ideas to help me out?
Oh, please...
Thank You
Felix
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance