I have a program that inserts 50M records of about 30 bytes each, with
some simple indexing, using about 5 GB of disk, layout shown below. When
I run the program without the inserts, it takes a few seconds to do just
the calculation part.
With inserts, it takes about 90 minutes to run on my MacBook Pro (2012)
with a spinning disk and 8 GB of memory. Since the CPU was running at 40%
idle, I figured this must be due to waiting on disk, so I swapped in an SSD.
Now on the console I see a negotiated link speed of 6 Gb/s on the disk, vs.
3 Gb/s before.
Surprise: raw insert speed is slower at first. CPU idle remains the
same. Comparing elapsed time in the outer loop of my insert, at i=2160 of 8400:
prev (spinning disk): 29 minutes
now (SSD): 31 minutes
But by the end of the inserts, the SSD has pulled ahead:
prev (spinning disk): 88 minutes
now (SSD): 66 minutes
The next phase, which is all queries, goes much faster than on the
spinning disk, for a total run time of real 72m15.728s.
Now I suspect the limit is OS X throttling per-process CPU.
Does this sound right?
Thanks,
Bill
Here is my postgres config:
shared_buffers = 2048MB
temp_buffers = 32MB
work_mem = 8MB
checkpoint_segments = 32
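To confirm that the running server is actually using these values rather than defaults, something like the following could be run in psql. This is only a sketch: it assumes a server version that still has checkpoint_segments (later releases dropped that setting, in which case that row simply will not appear).

```sql
-- Show the values in effect for the four settings listed above.
-- "unit" matters: shared_buffers and temp_buffers are reported in
-- blocks (typically 8kB), work_mem in kB.
-- "source" shows where the value came from (e.g. configuration file
-- or default), which helps spot an edit that was never picked up.
SELECT name, setting, unit, source
FROM pg_settings
WHERE name IN ('shared_buffers',
               'temp_buffers',
               'work_mem',
               'checkpoint_segments')
ORDER BY name;
```

A changed shared_buffers only takes effect after a server restart, so a stale value here would be one thing to rule out.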
--- 41480732 of these records
CREATE TABLE IF NOT EXISTS pic_pic_color
(
id1 VARCHAR(10),
id2 VARCHAR(10),
cd REAL
);
CREATE INDEX color_id1_idx ON pic_pic_color (id1);
CREATE INDEX color_id2_idx ON pic_pic_color (id2);
CREATE INDEX color_e_idx ON pic_pic_color (cd);
--- 7929126 of these records
CREATE TABLE IF NOT EXISTS pic_pic_kwd
(
coder SMALLINT,
id1 VARCHAR(10),
id2 VARCHAR(10),
closeness INTEGER
);
CREATE INDEX kwd_seq1_idx ON pic_pic_kwd (coder, id1);
CREATE INDEX kwd_close_idx ON pic_pic_kwd (closeness DESC);
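The "about 5 GB of disk" figure can be broken down per table into heap and index space with a query along these lines. It is a sketch that assumes both tables live in a schema on the search path and that no other schema has tables with the same names.

```sql
-- On-disk size of each table, split into heap and indexes.
-- pg_total_relation_size also counts TOAST data, so total_size can be
-- slightly larger than heap_size + index_size.
SELECT c.relname                                     AS table_name,
       pg_size_pretty(pg_relation_size(c.oid))       AS heap_size,
       pg_size_pretty(pg_indexes_size(c.oid))        AS index_size,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class c
WHERE c.relname IN ('pic_pic_color', 'pic_pic_kwd')
  AND c.relkind = 'r';   -- ordinary tables only
```

With three indexes on pic_pic_color and two on pic_pic_kwd, the index share of the total indicates how much of the insert work goes into index maintenance rather than the rows themselves.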
(Curious about the application? http://phobrain.com)
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)