Hi Kevin,
I made another attempt with the following additional changes, based on our discussion:
1. Use a TCP connection
2. Turn off autovacuum
3. Turn off full_page_writes
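For reference, the two server-side settings above can be applied in postgresql.conf for the duration of the load (both take effect on a configuration reload); this is a sketch of load-only values, to be reverted once the import is done:

```ini
# postgresql.conf -- temporary settings for the bulk load only;
# revert both after the import and reload the server configuration.
autovacuum = off        # skip background vacuum/analyze during the load
full_page_writes = off  # less WAL written; not crash-safe, load window only
```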
I was able to import more than 30 GB of data in about 2 hours. That is
totally acceptable performance to me given the current server's
capabilities. There is a minor issue, though: I saw a few errors during the import:
ERROR: invalid byte sequence for encoding "UTF8": 0xe6272c
ERROR: invalid byte sequence for encoding "UTF8": 0xe5272c
ERROR: invalid byte sequence for encoding "UTF8": 0xe5272c
ERROR: invalid byte sequence for encoding "UTF8": 0xe5272c
ERROR: invalid byte sequence for encoding "UTF8": 0xe68e27
ERROR: invalid byte sequence for encoding "UTF8": 0xe7272c
ERROR: invalid byte sequence for encoding "UTF8": 0xe5272c
ERROR: invalid byte sequence for encoding "UTF8": 0xe5a427
My data was exported from a UTF8 MySQL database, and my pgsql database is also
UTF8. I got only the 8 errors above out of about 3 million records imported.
The strange thing is that the log usually shows the problematic SQL
statement whenever a statement fails, which gives me a chance to fix the
data manually. For the errors above, however, no SQL was logged; the
pgsql log just contains the same errors as above, with no additional info:
2011-09-01 11:26:32 CST ERROR: invalid byte sequence for encoding
"UTF8": 0xe6272c
2011-09-01 11:26:47 CST ERROR: invalid byte sequence for encoding
"UTF8": 0xe5272c
2011-09-01 11:26:53 CST ERROR: invalid byte sequence for encoding
"UTF8": 0xe5272c
2011-09-01 11:26:58 CST ERROR: invalid byte sequence for encoding
"UTF8": 0xe5272c
2011-09-01 11:26:58 CST ERROR: invalid byte sequence for encoding
"UTF8": 0xe68e27
2011-09-01 11:27:01 CST ERROR: invalid byte sequence for encoding
"UTF8": 0xe7272c
2011-09-01 11:27:06 CST ERROR: invalid byte sequence for encoding
"UTF8": 0xe5272c
2011-09-01 11:27:15 CST ERROR: invalid byte sequence for encoding
"UTF8": 0xe5a427
What could be the cause of that?
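For what it's worth, the failing bytes themselves hint at the cause: each reported sequence starts a three-byte UTF-8 character (lead bytes 0xE5-0xE7) but runs into plain ASCII (0x27 is ' and 0x2C is ,) before the character is complete, as if a multibyte character was cut short around a quote during export. A minimal sketch to reproduce the decode failure and locate the offending lines in a dump file (the helper name is hypothetical, and it assumes a line-oriented text dump):

```python
# Sketch: show why PostgreSQL rejects these byte sequences, and scan a
# dump file for lines that are not valid UTF-8.

def invalid_utf8_lines(path):
    """Yield (line_number, raw_bytes) for lines that fail UTF-8 decoding."""
    with open(path, "rb") as f:
        for lineno, raw in enumerate(f, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError:
                yield lineno, raw

# One of the sequences from the error messages: 0xE6 announces a 3-byte
# UTF-8 character, but 0x27 ("'") is not a valid continuation byte.
sample = bytes.fromhex("e6272c")
try:
    sample.decode("utf-8")
    ok = True
except UnicodeDecodeError:
    ok = False  # decoding fails, matching the server's error
```

Running `invalid_utf8_lines()` over the export file before loading would pinpoint the records to fix by hand.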
On 2011/8/30 9:29, Rural Hunter wrote:
Thank you. I didn't understand what VACUUM FREEZE actually does; I
will check the details to see if it suits my situation, and I will
also test the load over a TCP connection. Thanks again for all your
advice; it has been very helpful to me!
On 2011/8/30 0:06, Kevin Grittner wrote:
Rural Hunter <ruralhunter@xxxxxxxxx> wrote:
2011/8/29 23:18, Kevin Grittner:
I also recommend a VACUUM FREEZE ANALYZE on the database unless
most of these rows will be deleted or updated before you run a
billion database transactions. Otherwise you will get a painful
"anti-wraparound" autovacuum on everything, probably at a time
of heavy usage.
Hmm... I will try turning autovacuum off, though I haven't seen any
resource contention caused by it.
Well, turning off autovacuum during a bulk load is probably a net
gain if it's insert-only (i.e., no need to query just-loaded data to
decide what to do with new rows); but that's not what I was getting
at. Bulk loading 200 GB of data which is not going to be deleted or
updated heavily is setting a performance time bomb without a VACUUM
FREEZE. At some point, perhaps months later, it will be necessary
to freeze the tuples to prevent data loss, and since this occurs
based on a threshold of how many transaction IDs have been consumed,
it is most likely to happen at peak OLTP loads, when it will be the
biggest problem. A VACUUM FREEZE (and you might as well throw in
ANALYZE while you're at it) will take care of that up front. As a
side benefit it will keep SELECT statements from generating heavy
*write* loads on the first access to tuples, and will perform other
maintenance which will improve database performance.
I just count the time for VACUUM FREEZE ANALYZE as part of the bulk
load time required before letting in users.
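For the record, the freeze step described above is a single statement, and the wraparound pressure it defuses can be checked per database with age() (a sketch; run against whichever database received the load):

```sql
-- Freeze and analyze everything up front, inside the bulk-load window.
VACUUM FREEZE ANALYZE;

-- Check how close each database is to an anti-wraparound vacuum:
-- age() counts transactions consumed since datfrozenxid; autovacuum
-- forces a freeze as it approaches autovacuum_freeze_max_age
-- (200 million by default).
SELECT datname, age(datfrozenxid) FROM pg_database ORDER BY 2 DESC;
```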
Network latency?
No, I do the import locally on the db server, so network
latency can be excluded.
Hmm... I don't remember the details, but there was a problem at
some point where Linux pipe connections could introduce significant
latency, and you could get much better performance on a TCP
connection through localhost. It might be worth a try. (Maybe
someone else will remember the details.)
-Kevin
--
Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin