Re: plsql gets "out of memory"

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Kevin,

I did another try with following additional changes based on our discussion:
1. use the tcp connection
2. turn off autovacuum
3. turn off full_page_writes

I could import more than 30G data in about 2 hours. That's totally acceptable performance to me with the current server capability. There is a minor issue though. I saw a few errors during the import:
ERROR:  invalid byte sequence for encoding "UTF8": 0xe6272c
ERROR:  invalid byte sequence for encoding "UTF8": 0xe5272c
ERROR:  invalid byte sequence for encoding "UTF8": 0xe5272c
ERROR:  invalid byte sequence for encoding "UTF8": 0xe5272c
ERROR:  invalid byte sequence for encoding "UTF8": 0xe68e27
ERROR:  invalid byte sequence for encoding "UTF8": 0xe7272c
ERROR:  invalid byte sequence for encoding "UTF8": 0xe5272c
ERROR:  invalid byte sequence for encoding "UTF8": 0xe5a427

My data was exported from an UTF8 MySQL database and my pgsql db is also UTF8. I got 8 errors above only with about 3 million records imported. The strange thing is, I usually see the problematic SQL output in the log if there is any error for that SQL so I have a chance to fix the data manually. But for the errors above, I don't see any SQL logged. The pgsql log just output error log same as above with no additional info: 2011-09-01 11:26:32 CST ERROR: invalid byte sequence for encoding "UTF8": 0xe6272c 2011-09-01 11:26:47 CST ERROR: invalid byte sequence for encoding "UTF8": 0xe5272c 2011-09-01 11:26:53 CST ERROR: invalid byte sequence for encoding "UTF8": 0xe5272c 2011-09-01 11:26:58 CST ERROR: invalid byte sequence for encoding "UTF8": 0xe5272c 2011-09-01 11:26:58 CST ERROR: invalid byte sequence for encoding "UTF8": 0xe68e27 2011-09-01 11:27:01 CST ERROR: invalid byte sequence for encoding "UTF8": 0xe7272c 2011-09-01 11:27:06 CST ERROR: invalid byte sequence for encoding "UTF8": 0xe5272c 2011-09-01 11:27:15 CST ERROR: invalid byte sequence for encoding "UTF8": 0xe5a427

What could be the cause of that?


于 2011/8/30 9:29, Rural Hunter 写道:
Thank you. I didn't understand what 'vacuum freeze' actually does. I will check the detail to see if it's good for my situation. and I will also test the load by tcp connection. Thanks again for all your advices and they are really very helpful to me!

于 2011/8/30 0:06, Kevin Grittner 写道:
Rural Hunter<ruralhunter@xxxxxxxxx>  wrote:
2011/8/29 23:18, Kevin Grittner:

I also recommend a VACUUM FREEZE ANALYZE on the database unless
most of these rows will be deleted or updated before you run a
billion database transactions. Otherwise you will get a painful
"anti-wraparound" autovacuum on everything, probably at a time
of heavy usage.
hmm....I will try to turn autovacuum off though I didn't see any
resource intension caused by it.

Well, turning off autovacuum during  a bulk load is probably a net
gain if it's insert-only (i.e., no need to query just-loaded data to
decide what to do with new rows); but that's not what I was getting
at.  Bulk loading 200 GB of data which is not going to be deleted or
updated heavily is setting a performance time bomb without a VACUUM
FREEZE.  At some point, perhaps months later, it will be necessary
to freeze the tuples to prevent data loss, and since this occurs
based on a threshold of how many transaction IDs have been consumed,
it is most likely to happen at peak OLTP loads, when it will be the
biggest problem.  A VACUUM FREEZE (and you might as well throw in
ANALYZE while you're at it) will take care of that up front.  As a
side benefit it will keep SELECT statements from generating heavy
*write* loads on the first access to tuples, and will perform other
maintenance which will improve database performance.

I just count the time for VACUUM FREEZE ANALYZE as part of the bulk
load time required before letting in users.

Network latency?

No, I do the import locally on the db server so the network
letency can be excluded.

Hmm...  I don't remember the details, but there was a problem at
some point where Linux pipe connections could introduce significant
latency, and you could get much better performance on a TCP
connection through localhost.  It might be worth a try.  (Maybe
someone else will remember the details.)

-Kevin




--
Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin


[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux