On 07/17/2012 01:56 AM, Jon Nelson wrote:
> What is the greater lesson to take away, here? If you are working with
> data that is larger (substantially larger) than available memory, is
> the architecture and design of postgresql such that the only real
> approach is some type of data partitioning? It is not my intent to
> insult or even disparage my favorite software, but it took less time
> to *build* the indices for 550GB of data than it would have taken to
> insert 1/20th as much. That doesn't seem right.
To perform reasonably well, Pg would need to be able to defer index
updates when bulk-loading data in a single statement (or even a single
transaction), then apply them when the statement finishes or the
transaction commits. Doing this at the transaction level would mean you'd need a way
to mark indexes as 'lazily updated' and have Pg avoid using them once
they'd been dirtied within a transaction. No such support currently
exists, and it'd be non-trivial to implement, especially since people
loading huge amounts of data often want to do it with multiple
concurrent sessions. You'd need some kind of 'DISABLE INDEX' and 'ENABLE
INDEX' commands plus a transactional backing table of pending index updates.
Not simple.
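
The usual workaround is to drop (or simply not create) the indexes
before the bulk load and rebuild them once the data is in; since DDL is
transactional in Pg you can even wrap the whole thing in one
transaction. A rough sketch, with made-up table, index, and file names:

    BEGIN;
    DROP INDEX IF EXISTS bigtable_val_idx;

    -- Bulk load; COPY avoids the per-row overhead of individual INSERTs.
    COPY bigtable (id, val) FROM '/path/to/data.csv' WITH (FORMAT csv);

    -- Rebuild the index in a single pass over the loaded data.
    CREATE INDEX bigtable_val_idx ON bigtable (val);
    COMMIT;

That only really helps a single loading session, of course; multiple
concurrent loaders are back to square one.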
Right now, Pg is trying to keep the index consistent the whole time.
That involves moving a heck of a lot of data around - repeatedly.
Setting a lower FILLFACTOR on your indexes can give Pg some breathing
room here, but only a limited amount, and at the cost of reduced scan
efficiency.
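
For reference, fillfactor is a per-index storage parameter; something
along these lines (index name made up):

    -- Leave ~30% of each leaf page free so later inserts cause fewer page splits.
    CREATE INDEX bigtable_val_idx ON bigtable (val) WITH (fillfactor = 70);

    -- Or change it on an existing index; this only affects pages written
    -- afterwards, so a REINDEX is needed to repack what's already there.
    ALTER INDEX bigtable_val_idx SET (fillfactor = 70);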
--
Craig Ringer