Re: Benchmark Data requested


 



Commercial database bulk loaders work the same way: they offer a fast-load option on the condition that, if an error occurs, the whole table is truncated. I think this also has real-life advantages where PostgreSQL is used for data marts that are recreated every now and then from other systems and need fast loaders, so it's not just benchmarking folks like me who would take advantage of such a feature. In fact, I have seen loaders that force a "REPLACE TABLE" clause, which truncates the table before loading so there is no confusion about what happens to the original data in the table; only then do they skip logging.
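
A minimal sketch of that truncate-then-load pattern, written in Python purely for illustration. It only builds the SQL text; the helper name `build_replace_load_script`, the table name, and the file path are hypothetical and not part of any existing tool. As far as the 8.3 documentation describes it, a COPY into a table that was truncated in the same transaction can already skip WAL when WAL archiving is off, so this is roughly the closest existing equivalent of a "REPLACE TABLE" load:

```python
def build_replace_load_script(table, data_path):
    """Return the SQL statements for a truncate-and-reload of one table.

    Hypothetical helper for illustration only. The quoting is naive:
    it handles a simple table name (not schema-qualified) and a
    server-side file path, nothing more.
    """
    quoted_table = '"' + table.replace('"', '""') + '"'
    quoted_path = "'" + data_path.replace("'", "''") + "'"
    return [
        "BEGIN;",
        # Truncating inside the transaction makes it explicit that the
        # old contents are gone, whatever happens to the load.
        f"TRUNCATE TABLE {quoted_table};",
        # COPY in the same transaction as the TRUNCATE: if the load
        # fails, the transaction rolls back and the old data is kept.
        f"COPY {quoted_table} FROM {quoted_path};",
        "COMMIT;",
    ]


if __name__ == "__main__":
    # Assumed names, for demonstration only.
    print("\n".join(build_replace_load_script("sales_mart", "/tmp/sales.dat")))
```

Note the difference from the commercial loaders described above: here an error rolls the transaction back to the old contents rather than leaving an empty table.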


To be honest, it's not the WAL writes to disk that I am worried about. According to my tests, async_commit comes pretty close to sync=off and solves the WALWriteLock contention. We should maybe just focus on making it more efficient, which I think also involves WALInsertLock, which may not be entirely efficient.


Also, all changes have to be add-on options and not replacements for existing loads; I totally agree on that point. The guys in production support don't even like optimizer query plan changes, never mind a corrupt index. (I once spent two days in a previous role trying to figure out why a particular query plan on another database had changed in production.)






Simon Riggs wrote:
On Tue, 2008-02-05 at 13:47 -0500, Jignesh K. Shah wrote:
That sounds cool to me too..

How much work is it to make pg_bulkload work on 8.3? An integrated version is certainly more beneficial.

Especially as I think it will also help other setups like TPC-E, where this is a problem too.
If you don't write WAL then you can lose all your writes in a crash.
That issue is surmountable on a table with no indexes, or even
conceivably with one monotonically ascending index. With other indexes
if we crash then we have a likely corrupt index.

For most production systems I'm aware of, losing an index on a huge
table is not anything you'd want to trade for performance. Assuming
you've ever been knee-deep in it on a real server.

Maybe we can have a "load mode" for a table where we skip writing any
WAL, but if we crash we just truncate the whole table to nothing? Issue
a WARNING if we enable this mode while there is any data in the table.
I'm nervous of it, but maybe people really want it?

I don't really want to invent ext2 all over again, so that we have to
run an fsck on a table if we crash while loading. My concern is that many
people would choose that then blame us for delivering unreliable
software. e.g. direct path loader on Oracle used to corrupt a PK index
if you loaded duplicate rows with it (whether it still does I couldn't
care). That kind of behaviour is simply incompatible with production
usage, even if it benchmarks well.
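
The "load mode" semantics proposed above can be pictured with a toy in-memory model. This is only a sketch of the idea under stated assumptions, not PostgreSQL code: the class `ToyTable` and all of its method names are hypothetical, and a "crash" is simulated by calling the recovery method while load mode is still on.

```python
class ToyTable:
    """Toy model of a table with a WAL-skipping 'load mode' (hypothetical)."""

    def __init__(self):
        self.rows = []
        self.load_mode = False
        self.warnings = []

    def enable_load_mode(self):
        # The proposal: warn if the table already holds data, because
        # a crash during the load would throw that data away too.
        if self.rows:
            self.warnings.append("WARNING: load mode enabled on a non-empty table")
        self.load_mode = True

    def load(self, new_rows):
        # In load mode nothing is logged, so these rows cannot be
        # replayed after a crash.
        self.rows.extend(new_rows)

    def finish_load(self):
        # Clean end of the load: the table is treated as durable again.
        self.load_mode = False

    def recover_after_crash(self):
        # Crash recovery: a table still in load mode is truncated to
        # nothing instead of being checked or repaired.
        if self.load_mode:
            self.rows = []
            self.load_mode = False


if __name__ == "__main__":
    table = ToyTable()
    table.enable_load_mode()
    table.load(["a", "b", "c"])
    table.recover_after_crash()  # simulated crash before finish_load()
    print(table.rows)
```

The point of the model is that recovery never has to inspect the table or its indexes: a table is either fully loaded or empty.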

