Commercial DB bulk loaders work the same way: they offer a fast-load
option on the condition that, in case of error, the whole table is
truncated. I think this also has real-life advantages where PostgreSQL
is used for data marts that are recreated every now and then from other
systems and need fast loaders. So it's not just the benchmarking
folks like me who will take advantage of such features. In fact, I have
seen loaders that force a "REPLACE TABLE" clause, which truncates
the table before loading so there is no confusion about what
happens to the original data in the table, and only then skip logging.
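
To make those semantics concrete, here is a rough sketch in Python of the
"REPLACE TABLE" style of fast load described above: truncate up front, skip
the log while loading, and truncate again if anything goes wrong. This is a
toy in-memory model only, not PostgreSQL or any commercial loader's code;
ToyTable, fast_replace_load and the validate callback are names made up for
illustration.

```python
class ToyTable:
    """In-memory stand-in for a table (hypothetical, not PostgreSQL code)."""

    def __init__(self, rows=None):
        self.rows = list(rows or [])
        self.log = []  # stand-in for WAL records

    def truncate(self):
        self.rows.clear()

    def insert_logged(self, row):
        # Normal path: every insert leaves a log record behind.
        self.log.append(("insert", row))
        self.rows.append(row)


def fast_replace_load(table, new_rows, validate=lambda row: True):
    """Load new_rows without logging; on any error the table ends up empty."""
    table.truncate()  # old data is discarded before the load starts
    try:
        for row in new_rows:
            if not validate(row):
                raise ValueError(f"bad row: {row!r}")
            table.rows.append(row)  # fast path: no log record written
    except Exception:
        table.truncate()  # nothing to replay, so fall back to an empty table
        raise
    return len(table.rows)
```

A caller would run something like fast_replace_load(table, rows) and treat
any exception as "the table is now empty, reload from the source system",
which is the trade-off a data mart rebuilt from other systems can accept.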

To be honest, it's not the WAL writes to disk that I am worried
about. According to my tests, async_commit comes pretty close to
sync=off and solves the WALWriteLock contention. We should maybe just
focus on making it more efficient, which I think also involves
WALInsertLock, which may not be entirely efficient.

Also, all changes have to be add-on options and not replacements for
existing loads; I totally agree on that point. The guys in production
support don't even like optimizer query plan changes, never mind a corrupt
index. (I once spent two days in a previous role trying to figure out why
a particular query plan on another database changed in production.)

Simon Riggs wrote:
On Tue, 2008-02-05 at 13:47 -0500, Jignesh K. Shah wrote:
That sounds cool to me too.
How much work is it to make pg_bulkload work on 8.3? An integrated
version is certainly more beneficial.
Especially since I think it will also help other setups like TPC-E,
where this is a problem too.

If you don't write WAL then you can lose all your writes in a crash.
That issue is surmountable on a table with no indexes, or even
conceivably with one monotonically ascending index. With other indexes,
if we crash we likely have a corrupt index.

For most production systems I'm aware of, losing an index on a huge
table is not anything you'd want to trade for performance. Assuming
you've ever been knee-deep in it on a real server.

Maybe we can have a "load mode" for a table where we skip writing any
WAL, but if we crash we just truncate the whole table to nothing? Issue
a WARNING if we enable this mode while there is any data in the table.
I'm nervous about it, but maybe people really want it?

I don't really want to invent ext2 all over again, such that we have to
run an fsck on a table if we crash while loading. My concern is that many
people would choose that and then blame us for delivering unreliable
software. For example, the direct path loader on Oracle used to corrupt
a PK index if you loaded duplicate rows with it (whether it still does I
couldn't care). That kind of behaviour is simply incompatible with
production usage, even if it benchmarks well.