Re: large dataset with write vs read clients

On 10/10/2010 5:35 AM, Mladen Gogala wrote:
> I have a logical problem with asynchronous commit. The "commit" command
> should instruct the database to make the outcome of the transaction
> permanent. The application should wait to see whether the commit was
> successful or not. Asynchronous behavior in the commit statement breaks
> the ACID rules and should not be used in a RDBMS system. If you don't
> need ACID, you may not need RDBMS at all. You may try with MongoDB.
> MongoDB is web scale: http://www.youtube.com/watch?v=b2F-DItXtZs

That argument makes little sense to me.

Because you can afford a clearly defined and bounded loosening of the durability guarantee provided by the database - you know and accept that you might lose the last x seconds of work if your OS crashes or your UPS fails - it somehow follows that you don't really need durability guarantees at all? Let alone all that atomic commit silliness, transaction isolation, or the guarantee of a consistent on-disk state?
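For what it's worth, PostgreSQL lets you make exactly that bounded trade-off per session or even per transaction via synchronous_commit. A minimal sketch using psycopg2 - the DSN and the samples table are made up for illustration:

    import psycopg2

    # Hypothetical DSN and table; adjust for the real schema.
    conn = psycopg2.connect("dbname=mydb")
    cur = conn.cursor()

    # Loosen durability for this session only: an OS crash can lose
    # the last moments of committed work (up to ~3 * wal_writer_delay),
    # but atomicity, isolation and on-disk consistency are untouched.
    cur.execute("SET synchronous_commit TO off")

    cur.execute("INSERT INTO samples (val) VALUES (%s)", (42,))
    conn.commit()  # returns without waiting for the WAL flush

    # A transaction that must survive a crash can opt back in:
    cur.execute("SET LOCAL synchronous_commit TO on")
    cur.execute("INSERT INTO samples (val) VALUES (%s)", (43,))
    conn.commit()

The point being: you lose a bounded window of recent commits on a crash, nothing more. That's a very different proposition from giving up ACID wholesale.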

Some of the other flavours of non-SQL database, both those that've been around forever (PICK/UniVerse/etc, Berkeley DB, Caché, etc) and those that're new and fashionable (Cassandra, CouchDB, etc), provide some ACID properties anyway. If you don't need or want an SQL interface to your database, you don't have to throw out all the other database-y goodness along with it - unless you've been drinking too much of the NoSQL kool-aid.

There *are* situations in which it's necessary to switch to distributed, eventually-consistent databases with non-traditional approaches to data management. It's awfully nice not to have to, though; going that route can force you to reinvent a lot of wheels when it comes to querying, analysing and reporting on your data.

FWIW, a common approach in this sort of situation has historically been - accepting that RDBMSs aren't great at continuous fast loading of individual records - to log the records in batches to a flat file, Berkeley DB, etc as a staging point. You periodically rotate that file out and bulk-load its contents into the RDBMS for analysis and reporting. This doesn't have to be every hour - every minute is usually pretty reasonable, and still gives your database a much easier time without forcing you to modify your app to batch inserts into transactions or anything like that.
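A minimal sketch of that staging pattern, again assuming psycopg2 and the hypothetical samples(ts, val) table; the file path and rotation interval are placeholders:

    import os
    import psycopg2

    STAGING = "/var/spool/myapp/staging.csv"   # hypothetical path

    def log_record(ts, val):
        # Fast path: append to a flat file, no database round-trip.
        with open(STAGING, "a") as f:
            f.write("%s\t%s\n" % (ts, val))

    def rotate_and_load(conn):
        # Run every minute or so from a timer or cron job.
        if not os.path.exists(STAGING):
            return
        batch = STAGING + ".loading"
        os.rename(STAGING, batch)  # atomic on POSIX; new writes start a fresh file
        with open(batch) as f, conn.cursor() as cur:
            # One COPY is far cheaper than thousands of single-row INSERTs.
            cur.copy_from(f, "samples", sep="\t", columns=("ts", "val"))
        conn.commit()
        os.remove(batch)

The rename gives you a clean cut-over point: new records go to a fresh staging file while the old one is bulk-loaded, so the application never blocks on the database.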

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/


