On 24.12.2010 10:51, Ted Ts'o wrote:
On Fri, Dec 24, 2010 at 02:00:13AM +0100, Christian Stroetmann wrote:
I do understand what you want to say, even though this example is
based on a bug in a system other than the FS. But there will
certainly be other examples.
Sure, but this thread started because someone wanted an "atomic
non-durable file write API", apparently because it was too slow to use
fsync(). If people use databases, it's not a problem; databases use
fsync(), but they use it properly and they provide the proper
transactional interfaces that people want.
That's why I agreed with you at this technical, operating-system
level. I would like to add that the database management system (DBMS)
handles this in interplay with the FS, and that a database is often
stored in a file with a proprietary format for efficiency.
The problem comes when people try to implement their own databases
using small files for each row and column of the database, or for each
registry variable. Then they complain when fsync() is too expensive,
because they need to use fsync() for every single 3 bytes of data they
store in their badly implemented database.
Yes, agreed (see above).
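
To make the cost argument above concrete, here is a minimal sketch of
the alternative to one file plus one fsync() per value: append many
small records to a single log file and pay for one fdatasync() per
batch. The names `struct kv`, `append_batch` and `write_fully` and the
`key=value` line format are hypothetical, chosen only for illustration.
The sketch shows how the sync cost is amortized; it does not give the
recovery guarantees (commit records, checksums) that a real DBMS adds.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical record type: one small key/value pair. */
struct kv {
    const char *key;
    const char *value;
};

/* Write the whole buffer, retrying on short writes and EINTR. */
static int write_fully(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Append n records to log_path as "key=value\n" lines and flush them
 * with a single fdatasync() at the end of the batch.
 * Returns 0 on success, -1 on error.
 */
int append_batch(const char *log_path, const struct kv *recs, size_t n)
{
    int fd = open(log_path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;

    for (size_t i = 0; i < n; i++) {
        char line[512];
        int len = snprintf(line, sizeof line, "%s=%s\n",
                           recs[i].key, recs[i].value);
        if (len < 0 || (size_t)len >= sizeof line ||
            write_fully(fd, line, (size_t)len) < 0) {
            close(fd);
            return -1;
        }
    }

    /* One sync for the whole batch instead of one per record. */
    int rc = fdatasync(fd);
    if (close(fd) < 0)
        rc = -1;
    return rc;
}
```

With this layout, storing a thousand 3-byte values costs one sync
instead of a thousand.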
The bottom line is that if you want atomic updates of state
information, you need to use fsync() or fdatasync(). If this is a
performance bottleneck, then you're doing something wrong. Maybe you
shouldn't be writing a third of a megabyte on every URL click, on the
main GUI thread; maybe the user doesn't need to remember every single
URL that was visited even if the power suddenly fails (maybe it's
enough if you write that information to disk every 3-5 minutes, and
less if you're running on battery). Or maybe you shouldn't be using
hundreds of small state files, and screw up the dirty flag handling.
But regardless, you're doing something wrong/stupid.
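
For reference, the usual way to get an atomic update of a state file
with fsync() can be sketched as follows: write the new contents to a
temporary file, fsync() it, rename() it over the old file, then fsync()
the containing directory. The helper names `replace_file_atomically`
and `write_all`, the `.tmp` suffix, and passing the directory as a
separate argument are assumptions made for this sketch, not something
proposed in this thread.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Write the whole buffer, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical helper: replace `path` (which lives in directory `dir`)
 * with `len` bytes from `buf`, so that after a crash the file holds
 * either the complete old or the complete new contents.
 * Returns 0 on success, -1 on error.
 */
int replace_file_atomically(const char *dir, const char *path,
                            const char *buf, size_t len)
{
    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* The new data must be on disk before the rename makes it visible. */
    if (write_all(fd, buf, len) < 0 || fsync(fd) < 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) < 0) {
        unlink(tmp);
        return -1;
    }

    /* rename() atomically switches the name to the new contents. */
    if (rename(tmp, path) < 0) {
        unlink(tmp);
        return -1;
    }

    /* Sync the directory so the rename itself survives a power failure. */
    int dfd = open(dir, O_RDONLY);
    if (dfd < 0)
        return -1;
    int rc = fsync(dfd);
    close(dfd);
    return rc;
}
```

Calling this once per saved state, rather than once per URL click or
per variable, keeps the number of syncs small.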
Here we are at the application level. And this is where I say that
using an FS as a DBMS is not the real problem.
Potentially off-topic:
And while we are at this point: from my point of view, the wrong/stupid
behavior is how an FS is used at the operating system level. That's
because, as said above, a database is stored in a file, and the only
functionality missing in an FS management system is exactly what the
DBMS adds in this case. If you program it cleverly, it must be faster
than the standard concept, in which a file that represents a database
is stored in an FS, because some FS functions don't really have to be
called.
Such special FS handling, seen from the kernel level, is not uncommon:
backup systems already do it, and an FS that you don't like does it as
well and already provides the A of ACID. The rest can be handled by an
appropriate FS plug-in system. So we come back again to the question of
where this functionality belongs, in the FS or the VFS. You say VFS, I
say FS, like R4, and OntoFS #1 (R4- and ontology-based) and #2 (ext2/3-,
sqlite- and ontology-based conversion from fuse-sqlite).
- Ted
Have fun
Christian Stroetmann