Re: Atomic non-durable file write API

On 24.12.2010 10:51, Ted Ts'o wrote:
On Fri, Dec 24, 2010 at 02:00:13AM +0100, Christian Stroetmann wrote:
I really do understand what you want to say, even though this example
is based on a bug in a system other than the FS. But there will be
other examples, for sure.
Sure, but this thread started because someone wanted an "atomic
non-durable file write API", apparently because it was too slow to use
fsync().  If people use databases, it's not a problem; databases use
fsync(), but they use it properly and they provide the proper
transactional interfaces that people want.

That's why I agreed with you at this technical operating system level. I would add that the database management system (DBMS) handles this in interplay with the FS, and that a database is often stored in a single file with a proprietary format for efficiency.
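
To illustrate what "use a database" means in practice, here is a minimal sketch, assuming Python's standard sqlite3 module. The table name `history` and the function `record_visits` are made up for this example. Many small updates are grouped into one transaction, so the DBMS decides when to sync its single file instead of the application calling fsync() per value.

```python
import sqlite3

def record_visits(db_path, urls):
    """Store all URLs in a single transaction (hypothetical 'history' table).

    Returns the number of rows in the table after the commit.
    """
    conn = sqlite3.connect(db_path)
    try:
        # Used as a context manager, the connection commits on success
        # and rolls back if an exception is raised inside the block.
        with conn:
            conn.execute("CREATE TABLE IF NOT EXISTS history (url TEXT)")
            conn.executemany(
                "INSERT INTO history (url) VALUES (?)",
                [(u,) for u in urls],
            )
        return conn.execute("SELECT COUNT(*) FROM history").fetchone()[0]
    finally:
        conn.close()
```

Either all rows of one call are stored or none of them are, which is the transactional interface mentioned above.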

The problem comes when people try to implement their own databases
using small files for each row and column of the database, or for each
registry variable.  Then they complain when fsync() is too expensive,
because they need to use fsync() for every single 3 bytes of data they
store in their badly implemented database.

Yes, agreed (see above).
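
A rough sketch of the alternative to one small file per value, assuming Python and a made-up `key=value` line format (the function name `append_records` is hypothetical as well): all records go into one file and fsync() is called once per batch rather than once per 3 bytes.

```python
import os

def append_records(path, records):
    """Append (key, value) pairs to one file and sync once at the end.

    Returns the file size in bytes after the write.
    """
    with open(path, "ab") as f:
        for key, value in records:
            f.write(f"{key}={value}\n".encode("utf-8"))
        f.flush()             # move Python's buffer into the kernel
        os.fsync(f.fileno())  # one sync covers the whole batch
    return os.path.getsize(path)
```

This only shows the cost side. A real store would still need a way to detect a torn last record after a crash, which is one of the things a DBMS already does.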

The bottom line is that if you want atomic updates of state
information, you need to use fsync() or fdatasync().  If this is a
performance bottleneck, then you're doing something wrong.  Maybe you
shouldn't be writing a third of a megabyte on every URL click, on the
main GUI thread; maybe the user doesn't need to remember every single
URL that was visited even if the power suddenly fails (maybe it's
enough if you write that information to disk every 3-5 minutes, and
less if you're running on battery).  Or maybe you shouldn't be using
hundreds of small state files, and screw up the dirty flag handling.
But regardless, you're doing something wrong/stupid.

Here we are at the application level. And this is where I argue that using an FS as a DBMS is not the real problem.
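
For the application level, a sketch of the two points made in the quoted text: an atomic update that uses fsync(), and writing state only every few minutes. It assumes Python on a POSIX system. The write-to-temporary-file-then-rename step is the commonly used pattern and is my assumption, it is not spelled out in this thread. The names `atomic_write` and `StateSaver` and the 180 second default are hypothetical.

```python
import os
import tempfile
import time

def atomic_write(path, data):
    """Replace `path` with `data` so readers see the old or the new content."""
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname)  # same directory, same FS
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # data is on disk before the rename
        os.replace(tmp, path)      # atomic rename on POSIX file systems
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    if os.name == "posix":
        # Sync the directory so the rename itself survives a power failure.
        dfd = os.open(dirname, os.O_RDONLY)
        try:
            os.fsync(dfd)
        finally:
            os.close(dfd)

class StateSaver:
    """Keep the newest state in memory, write it at most once per interval."""

    def __init__(self, path, interval=180.0, clock=time.monotonic):
        self.path = path
        self.interval = interval
        self.clock = clock
        self.pending = None      # newest state not yet written
        self.last_write = None   # clock value of the last write

    def update(self, data):
        self.pending = data
        return self.maybe_flush()

    def maybe_flush(self):
        """Write pending state if the interval has elapsed; True if written."""
        if self.pending is None:
            return False
        now = self.clock()
        if self.last_write is not None and now - self.last_write < self.interval:
            return False
        atomic_write(self.path, self.pending)
        self.pending = None
        self.last_write = now
        return True
```

The application would call `update()` on every change and `maybe_flush()` from a timer. After a power failure at most one interval of state is lost, and the file on disk is never half written.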

Potentially off-topic:
And while we are at this point: from my point of view, the wrong/stupid behavior is how an FS is used at the operating system level. As said above, a database is stored in a file, and the only functionality that is missing in an FS management system is exactly what the DBMS adds in this case. If you program this in a clever way, it must be faster than the standard concept (a file that represents a database, stored in an FS), because some FS functions don't have to be called at all. Such special FS handling at the kernel level is not uncommon: backup systems already do it, and so does an FS that you don't like, which already provides the A of ACID. The rest can be handled by an appropriate FS plug-in system. So we come back to the question of where this functionality belongs: in the FS or in the VFS. You say VFS; I say FS, like R4, and OntoFS #1 (R4- and ontology-based) and #2 (ext2/3-, sqlite- and ontology-based conversion from fuse-sqlite).

						- Ted

Have fun
Christian Stroetmann

