On Sun, Dec 26, 2010 at 11:10 PM, Ted Ts'o <tytso@xxxxxxx> wrote:
> On Sun, Dec 26, 2010 at 07:51:23PM +0100, Olaf van der Spek wrote:
>> f = open(..., O_ATOMIC | O_CREAT | O_TRUNC);
>
> Great, let's rename O_ATOMIC to O_PONIES. :-)

If that makes you happy.

>> abort/rollback(...); // optional
>
> As I said earlier, "file systems are not databases", and "databases
> are not file systems". Oracle tried to foist their database as a file
> system during the dot-com boom, and everyone laughed at them; the
> performance was a nightmare. If Oracle wasn't able to make a
> transaction engine that supports transactions and rollbacks
> performant, you really expect that you'll be able to do it?

Like I've said dozens of times, this is not about full DB
functionality. Why do you keep making false analogies?

>> > If it is a multi-file/dir archive, then you could equally well come back in
>> > an inconsistent state after crashing with some files extracted and
>> > some not, without atomic-write-multiple-files-and-directories API.
>>
>> True, but at least each file will be valid by itself. So no broken
>> executables, images or scripts.
>> Transactions involving multiple files are outside the scope of this
>> discussion.
>
> But what's the use case where this is useful and/or interesting? It
> certainly doesn't help in the case of dpkg, because you still have to
> deal with shell scripts that depend on certain executables being
> present, or executables depending on the new version of the shared
> library being present. If we're going to give up huge amounts of file
> system performance for some use case, it's nice to know what the
> real-world use case would actually be. (And again, I believe the dpkg
> folks are squared away at this point.)

Why would this require a huge performance hit? It's comparable to temp
file + rename, which doesn't have this performance hit either, AFAIK.

> If the use case is really one of replacing the data while maintaining
> the metadata (i.e., ACLs, extended attributes, etc.), we've already
> pointed out that in the case of a file editor, you had better have
> durability. Keep in mind that if you don't eventually call fsync(),
> you'll never know if the file system is full or the user has hit their
> quota, and the data can't be lazily written out later. Or in the case
> of a networked file system, what if the network connection disappears
> before you have a chance to lazily update the data and do the rename?
> So before the editor exits, and the last remaining copy of the new
> data (in memory) disappears, you had better call fsync() and check to
> make sure the write can and has succeeded.

Good point. So fsync is still needed in that case. What about the
meta-data though?
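
To make "temp file + rename" concrete, below is roughly what every
application has to hand-roll today. This is a sketch, not anyone's
canonical code: replace_file() and its error handling are mine, and I've
stopped at owner and mode on purpose, because carrying over ACLs and
extended attributes as well is exactly the clumsy part.

#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Replace path's contents with buf[0..len), the way applications
 * have to do it today: write a temp file, fsync it, copy metadata
 * over by hand, then rename it into place. */
static int replace_file(const char *path, const void *buf, size_t len)
{
	struct stat st;
	char tmp[4096];
	int fd;

	snprintf(tmp, sizeof(tmp), "%s.tmp", path);

	fd = open(tmp, O_WRONLY | O_CREAT | O_EXCL, 0600);
	if (fd < 0)
		return -1;

	if (write(fd, buf, len) != (ssize_t)len)
		goto fail;

	/* fsync() is where ENOSPC/EDQUOT finally surface; skip this
	 * check and a full disk or blown quota silently eats the
	 * update. */
	if (fsync(fd) < 0)
		goto fail;

	/* Copy the old file's owner and mode by hand. ACLs and
	 * xattrs would need yet more code, and fchown() can fail
	 * for a non-root user anyway. */
	if (stat(path, &st) == 0 &&
	    (fchown(fd, st.st_uid, st.st_gid) < 0 ||
	     fchmod(fd, st.st_mode) < 0))
		goto fail;

	if (close(fd) < 0) {
		unlink(tmp);
		return -1;
	}

	/* The atomic step: after a crash there is either the old
	 * file or the new one. Strictly, the directory needs an
	 * fsync() too for the rename to be durable; omitted here. */
	if (rename(tmp, path) < 0) {
		unlink(tmp);
		return -1;
	}
	return 0;

fail:
	close(fd);
	unlink(tmp);
	return -1;
}

Note how much of this is metadata bookkeeping rather than the atomic
replace itself; that bookkeeping is the part I'd like the kernel to own.
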
> So in the case of replacing the data, what's the use case if it's not
> for a file editor? And note that you've said that you want atomicity
> because you want to make sure that after a crash you don't lose data.
> What about the case where the system doesn't crash, but the wireless
> connection goes away, or the user has exceeded his/her quota and they
> were trying to replace 4k worth of data fork with 12k worth of data?
> I can certainly think of scenarios where wireless connection drops and
> quota overruns are far more likely than system crashes. (i.e., when
> you're not using proprietary video drivers. :-P)
>
>> Providing transaction semantics for multiple files is a far broader
>> proposal and not necessary to implement this proposal.
>
> But providing magic transaction semantics for a single file in the
> rename is not at all clearly useful. You need to justify all of this
> hard effort, and performance loss. (Well, or if you're so smart you
> can implement your own file system that does all of this work, and we
> can benchmark it against a file system that doesn't do all of this
> work....)

Still waiting on any hint as to why that performance loss would happen.

>> I'm not sure, but Ted appears to be saying temp file + rename (but no
>> fsync) isn't guaranteed to work either.
>
> It won't work if you get really unlucky and your system takes a power
> cut right at the wrong moment during or after the rename(). It could
> be made to work, but at a performance cost. And the question is
> whether the performance cost is worth it. At the end of the day it's
> all a tradeoff between performance cost, implementation cost, and
> value to the user and the application programmer. Which is why you
> need to articulate the use case where this makes sense.
>
> It's not dpkg, and it's not file editors. What is it, specifically?
> And why can it tolerate data loss in the case of quota overruns and
> wireless connection hits, but not in the case of system crashes?

There are two different kinds of losses here. One is losing the entire
file, the other is losing the update but still having the old file.

>> It just seems quite suboptimal. There's no need for infinite storage
>> (or an oracle) to avoid this.
>
> If you're so smart, why don't you try implementing it? It's going to
> be hard for us to convince you why it's going to be non-trivial and
> have huge implementation *and* performance costs, so why don't you
> produce the patches that make this all work?

Why is that so hard? Should be a lot easier than me implementing an FS
from scratch.

Olaf
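
P.S. Since the snippet at the top keeps getting read as O_PONIES,
here's the intended usage spelled out. The flag is hypothetical,
obviously; the point is that the whole update commits (or not) at
close():

/* O_ATOMIC is hypothetical -- this does not compile against any real
 * kernel. Intended semantics: close() either publishes the new
 * contents in place of the old ones, keeping owner, mode, ACLs and
 * xattrs, or fails (ENOSPC, EDQUOT, ...) and leaves the old file
 * exactly as it was. A crash before close() also leaves the old
 * file untouched. */
fd = open(path, O_ATOMIC | O_WRONLY | O_CREAT | O_TRUNC, 0644);
if (fd < 0)
	return -1;
if (write(fd, buf, len) != (ssize_t)len || close(fd) < 0)
	return -1;	/* old file still intact */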