On Sat, Dec 25, 2010 at 6:25 PM, Nick Piggin <npiggin@xxxxxxxxx> wrote:
>> No, not arbitrary writes. It's about complete file writes.
>
> You still haven't defined exactly what you want.

Do you not understand what is meant by a complete file write?

>> Atomic semantics are not (that) complex.
>
> That is something to be argued over patches. What is not in question
> is that an atomic API is more complex than none :)

That's implementation complexity, not concept/semantics complexity.

>> Like I said before, it's not about DB-like functionality but about
>> complete file writes/updates. For example, I've got a file in an
>> editor and I want to save it.
>
> I don't understand your example, because in that case you surely
> want durability.

Hmm, true, bad example, although it depends on editor/user.
Let's take archive extraction instead.

>> Let me copy the original post:
>> Writing a temp file, fsync, rename is often proposed. However, the
>> durable aspect of fsync isn't always required
>
> So you want a way to atomically replace the contents of a file with
> new contents, in a way which completes asynchronously and lazily,
> and your new contents will eventually just appear sometime after
> they are guaranteed to be on disk?

Almost. Visibility to other processes should be normal (I don't know
the exact rules), but commit to disk may be deferred.

> You would need to create an unlinked inode with dirty data, and then
> have callbacks from pagecache writeback checking when the inode
> is cleaned, and then call appropriate filesystem routines to sync and
> issue barriers etc, and rename the old name to the new inode.

That's an implementation detail, but yes, something like that.

> You will also need to have a chain of inodes representing ordering of
> the updates so the renames can be performed in the right order. And
> add some hooks to solve the metadata issue.
>
> Then what happens when you fsync the original file? What if the
> original file is renamed or unlinked?
> How do you sync the outstanding queue of updates?

Logically those actions would happen after the atomic data update.
The fsync would be done on a now unlinked file (if done via fd). The
rename would be done on the new file. Same for unlink.

> Once you solve all those problems, then people will ask you to now
> solve them for multiple files at once because they also have some
> great use-case that is surely nothing like databases.

I don't want to play the what if game.

> Please tell us what for. If you have immediate need to replace the
> name, then you need the durability of fsync. If you don't have
> immediate need, then you can use another name, surely (until it
> comes time you want to switch names, at that point you want
> durability so you fsync then rename).

Temp file, rename has issues with losing meta-data.

>> and this way has other
>> issues, like losing file meta-data.
>
> Yes that's true, if you're not owner you may not be able to recreate
> most of it. Did you need to?

Yes

>> What is the recommended way for atomic non-durable (complete) file writes?
>
> There really isn't one. Like I said, there is not much atomicity
> semantics in the API, which works really well because it is simple
> to implement and to use (although apparently still far too complex
> for some programmers to get right).

It's simple to implement but it's not simple to use right.

> If we start adding atomicity beyond fundamental requirement of
> namespace operations, then where does it end? Why would it make
> sense to add atomicity for writes to one file, but not writes to 2 files?
> What if you require atomic multiple modifications to directory
> structure as well as file updates? And why only writes? What about
> atomic reads of several things? What isolation level should all of that
> have, and how to solve deadlocks?
>
>
>> I'm also wondering why FSs commit after open/truncate but before
>> write/close. AFAIK this isn't necessary and thus suboptimal.
>
> I don't know, can you expand on this? What fses are you talking
> about, and what behaviour.

The zero size issues of ext4 (before some patch). Presumably because
some apps do open, truncate, write, close on a file. I'm wondering why
an FS commits between truncate and write.

Olaf
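
For reference, the "temp file, fsync, rename" pattern discussed in this
thread can be sketched roughly as below. This is only an illustration in
Python, not an existing API: replace_file and its durable flag are
hypothetical names. The metadata copy is best-effort (a non-owner cannot
restore ownership, and xattrs/ACLs are not copied at all), which is the
meta-data loss mentioned above.

```python
import os
import stat
import tempfile


def replace_file(path, data, durable=False):
    """Replace the contents of `path` with `data` via temp file + rename.

    Hypothetical helper for illustration. With durable=False the fsync
    calls are skipped, so only the namespace switch is atomic.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory (same filesystem),
    # otherwise the rename cannot be atomic.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            # Best-effort copy of mode and ownership from the old file.
            try:
                old = os.stat(path)
            except FileNotFoundError:
                old = None
            if old is not None:
                os.fchmod(f.fileno(), stat.S_IMODE(old.st_mode))
                try:
                    os.fchown(f.fileno(), old.st_uid, old.st_gid)
                except PermissionError:
                    pass  # not owner/root: ownership is lost
            if durable:
                f.flush()
                os.fsync(f.fileno())  # data on disk before the rename
        # rename(2): readers see either the old or the new file.
        os.replace(tmp, path)
    except BaseException:
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
    if durable:
        # Also sync the directory so the rename itself is persisted.
        dfd = os.open(directory, os.O_RDONLY)
        try:
            os.fsync(dfd)
        finally:
            os.close(dfd)
```

Calling replace_file(path, data) without durable=True gives the
"atomic for other processes, commit deferred" behaviour asked about
here, but what a crash leaves behind in that case depends on the
filesystem, which is the gap this thread is about.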