On Fri, 24 Dec 2010 12:17:46 +0100 Olaf van der Spek <olafvdspek@xxxxxxxxx> wrote:

> On Thu, Dec 23, 2010 at 10:51 PM, Neil Brown <neilb@xxxxxxx> wrote:
> > You are asking for something that doesn't exist, which is why no-one can tell
> > you what the answer is.
>
> It seems like a very common and basic operation. If it doesn't exist
> IMO it should be created.
>
> > The only mechanism for synchronising different filesystem operations is
> > fsync. You should use that.
> >
> > If it is too slow, use data journalling, and place your journal on a
> > small low-latency device (NVRAM??)
>
> This isn't about some DB-like app, it's about normal file writes, like
> archive extractions, compiling, editors, etc.

Yes, it might be nice to have a very low-cost way to make those safer against
corruption during a crash. It would have to be *very* low cost, as in most
cases the cost of cleaning up after the crash instead (e.g. 'make clean') is
quite low. But people do sometimes edit /etc/init.d files with an ordinary
editor, and it would be rather embarrassing if a crash at just the wrong time
left some critical file incomplete. Maybe it would be easier to teach editors
to fsync before rename for files in /etc .....

So what would this mechanism really look like?

I think the proposal is to delay committing the rename until the writeout of
the file is complete, without accelerating the writeout. That would probably
require delaying all updates to the directory until the writeout was
complete, as trying to reason about which changes were dependent and which
were independent is unlikely to be easy.

So as soon as you rename a file, you create a dependency between the file and
the directory such that no update for the directory may be written while any
page in the file is dirty. Conversely, any fsync of the directory would fsync
the file as well. Any write to the file should probably break the dependency,
as you can no longer be sure what exactly the rename was supposed to protect.
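The dependency rules described above (rename links a dirty file to its directory, the directory cannot be written while the file is dirty, fsync of the directory pulls the file along, and a later write breaks the link) can be sketched as a toy userspace model. This is not VFS code: the `Inode` and `DepTracker` classes are hypothetical, and a directory update is reduced to a single dirty flag.

```python
class Inode:
    """Toy inode: a dirty flag plus rename-dependency links (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.dirty = False
        self.waiting_on: set["Inode"] = set()  # directory: files that must be clean first
        self.holds: "Inode | None" = None       # file: directory it is holding back


class DepTracker:
    """Hypothetical model of the proposed rename -> writeout ordering."""

    def __init__(self) -> None:
        self.written: list[str] = []  # inode names in the order they reached "disk"

    def _break(self, f: Inode) -> None:
        # Destroy the file -> directory link, if any.
        if f.holds is not None:
            f.holds.waiting_on.discard(f)
            f.holds = None

    def write(self, f: Inode) -> None:
        # A write after the rename breaks the dependency: we can no longer
        # tell what the rename was supposed to protect.
        self._break(f)
        f.dirty = True

    def rename(self, f: Inode, d: Inode) -> None:
        # The rename dirties the directory; if the file still has dirty
        # pages, the directory update must wait for the file's writeout.
        d.dirty = True
        if f.dirty:
            self._break(f)  # in this toy a file holds back at most one directory
            f.holds = d
            d.waiting_on.add(f)

    def writeback(self, inode: Inode) -> bool:
        """Write `inode` out; return False if it is a directory still blocked."""
        if inode.waiting_on:
            return False
        if inode.dirty:
            inode.dirty = False
            self.written.append(inode.name)
        self._break(inode)  # writeout complete: the link is destroyed
        return True

    def fsync(self, inode: Inode) -> None:
        # fsync of a directory also writes out every file it is waiting on.
        for f in list(inode.waiting_on):
            self.writeback(f)
        self.writeback(inode)
```

Blocking the whole directory, rather than tracking individual entries, follows the point that working out which directory changes are independent is unlikely to be easy.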
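For comparison, the fsync-before-rename pattern that editors could be taught to use today might look roughly like this. It is a sketch assuming a POSIX system (Linux in particular) where a directory can be opened read-only and fsync'd; `atomic_save` is a hypothetical helper name.

```python
import os
import tempfile


def atomic_save(path: str, data: bytes) -> None:
    """Replace `path` with `data` so a crash leaves either the old or new contents."""
    dirpath = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory: rename cannot cross filesystems.
    fd, tmp = tempfile.mkstemp(dir=dirpath, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # file data must be durable before the rename
        os.replace(tmp, path)     # atomic rename over the old file
    except BaseException:
        # Remove the temp file if anything before the rename failed.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
    # fsync the directory so the rename itself is durable (Linux-specific assumption).
    dfd = os.open(dirpath, os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)
```

Note that `tempfile.mkstemp` creates the file readable and writable only by its owner, so a real editor saving into /etc would also need to copy the original file's mode and ownership before the rename.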
I suspect that much of the infrastructure for this could be implemented in the
VFS/VM. Certainly the dependency linkage between inodes (created on rename;
destroyed on write, on fsync, or when writeout of the inode completes) and the
fsync dependency could be common code. Preventing writeout of directories
with dependent files would need some fs interaction.

You could probably prototype this in ext2 quite easily to do some testing and
collect some numbers on overhead.

I think this would be an interesting project for someone to do, and I would be
happy to review any patches. Whether it ever got further than an interesting
project would depend very much on how intrusive it was to other filesystems,
how much overhead it caused, and what actual benefits resulted. If anyone
wanted to pursue this idea, they would certainly need to address each of
those in their final proposal.

I think there could be room for improved transactional semantics in Linux
filesystems. This might be what they should look like ... don't know yet.

NeilBrown