The ext4 automatic-fsync-on-rename discussion has shown that many
applications simply Do It Wrong when it comes to rewriting
configuration files. Some of the common failures are:

- program overwrites the old config file
- program writes a new file, but forgets to fsync before rename
- program writes the new file in /tmp, so the rename fails on some
  systems
- program writes a new file and fsyncs, but forgets to give the new
  file the same file ownership, permission and/or extended attributes
  as the old file

Magically taking care of filesystem semantics for every use may not
be possible (no O_PONIES for you!), but I believe we can help the
applications that just want to completely rewrite a file and
atomically replace it.

The semantics for O_REWRITE would be:

1) When opening a file O_REWRITE, the file handle points at a freshly
   allocated, empty file. The original file is still available to
   programs that open the file without O_REWRITE.

2) O_REWRITE can only be used in conjunction with O_WRONLY, because
   the file descriptor is not associated with the original file
   (which has data), but with an empty inode.

3) The code that implements O_REWRITE (kernel? glibc?) makes sure
   that:
   - the new file is on the same filesystem as the original file
   - the new file is not linked (so it is automatically freed after
     a process or system crash)
   - the new file's ownership, permissions and extended attributes
     match that of the original file

4) The application that opens a file O_REWRITE is required to rewrite
   the entire file.

5) On close(), the code that implements O_REWRITE makes sure that the
   file is atomically renamed, so that if a system crash happens, the
   user will see either the old or the new file contents, but never
   an empty file.

6) After close(), processes that open the file will get the new
   content. Processes that previously opened the file will hold on to
   the old inode and get old contents.
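
For comparison, here is a minimal sketch in C of what an application
has to do by hand today to approximate steps 3 to 5 in userspace. The
function name rewrite_file_atomically() and the ".rewrite.XXXXXX"
temporary name are made up for illustration. Extended attributes are
not copied, and ownership is copied only where the caller is allowed
to do so. It is a sketch under those assumptions, not a proposed
implementation of O_REWRITE.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <libgen.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper, not an existing API: replace `path` with `len`
 * bytes from `buf`, so that readers see either the old or the new
 * contents. Returns 0 on success, -1 with errno set on failure. */
int rewrite_file_atomically(const char *path, const void *buf, size_t len)
{
    char dirbuf[PATH_MAX], tmp[PATH_MAX];
    const char *p = buf;
    char *dir;
    struct stat st;
    int have_old, fd, dfd, n, saved;

    assert(path != NULL && (buf != NULL || len == 0));

    /* dirname() may modify its argument, so work on a copy. */
    if (strlen(path) >= sizeof(dirbuf)) { errno = ENAMETOOLONG; return -1; }
    strcpy(dirbuf, path);
    dir = dirname(dirbuf);

    /* Create the temporary file next to the target, so that rename()
     * never has to cross a filesystem boundary (the /tmp failure). */
    n = snprintf(tmp, sizeof(tmp), "%s/.rewrite.XXXXXX", dir);
    if (n < 0 || n >= (int)sizeof(tmp)) { errno = ENAMETOOLONG; return -1; }

    have_old = (stat(path, &st) == 0);
    fd = mkstemp(tmp);
    if (fd < 0)
        return -1;

    /* Copy permissions and ownership from the old file, if any.
     * Assumption: EPERM from fchown() is tolerated, because an
     * unprivileged caller cannot give a file away. Extended
     * attributes are not handled in this sketch. */
    if (have_old) {
        if (fchmod(fd, st.st_mode & 07777) < 0) goto fail;
        if (fchown(fd, st.st_uid, st.st_gid) < 0 && errno != EPERM) goto fail;
    }

    /* Write the complete new contents, retrying short writes. */
    while (len > 0) {
        ssize_t w = write(fd, p, len);
        if (w < 0) {
            if (errno == EINTR) continue;
            goto fail;
        }
        p += w;
        len -= (size_t)w;
    }

    /* The data must be on disk before the rename makes it visible. */
    if (fsync(fd) < 0) goto fail;
    if (close(fd) < 0) { fd = -1; goto fail; }
    fd = -1;

    if (rename(tmp, path) < 0) goto fail;

    /* Best effort: flush the directory so the rename itself is durable. */
    dfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dfd >= 0) { fsync(dfd); close(dfd); }
    return 0;

fail:
    saved = errno;
    if (fd >= 0) close(fd);
    unlink(tmp);
    errno = saved;
    return -1;
}
```

A caller would use it as rewrite_file_atomically("app.conf", data,
data_len). Unlike the proposal in (3), the temporary file here is
linked into the directory, so a crash at the wrong moment can leave a
stray ".rewrite.*" file behind.
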

Here are my questions:

- Are these semantics useful for programs that want to replace config
  (or other) files with new content?
- Are these semantics sane?
- What would be the best place to implement these semantics?

Relying on application developers to get it right seems to not have
worked out well, so I'm thinking kernel or glibc. Glibc has the
advantage of it not being in the kernel, but implementing it
in-kernel might give us the opportunity for performance enhancements,
like reducing step (5) to merely enforcing ordering between
filesystem operations, instead of requiring an fsync.

-- 
All rights reversed.