I just had to resume a compile interrupted by a crash, and I noticed
that make(1)'s usual deletion of incomplete target files doesn't work
if make itself is killed.

And then I started thinking that there are a lot of situations where
you want a file write operation to be atomic: either the whole file is
written or nothing is. Some applications know how to resume downloads,
but in many cases where a file is created, you'd rather have no file
than a partial file with the right name and a recent timestamp.

The usual workaround is to create the file with a temporary name and
rename it once it's complete. Then a crash just leaves you with a
leftover temporary file. There's an awful lot of software that does
that dance. And an awful lot that doesn't go to the trouble.

I got to thinking about an open(2) flag that could help, and what it
should do and how difficult it would be to implement. The obvious name
would be O_ATOMIC, but that's perhaps subject to too much
interpretation, so I'll call it O_WHOLEFILE here. Other names are
welcome.

Basically, when opened with O_WHOLEFILE, the file name would be
created, but if the file is not closed with an explicit close(2) call
(e.g. because of a computer crash or program exit), it would be
automatically deleted.

Implementation-wise, you could allocate the inode, but put it on the
deleted-but-open list, and create the name in the dcache but not on the
underlying file system. Then, at close(2) time, create the hard link on
the underlying file system. The file system has to support taking files
off the pending-delete list, but I don't think that's *too* hard.

Notes and unresolved questions:

- Ideally, permission checking would be done at open(2) time, and the
  fs-level link wouldn't require an additional check, but I can see how
  some network file systems might check again.
- This makes it more likely that close(2) would fail. Applications
  would just have to check for that in this case.
- What should dup(2) do with such a file descriptor?
  Does the file get committed when the original fd is closed, or when
  the file table entry is deallocated, i.e. when all fds pointing to it
  are closed?
  - The former has the advantage that it provides a way, using dup(2),
    to commit the file but leave it open.
- stat() on such a file would report nlink == 0. Is this okay, or
  should it be faked to 1?
- Should attempts to open(2) the file fail? Is there something like
  ETXTBSY that could be returned? Or should it just be treated like a
  mandatory lock and block on read?
  - Another application could, however, delete the uncommitted file
    name.
- I would expect rename(2) to work, leaving the new name in the "not
  quite committed" state.
- link(2), on the other hand, should create the new name immediately.
- Should closing the fd via dup2() be equivalent to an explicit
  close(2)? It's still explicit program action, so I guess so. (Are
  there any other ways to close an fd?)
  - If so, would dup2(fd, fd) commit the file?
  - How about shutdown(fd, SHUT_WR)?
- What error should be returned if the file system doesn't support the
  option? Silently ignore it? Or should there be a way to find out if
  it is supported? pathconf()/fpathconf()? Or fcntl(F_GETFL)?

The nice thing about this is that, on systems that don't support it,

    #define O_WHOLEFILE 0

would provide the usual semantics.

A text editor that wanted to use this to atomically replace an existing
file (without risking losing both copies if interrupted) could do the
following dance, only slightly modified from the rename(2) one. (The
classic rename(2) would also work, but it would have a small race
condition.)

    // Generate a temporary file name in the same directory as "name"
    static char const template[] = "editortemp.XXXXXX";
    char *p = strrchr(name, '/');
    size_t dirlen = p ? p+1-name : 0;  // Length of directory prefix
    char *tmpname = malloc(dirlen + sizeof template);
    memcpy(tmpname, name, dirlen);
    memcpy(tmpname+dirlen, template, sizeof template);

    // Now here's the dance proper.
    // (Error checking omitted for brevity.)
    int fd = mkostemp(tmpname, O_WHOLEFILE);
    write(fd, buf, buflen);  // Repeat as necessary
    fsync(fd);
    if (fcntl(fd, F_GETFL) & O_WHOLEFILE) {
        // Note: link(2) as it exists today fails with EEXIST
        // when "name" already exists.
        link(tmpname, name);    // Atomically replace "name"
        unlink(tmpname);        // Get rid of temp file before closing it
    } else {
        // FS doesn't support O_WHOLEFILE, so it might not support links
        rename(tmpname, name);
    }
    close(fd);
    free(tmpname);

It's a big user-level feature, so I'm curious if this seems useful
enough to warrant the complexity. It would be Linux-specific, but I
don't see another way to completely close the race condition, and it's
very easy to write code that degrades gracefully if it's not available.
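
For comparison, here is a minimal sketch of the conventional
temporary-name-and-rename dance mentioned near the top, using only
calls that exist today (mkstemp(3), fsync(2), rename(2)). The helper
name write_file_atomically() and the "tmp.XXXXXX" suffix are made up
for illustration, and the sketch assumes the temporary file's 0600 mode
from mkstemp(3) is acceptable for the final file.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the proposal): write buf[0..len) to
 * "path" via a temporary file in the same directory, then rename(2) it
 * into place. Returns 0 on success, -1 on error with errno set.
 */
int write_file_atomically(const char *path, const void *buf, size_t len)
{
    static const char suffix[] = "tmp.XXXXXX";
    assert(path != NULL && (buf != NULL || len == 0));

    /* Keep the temp file in the same directory so rename(2) never
     * has to cross a file system boundary. */
    const char *slash = strrchr(path, '/');
    size_t dirlen = slash ? (size_t)(slash + 1 - path) : 0;
    char *tmpname = malloc(dirlen + sizeof suffix);
    if (tmpname == NULL)
        return -1;
    memcpy(tmpname, path, dirlen);
    memcpy(tmpname + dirlen, suffix, sizeof suffix);

    const char *p = buf;
    size_t left = len;
    int saved_errno;

    int fd = mkstemp(tmpname);  /* Creates the file with mode 0600 */
    if (fd < 0) {
        saved_errno = errno;
        free(tmpname);
        errno = saved_errno;
        return -1;
    }

    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* Interrupted before writing; retry */
            goto fail;
        }
        p += n;
        left -= (size_t)n;
    }
    if (fsync(fd) < 0)          /* Data must be on disk before the rename */
        goto fail;
    if (close(fd) < 0) {
        fd = -1;                /* Don't close it a second time */
        goto fail;
    }
    fd = -1;
    if (rename(tmpname, path) < 0)
        goto fail;
    free(tmpname);
    return 0;

fail:
    saved_errno = errno;
    if (fd >= 0)
        close(fd);
    unlink(tmpname);            /* Best-effort cleanup of the temp file */
    free(tmpname);
    errno = saved_errno;
    return -1;
}
```

A caller would use it as write_file_atomically("notes.txt", data,
datalen). A crash before the rename(2) still leaves a stray temporary
file behind, which is exactly the leftover an O_WHOLEFILE-style flag
would clean up automatically.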