Wild idea: O_WHOLEFILE

I just had to resume a compile interrupted by a crash, and I
noticed that make(1)'s usual practice of deleting incomplete target
files doesn't help if make itself is killed.

And then I started thinking that there are a lot of situations where
you want a file write operation to be atomic: either the whole file is
written or nothing is.  Some applications know how to resume downloads,
but in many cases where a file is created, you'd rather have no file
than a partial file with the right name and a recent timestamp.

The usual workaround is to create the file with a temporary name and
rename it once it's complete.  Then a crash just leaves you with a
leftover temporary file.  There's an awful lot of software that does
that dance.  And an awful lot that doesn't go to the trouble.
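For comparison, the usual dance might look like the minimal C sketch below. The helper name replace_via_rename() and the ".tmp.XXXXXX" suffix are illustrative assumptions, not taken from any particular program.

```c
#define _XOPEN_SOURCE 700	/* mkstemp(), fsync() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write the data under a temporary name in the same
 * directory, fsync it, then rename(2) it over "path".  A crash leaves at
 * most a stray "<path>.tmp.XXXXXX" file; "path" itself holds either the
 * old or the new contents, never a partial file.
 */
int replace_via_rename(const char *path, const void *buf, size_t len)
{
	size_t n = strlen(path) + sizeof ".tmp.XXXXXX";
	char *tmp = malloc(n);
	if (!tmp)
		return -1;
	snprintf(tmp, n, "%s.tmp.XXXXXX", path);

	int fd = mkstemp(tmp);	/* creates and opens the file, mode 0600 */
	if (fd < 0)
		goto fail_free;
	/* One write is enough for a sketch; a short write counts as failure. */
	if (write(fd, buf, len) != (ssize_t)len || fsync(fd) < 0) {
		close(fd);
		goto fail_unlink;
	}
	if (close(fd) < 0 || rename(tmp, path) < 0)
		goto fail_unlink;
	free(tmp);
	return 0;

fail_unlink:
	unlink(tmp);	/* don't leave the temporary file behind */
fail_free:
	free(tmp);
	return -1;
}
```

Note that mkstemp(3) creates the file with mode 0600, so a real editor would also copy the old file's permissions, and would fsync(2) the directory if the rename itself has to survive a crash.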


I got to thinking about an open(2) flag that could help, and what it
should do and how difficult it would be to implement.  The obvious name
would be O_ATOMIC, but that's perhaps subject to too much interpretation.
Other names are welcome.

Basically, when opened with O_WHOLEFILE, the file name would be created,
but if not closed with an explicit close(2) call (e.g. computer crash
or program exit), the file would be automatically deleted.

Implementation-wise, you could allocate the inode, but put it on the
deleted-but-open list, and create the name in the dcache but not on the
underlying file system.  Then, at close(2) time, create the hard link
on the underlying file system.
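Linux later (3.11) gained an interface that exposes much the same mechanism to user space, except that the commit is an explicit call rather than close(2): open(2) with O_TMPFILE allocates an unnamed inode in a given directory, and linkat(2) gives it a name once it is complete. A minimal sketch, assuming a file system that supports O_TMPFILE and a mounted /proc; create_whole() and close_keep_errno() are hypothetical names.

```c
#define _GNU_SOURCE		/* O_TMPFILE */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Close fd without clobbering the errno of an earlier failure. */
static void close_keep_errno(int fd)
{
	int saved = errno;
	close(fd);
	errno = saved;
}

/*
 * Hypothetical helper: create "name" (which must be inside directory
 * "dir") holding buf.  Until linkat() succeeds the inode has no name at
 * all, so if the process dies first it is simply freed.
 * Returns 0 on success, -1 with errno set on failure.
 */
int create_whole(const char *dir, const char *name, const void *buf, size_t len)
{
	/* Unnamed inode in dir; fails (e.g. EOPNOTSUPP) if unsupported. */
	int fd = open(dir, O_TMPFILE | O_WRONLY, 0644);
	if (fd < 0)
		return -1;

	if (write(fd, buf, len) != (ssize_t)len || fsync(fd) < 0) {
		close_keep_errno(fd);
		return -1;
	}

	/* Commit: the unprivileged way to name an O_TMPFILE inode, per open(2). */
	char proc[64];
	snprintf(proc, sizeof proc, "/proc/self/fd/%d", fd);
	if (linkat(AT_FDCWD, proc, AT_FDCWD, name, AT_SYMLINK_FOLLOW) < 0) {
		close_keep_errno(fd);
		return -1;
	}
	return close(fd);
}
```

Like link(2), linkat(2) fails with EEXIST if the target name already exists, so replacing an existing file this way still needs a rename(2) step.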

The file system has to support taking files off the pending-delete
list, but I don't think that's *too* hard.


Notes and unresolved questions:
- Ideally, permission checking would be done at open(2) time,
  and the fs-level link wouldn't require an additional check, but
  I can see how some network file systems might check again.
- This makes it more likely that close(2) would fail.  Applications
  would just have to check its return value in this case.
- What should dup(2) do with such a file descriptor?  Does the file
  get committed when the original FD is closed, or when the file table
  entry is deallocated, i.e. all fds pointing to it closed?
- The former has the advantage that it provides a way, using dup(2), to
  commit the file but leave it open.
- stat() on such a file would report nlink == 0.  Is this okay,
  or should it be faked to 1?
- Should attempts to open(2) the file fail?  Is there something like
  ETXTBSY that could be returned?  Or should it just be treated like
  a mandatory lock and block on read?
- Another application could, however, delete the uncommitted file name.
- I would expect rename(2) to work, leaving the new name in the
  "not quite committed" state.
- link(2), on the other hand, should create the new name immediately.
- Should closing the fd via dup2() be equivalent to an explicit close(2)?
  It's still explicit program action, so I guess so.
  (Are there any other ways to close an fd?)
- If so, would dup2(fd, fd) commit the file?
- How about shutdown(fd, SHUT_WR)?
- What error to return if the file system doesn't support the
  option?  Silently ignore it?  Or should there be a way to find out
  if it is supported?  pathconf()/fpathconf()?  Or fcntl(F_GETFL)?
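Since close(2) becomes the commit point, ignoring its return value would silently lose data. A minimal sketch of the checking an application would need; write_all() and write_and_close() are hypothetical helpers, and nothing here depends on O_WHOLEFILE itself.

```c
#define _XOPEN_SOURCE 700	/* POSIX.1-2008 interfaces */
#include <errno.h>
#include <unistd.h>

/* Write all of buf, retrying after short writes and EINTR. */
static int write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;
	while (len > 0) {
		ssize_t n = write(fd, p, len);
		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		p += n;
		len -= (size_t)n;
	}
	return 0;
}

/*
 * Write the whole buffer, then close the descriptor, reporting failure
 * from either step.  The descriptor is closed in every case; close(2) is
 * not retried after an error, since Linux releases the fd regardless.
 */
int write_and_close(int fd, const void *buf, size_t len)
{
	int ret = write_all(fd, buf, len);
	int saved = errno;
	if (close(fd) < 0)
		return -1;	/* under the proposal: the commit failed */
	if (ret < 0)
		errno = saved;
	return ret;
}
```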

The nice thing about this is that, on systems that don't support it,
#define O_WHOLEFILE 0
would provide the usual semantics.
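A minimal sketch of that fallback, plus a runtime check along the lines of the fcntl(F_GETFL) idea in the notes. create_output() and has_wholefile() are hypothetical helper names, and O_WHOLEFILE is only the flag proposed here.

```c
#define _XOPEN_SOURCE 700	/* POSIX.1-2008 interfaces */
#include <fcntl.h>

/* Proposed flag: if the headers don't define it, make it a no-op. */
#ifndef O_WHOLEFILE
#define O_WHOLEFILE 0
#endif

/*
 * Create "path" for writing, asking for whole-file semantics where they
 * exist.  With O_WHOLEFILE defined as 0 this is an ordinary open(2).
 */
int create_output(const char *path)
{
	return open(path, O_WRONLY | O_CREAT | O_TRUNC | O_WHOLEFILE, 0666);
}

/* Nonzero only if the descriptor really has whole-file semantics. */
int has_wholefile(int fd)
{
	int flags = fcntl(fd, F_GETFL);
	return flags != -1 && (flags & O_WHOLEFILE) != 0;
}
```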

A text editor that wanted to use this to atomically replace an existing
file (without risking losing both copies if interrupted) could do the
following dance, only slightly modified from the rename(2) one.
(The classic rename(2) would also work, but it would have a small
race condition.)

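#define _GNU_SOURCE		// for mkostemp()
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#ifndef O_WHOLEFILE
#define O_WHOLEFILE 0		// Not supported: fall back to rename(2)
#endif

int save_file(char const *name, void const *buf, size_t len)
{
	// Generate a temporary file name in the same directory as "name"
	static char const template[] = "editortemp.XXXXXX";
	char const *p = strrchr(name, '/');
	size_t dirlen = p ? (size_t)(p+1-name) : 0;	// not "len": that's the data size
	char *tmpname = malloc(dirlen + sizeof template);
	if (!tmpname)
		return -1;
	memcpy(tmpname, name, dirlen);
	memcpy(tmpname+dirlen, template, sizeof template);

	// Now here's the dance proper.
	int fd = mkostemp(tmpname, O_WHOLEFILE);
	if (fd < 0) {
		free(tmpname);
		return -1;
	}
	write(fd, buf, len);	// Repeat as necessary
	fsync(fd);
	if (fcntl(fd, F_GETFL) & O_WHOLEFILE) {
		// NB: today's link(2) fails with EEXIST if "name" exists;
		// this assumes link would replace it under the proposal.
		link(tmpname, name);	// Atomically replace "name"
		unlink(tmpname);	// Get rid of temp file before closing it
	} else {
		// FS doesn't support O_WHOLEFILE, so it might not support links
		rename(tmpname, name);
	}
	int ret = close(fd);	// With O_WHOLEFILE, this can fail: check it
	free(tmpname);
	return ret;
}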


It's a big user-level feature, so I'm curious whether this seems useful enough
to warrant the complexity.  It would be Linux-specific, but I don't see
another way to completely close the race condition, and it's very easy
to write code that degrades gracefully if it's not available.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html