Hello,

On Thu, March 10, 2011 20:31, Sage Weil wrote:
> It is frequently useful to sync a single file system, instead of all
> mounted file systems via sync(2):
>
> - On machines with many mounts, it is not at all uncommon for some of
> them to hang (e.g. unresponsive NFS server). sync(2) will get stuck on
> those and may never get to the one you do care about (e.g., /).
> - Some applications write lots of data to the file system and then
> want to make sure it is flushed to disk. Calling fsync(2) on each
> file introduces unnecessary ordering constraints that result in a large
> amount of sub-optimal writeback/flush/commit behavior by the file
> system.
>
> There are currently two ways (that I know of) to sync a single super_block:
>
> - BLKFLSBUF ioctl on the block device: That also invalidates the bdev
> mapping, which isn't usually desirable, and doesn't work for non-block
> file systems.
> - 'mount -o remount,rw' will call sync_filesystem as an artifact of the
> current implementation. Relying on this little-known side effect for
> something like data safety sounds foolish.
>
> Both of these approaches require root privileges, which some applications
> do not have (nor should they need?) given that sync(2) is an unprivileged
> operation.
>
> This patch introduces a new system call syncfs(2) that takes an fd and
> syncs only the file system it references. Maybe someday we can
>
> $ sync /some/path
>
> and not get
>
> sync: ignoring all arguments
>
> The syscall is motivated by comments by Al and Christoph at the last LSF.
> syncfs(2) seems like an appropriate name given statfs(2).
>
> A similar ioctl was also proposed a while back, see
> http://marc.info/?l=linux-fsdevel&m=127970513829285&w=2

The patch there seems much more reasonable than introducing a whole new
system call just for 20 lines of kernel code. New system calls are added
too easily nowadays.

As an alternative to the ioctl, I propose extending sync_file_range()
instead. E.g.
add a SYNC_FILE_MOUNT flag and use that, either on any fd on the mount or
the root dir fd. That syscall is non-standard and close enough that it can
implement this behaviour too.

Greetings,

Indan

---

Something like:

diff --git a/fs/sync.c b/fs/sync.c
index ba76b96..9fa073c 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -18,7 +18,7 @@
 #include "internal.h"
 
 #define VALID_FLAGS (SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE| \
-			SYNC_FILE_RANGE_WAIT_AFTER)
+			SYNC_FILE_RANGE_WAIT_AFTER|SYNC_FILE_MOUNT)
 
 /*
  * Do the filesystem syncing work. For simple filesystems
@@ -330,6 +330,15 @@ SYSCALL_DEFINE(sync_file_range)(int fd, loff_t offset, loff_t nbytes,
 	}
 
 	ret = 0;
+	if (flags & SYNC_FILE_MOUNT) {
+		struct super_block *sb;
+
+		sb = file->f_dentry->d_sb;
+		down_read(&sb->s_umount);
+		ret = sync_filesystem(sb);
+		up_read(&sb->s_umount);
+		goto out_put;
+	}
 	if (flags & SYNC_FILE_RANGE_WAIT_BEFORE) {
 		ret = filemap_fdatawait_range(mapping, offset, endbyte);
 		if (ret < 0)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e38b50a..53e427e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -373,6 +373,7 @@ struct inodes_stat_t {
 #define SYNC_FILE_RANGE_WAIT_BEFORE 1
 #define SYNC_FILE_RANGE_WRITE 2
 #define SYNC_FILE_RANGE_WAIT_AFTER 4
+#define SYNC_FILE_MOUNT 8
 
 #ifdef __KERNEL__
 
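For illustration, a minimal userspace sketch of how an application might use the proposed flag on any fd within the target mount, assuming the patch above is applied. SYNC_FILE_MOUNT (value 8) exists only in this proposal: a stock kernel rejects unknown sync_file_range() flags with EINVAL, so the sketch falls back to syncfs(2), the interface that was eventually merged (Linux 2.6.39, glibc 2.14). The helper name sync_mount_of() is hypothetical.

```c
#define _GNU_SOURCE /* needed for sync_file_range(2) and syncfs(2) */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Proposed flag from the patch above; NOT defined by mainline kernels. */
#ifndef SYNC_FILE_MOUNT
#define SYNC_FILE_MOUNT 8
#endif

/*
 * Hypothetical helper: sync only the file system that contains `path`.
 * Any file or directory on the mount works, including its root dir.
 * Returns 0 on success, -1 on failure with errno set.
 */
static int sync_mount_of(const char *path)
{
	int fd = open(path, O_RDONLY | O_CLOEXEC);
	if (fd < 0)
		return -1;

	/* With the patch, offset/nbytes are ignored for a whole-mount sync. */
	int ret = sync_file_range(fd, 0, 0, SYNC_FILE_MOUNT);

	/* Unpatched kernels reject the unknown flag: fall back to syncfs(2). */
	if (ret < 0 && errno == EINVAL)
		ret = syncfs(fd);

	/* Preserve the errno from the sync call across close(). */
	int saved_errno = errno;
	close(fd);
	errno = saved_errno;

	return ret < 0 ? -1 : 0;
}
```

Unlike sync(2), a hung mount elsewhere on the machine cannot block this call, because only the super_block behind the fd is synced.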