On Sat, Aug 05, 2006 at 12:12:43PM -0700, Andrew Morton wrote: [ thanks for the comments everyone ] > Consequently the semantics which we require of the library functions which > you've added should be: > > /* > * On return from this function all dirty data has been cleaned - it is either > * on disk or is in flight in the I/O queues. > */ > > These semantics are exactly provided by > do_sync_file_range(SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE). > I'd suggest that do_sync_file_range() could be used to sync the file's > data in this application, if only for reasons of clarity and precision. do_sync_file_range is for files. I don't have files for the directory inodes. > > That leaves the file metadata, the inode itself and the superblock, for > which we'd ideally likely same semantics, but stronger sync semantics would > of course meet our requirement. > > metadata: we don't have an > SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE implementation of > fsync_buffers_list() so unless we add such, we need to use > fsync_buffers_list()'s > (SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE|SYNC_FILE_RANGE_WAIT_AFTER) > semantics. I'd suggest a call to sync_mapping_buffers() directly. Sadly fat doesn't use mark_buffer_dirty_inode. We can change it to do so of course. > > The inode: write_inode_now(wait=0) will I think be appropriate here. The fun part of write_inode_now is the part where __sync_single_inode does this: int wait = wbc->sync_mode == WB_SYNC_ALL; ... if (wait) { int err = filemap_fdatawait(mapping); ... I forget this every time and had to sysrq-t my way back to this discovery again today. This may or may not be the desired result of write_inode_now, but it is not quite right for -o flush. > > superblock: it's possible that the proposed three-step scheme will leave > the filesystem's superblock unwritten. Need to think about that. > > So I don't think any additional helper functions are needed. The above > three are all exported to filesystems. > > > To test all this I'd suggest that you verify that `-o flush' takes > /rpoc/meminfo:Dirty reliably down to zero on a quiet system for various > operations. > My test: cp -a /usr/share/doc/pages fat_fs/orig cd fat_fs ; cp -a orig copy sleep 2 reset Then diff the directories after things come back. > > If we're not adding new VFS functions then fatfs can generally hack around > with these functions until it does what we want it to do. If we _do_ add > new VFS functions then their exact data integrity behaviour should be > carefully designed and documented, please. Otherwise we'll lose data or > make people's machines slower than they need to be - we have a track record > of screwing this stuff up. > > There was quite some duplication of code within the fat patch you sent - it > looked like a little helper function was needed. New patch below, it is slightly different. I'm still hoping we can avoid the timer to cover in flight ios ignored by filemap_flush, but it may come to that later on. > > I expect the `-o flush' behaviour will be useful on other filesystem types. > Did you consider moving it into the generic mount options (MS_FLUSH?) > rather than making it fat-specific? I did, but this is one of those fuzzy patches. It's not actually a good idea, but it's the best compromise I can find between what the users wanted and what I was willing to ship. If other filesystems what this bad idea, we can certainly add it, but I wanted to start small... > > Sorry to complicate a little patch, but I'd prefer to get this right > going-in. > > iirc, Hirofumi-san had a patch which did the same thing as this one 6-odd > months ago, but it seemed to die off. > The ones I found 6-odd months ago didn't cover metadata correctly. His stuff may have improved, sorry if I missed it. From: Chris Mason <mason@xxxxxxxx> Subject: add -o flush for fat Fat is commonly used on removable media. Mounting with -o flush tells the FS to write things to disk as quickly as possible. It is like -o sync, but much faster (and not as safe). Signed-off-by: Chris Mason <mason@xxxxxxxx> --- diff -r 8a50d9b6ce6f fs/fat/file.c --- a/fs/fat/file.c Fri Aug 04 14:02:26 2006 -0400 +++ b/fs/fat/file.c Mon Aug 07 15:45:42 2006 -0400 @@ -13,6 +13,7 @@ #include <linux/smp_lock.h> #include <linux/buffer_head.h> #include <linux/writeback.h> +#include <linux/blkdev.h> int fat_generic_ioctl(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg) @@ -112,6 +113,16 @@ int fat_generic_ioctl(struct inode *inod } } +static int fat_file_release(struct inode *inode, struct file *filp) +{ + if ((filp->f_mode & FMODE_WRITE) && + MSDOS_SB(inode->i_sb)->options.flush) { + fat_flush_inodes(inode->i_sb, inode, NULL); + blk_congestion_wait(WRITE, HZ/10); + } + return 0; +} + const struct file_operations fat_file_operations = { .llseek = generic_file_llseek, .read = do_sync_read, @@ -121,6 +132,7 @@ const struct file_operations fat_file_op .aio_read = generic_file_aio_read, .aio_write = generic_file_aio_write, .mmap = generic_file_mmap, + .release = fat_file_release, .ioctl = fat_generic_ioctl, .fsync = file_fsync, .sendfile = generic_file_sendfile, @@ -289,6 +301,7 @@ void fat_truncate(struct inode *inode) lock_kernel(); fat_free(inode, nr_clusters); unlock_kernel(); + fat_flush_inodes(inode->i_sb, inode, NULL); } struct inode_operations fat_file_inode_operations = { diff -r 8a50d9b6ce6f fs/fat/inode.c --- a/fs/fat/inode.c Fri Aug 04 14:02:26 2006 -0400 +++ b/fs/fat/inode.c Mon Aug 07 16:19:59 2006 -0400 @@ -24,6 +24,7 @@ #include <linux/vfs.h> #include <linux/parser.h> #include <linux/uio.h> +#include <linux/writeback.h> #include <asm/unaligned.h> #ifndef CONFIG_FAT_DEFAULT_IOCHARSET @@ -861,7 +862,7 @@ enum { Opt_charset, Opt_shortname_lower, Opt_shortname_win95, Opt_shortname_winnt, Opt_shortname_mixed, Opt_utf8_no, Opt_utf8_yes, Opt_uni_xl_no, Opt_uni_xl_yes, Opt_nonumtail_no, Opt_nonumtail_yes, - Opt_obsolate, Opt_err, + Opt_obsolate, Opt_flush, Opt_err, }; static match_table_t fat_tokens = { @@ -893,7 +894,8 @@ static match_table_t fat_tokens = { {Opt_obsolate, "cvf_format=%20s"}, {Opt_obsolate, "cvf_options=%100s"}, {Opt_obsolate, "posix"}, - {Opt_err, NULL} + {Opt_flush, "flush"}, + {Opt_err, NULL}, }; static match_table_t msdos_tokens = { {Opt_nodots, "nodots"}, @@ -1034,6 +1036,9 @@ static int parse_options(char *options, return 0; opts->codepage = option; break; + case Opt_flush: + opts->flush = 1; + break; /* msdos specific */ case Opt_dots: @@ -1435,6 +1440,56 @@ out_fail: EXPORT_SYMBOL_GPL(fat_fill_super); +/* + * helper function for fat_flush_inodes. This writes both the inode + * and the file data blocks, waiting for in flight data blocks before + * the start of the call. It does not wait for any io started + * during the call + */ +static int writeback_inode(struct inode *inode) +{ + + int ret; + struct address_space *mapping = inode->i_mapping; + struct writeback_control wbc = { + .sync_mode = WB_SYNC_NONE, + .nr_to_write = 0, + }; + /* if we used WB_SYNC_ALL, sync_inode waits for the io for the + * inode to finish. So WB_SYNC_NONE is sent down to sync_inode + * and filemap_fdatawrite is used for the data blocks + */ + ret = sync_inode(inode, &wbc); + if (!ret) + ret = filemap_fdatawrite(mapping); + return ret; +} + +/* + * write data and metadata corresponding to i1 and i2. The io is + * started but we do not wait for any of it to finish. + * + * filemap_flush is used for the block device, so if there is a dirty + * page for a block already in flight, we will not wait and start the + * io over again + */ +int fat_flush_inodes(struct super_block *sb, struct inode *i1, struct inode *i2) +{ + int ret = 0; + if (!MSDOS_SB(sb)->options.flush) + return 0; + if (i1) + ret = writeback_inode(i1); + if (!ret && i2) + ret = writeback_inode(i2); + if (!ret && sb) { + struct address_space *mapping = sb->s_bdev->bd_inode->i_mapping; + ret = filemap_flush(mapping); + } + return ret; +} +EXPORT_SYMBOL_GPL(fat_flush_inodes); + static int __init init_fat_fs(void) { int err; diff -r 8a50d9b6ce6f fs/msdos/namei.c --- a/fs/msdos/namei.c Fri Aug 04 14:02:26 2006 -0400 +++ b/fs/msdos/namei.c Mon Aug 07 14:35:17 2006 -0400 @@ -280,7 +280,7 @@ static int msdos_create(struct inode *di struct nameidata *nd) { struct super_block *sb = dir->i_sb; - struct inode *inode; + struct inode *inode = NULL; struct fat_slot_info sinfo; struct timespec ts; unsigned char msdos_name[MSDOS_NAME]; @@ -316,6 +316,8 @@ static int msdos_create(struct inode *di d_instantiate(dentry, inode); out: unlock_kernel(); + if (!err) + err = fat_flush_inodes(sb, dir, inode); return err; } @@ -348,6 +350,8 @@ static int msdos_rmdir(struct inode *dir fat_detach(inode); out: unlock_kernel(); + if (!err) + err = fat_flush_inodes(inode->i_sb, dir, inode); return err; } @@ -401,6 +405,7 @@ static int msdos_mkdir(struct inode *dir d_instantiate(dentry, inode); unlock_kernel(); + fat_flush_inodes(sb, dir, inode); return 0; out_free: @@ -430,6 +435,8 @@ static int msdos_unlink(struct inode *di fat_detach(inode); out: unlock_kernel(); + if (!err) + err = fat_flush_inodes(inode->i_sb, dir, inode); return err; } @@ -635,6 +642,8 @@ static int msdos_rename(struct inode *ol new_dir, new_msdos_name, new_dentry, is_hid); out: unlock_kernel(); + if (!err) + err = fat_flush_inodes(old_dir->i_sb, old_dir, new_dir); return err; } diff -r 8a50d9b6ce6f include/linux/msdos_fs.h --- a/include/linux/msdos_fs.h Fri Aug 04 14:02:26 2006 -0400 +++ b/include/linux/msdos_fs.h Mon Aug 07 14:35:17 2006 -0400 @@ -204,6 +204,7 @@ struct fat_mount_options { unicode_xlate:1, /* create escape sequences for unhandled Unicode */ numtail:1, /* Does first alias have a numeric '~1' type tail? */ atari:1, /* Use Atari GEMDOS variation of MS-DOS fs */ + flush:1, /* write things quickly */ nocase:1; /* Does this need case conversion? 0=need case conversion*/ }; @@ -412,6 +413,8 @@ extern int fat_fill_super(struct super_b extern int fat_fill_super(struct super_block *sb, void *data, int silent, struct inode_operations *fs_dir_inode_ops, int isvfat); +extern int fat_flush_inodes(struct super_block *sb, struct inode *i1, + struct inode *i2); /* fat/misc.c */ extern void fat_fs_panic(struct super_block *s, const char *fmt, ...); extern void fat_clusters_flush(struct super_block *sb); - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html