Re: [PATCH] add -o flush for fat

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Aug 05, 2006 at 12:12:43PM -0700, Andrew Morton wrote:

[ thanks for the comments everyone ]

> Consequently the semantics which we require of the library functions which
> you've added should be:
> 
> /*
>  * On return from this function all dirty data has been cleaned - it is either
>  * on disk or is in flight in the I/O queues.
>  */
> 
> These semantics are exactly provided by
> do_sync_file_range(SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE).
> I'd suggest that do_sync_file_range() could be used to sync the file's
> data in this application, if only for reasons of clarity and precision.

do_sync_file_range is for files.  I don't have files for the directory
inodes.

> 
> That leaves the file metadata, the inode itself and the superblock, for
> which we'd ideally likely same semantics, but stronger sync semantics would
> of course meet our requirement.
> 
> metadata: we don't have an
> SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE implementation of
> fsync_buffers_list() so unless we add such, we need to use
> fsync_buffers_list()'s
> (SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE|SYNC_FILE_RANGE_WAIT_AFTER)
> semantics.  I'd suggest a call to sync_mapping_buffers() directly.

Sadly fat doesn't use mark_buffer_dirty_inode.  We can change it to do
so of course.

> 
> The inode: write_inode_now(wait=0) will I think be appropriate here.

The fun part of write_inode_now is the part where __sync_single_inode
does this:

	int wait = wbc->sync_mode == WB_SYNC_ALL;
	...
	if (wait) {
		int err = filemap_fdatawait(mapping);
	...

I forget this every time and had to sysrq-t my way back to this
discovery again today.  This may or may not be the desired result of
write_inode_now, but it is not quite right for -o flush.

> 
> superblock: it's possible that the proposed three-step scheme will leave
> the filesystem's superblock unwritten.  Need to think about that.
> 
> So I don't think any additional helper functions are needed.  The above
> three are all exported to filesystems.
> 
> 
> To test all this I'd suggest that you verify that `-o flush' takes
> /rpoc/meminfo:Dirty reliably down to zero on a quiet system for various
> operations.
> 
My test:

cp -a /usr/share/doc/pages fat_fs/orig
cd fat_fs ; cp -a orig copy
sleep 2
reset

Then diff the directories after things come back.

> 
> If we're not adding new VFS functions then fatfs can generally hack around
> with these functions until it does what we want it to do.  If we _do_ add
> new VFS functions then their exact data integrity behaviour should be
> carefully designed and documented, please.  Otherwise we'll lose data or
> make people's machines slower than they need to be - we have a track record
> of screwing this stuff up.
> 
> There was quite some duplication of code within the fat patch you sent - it
> looked like a little helper function was needed.

New patch below, it is slightly different.  I'm still hoping we can
avoid the timer to cover in flight ios ignored by filemap_flush, but it
may come to that later on.

> 
> I expect the `-o flush' behaviour will be useful on other filesystem types.
> Did you consider moving it into the generic mount options (MS_FLUSH?)
> rather than making it fat-specific?

I did, but this is one of those fuzzy patches.  It's not actually a good
idea, but it's the best compromise I can find between what the users
wanted and what I was willing to ship.

If other filesystems what this bad idea, we can certainly add it, but I
wanted to start small...

> 
> Sorry to complicate a little patch, but I'd prefer to get this right
> going-in.
> 
> iirc, Hirofumi-san had a patch which did the same thing as this one 6-odd
> months ago, but it seemed to die off.
> 

The ones I found 6-odd months ago didn't cover metadata correctly.
His stuff may have improved, sorry if I missed it.

From: Chris Mason <mason@xxxxxxxx>
Subject: add -o flush for fat

Fat is commonly used on removable media. Mounting with -o flush tells the
FS to write things to disk as quickly as possible.  It is like -o sync, but
much faster (and not as safe).

Signed-off-by: Chris Mason <mason@xxxxxxxx>

---
diff -r 8a50d9b6ce6f fs/fat/file.c
--- a/fs/fat/file.c	Fri Aug 04 14:02:26 2006 -0400
+++ b/fs/fat/file.c	Mon Aug 07 15:45:42 2006 -0400
@@ -13,6 +13,7 @@
 #include <linux/smp_lock.h>
 #include <linux/buffer_head.h>
 #include <linux/writeback.h>
+#include <linux/blkdev.h>
 
 int fat_generic_ioctl(struct inode *inode, struct file *filp,
 		      unsigned int cmd, unsigned long arg)
@@ -112,6 +113,16 @@ int fat_generic_ioctl(struct inode *inod
 	}
 }
 
+static int fat_file_release(struct inode *inode, struct file *filp)
+{
+	if ((filp->f_mode & FMODE_WRITE) &&
+	     MSDOS_SB(inode->i_sb)->options.flush) {
+		fat_flush_inodes(inode->i_sb, inode, NULL);
+		blk_congestion_wait(WRITE, HZ/10);
+	}
+	return 0;
+}
+
 const struct file_operations fat_file_operations = {
 	.llseek		= generic_file_llseek,
 	.read		= do_sync_read,
@@ -121,6 +132,7 @@ const struct file_operations fat_file_op
 	.aio_read	= generic_file_aio_read,
 	.aio_write	= generic_file_aio_write,
 	.mmap		= generic_file_mmap,
+	.release	= fat_file_release,
 	.ioctl		= fat_generic_ioctl,
 	.fsync		= file_fsync,
 	.sendfile	= generic_file_sendfile,
@@ -289,6 +301,7 @@ void fat_truncate(struct inode *inode)
 	lock_kernel();
 	fat_free(inode, nr_clusters);
 	unlock_kernel();
+	fat_flush_inodes(inode->i_sb, inode, NULL);
 }
 
 struct inode_operations fat_file_inode_operations = {
diff -r 8a50d9b6ce6f fs/fat/inode.c
--- a/fs/fat/inode.c	Fri Aug 04 14:02:26 2006 -0400
+++ b/fs/fat/inode.c	Mon Aug 07 16:19:59 2006 -0400
@@ -24,6 +24,7 @@
 #include <linux/vfs.h>
 #include <linux/parser.h>
 #include <linux/uio.h>
+#include <linux/writeback.h>
 #include <asm/unaligned.h>
 
 #ifndef CONFIG_FAT_DEFAULT_IOCHARSET
@@ -861,7 +862,7 @@ enum {
 	Opt_charset, Opt_shortname_lower, Opt_shortname_win95,
 	Opt_shortname_winnt, Opt_shortname_mixed, Opt_utf8_no, Opt_utf8_yes,
 	Opt_uni_xl_no, Opt_uni_xl_yes, Opt_nonumtail_no, Opt_nonumtail_yes,
-	Opt_obsolate, Opt_err,
+	Opt_obsolate, Opt_flush, Opt_err,
 };
 
 static match_table_t fat_tokens = {
@@ -893,7 +894,8 @@ static match_table_t fat_tokens = {
 	{Opt_obsolate, "cvf_format=%20s"},
 	{Opt_obsolate, "cvf_options=%100s"},
 	{Opt_obsolate, "posix"},
-	{Opt_err, NULL}
+	{Opt_flush, "flush"},
+	{Opt_err, NULL},
 };
 static match_table_t msdos_tokens = {
 	{Opt_nodots, "nodots"},
@@ -1034,6 +1036,9 @@ static int parse_options(char *options, 
 				return 0;
 			opts->codepage = option;
 			break;
+		case Opt_flush:
+			opts->flush = 1;
+			break;
 
 		/* msdos specific */
 		case Opt_dots:
@@ -1435,6 +1440,56 @@ out_fail:
 
 EXPORT_SYMBOL_GPL(fat_fill_super);
 
+/*
+ * helper function for fat_flush_inodes.  This writes both the inode
+ * and the file data blocks, waiting for in flight data blocks before
+ * the start of the call.  It does not wait for any io started
+ * during the call
+ */
+static int writeback_inode(struct inode *inode)
+{
+
+	int ret;
+	struct address_space *mapping = inode->i_mapping;
+	struct writeback_control wbc = {
+	       .sync_mode = WB_SYNC_NONE,
+	      .nr_to_write = 0,
+	};
+	/* if we used WB_SYNC_ALL, sync_inode waits for the io for the
+	* inode to finish.  So WB_SYNC_NONE is sent down to sync_inode
+	* and filemap_fdatawrite is used for the data blocks
+	*/
+	ret = sync_inode(inode, &wbc);
+	if (!ret)
+	       ret = filemap_fdatawrite(mapping);
+	return ret;
+}
+
+/*
+ * write data and metadata corresponding to i1 and i2.  The io is
+ * started but we do not wait for any of it to finish.
+ *
+ * filemap_flush is used for the block device, so if there is a dirty
+ * page for a block already in flight, we will not wait and start the
+ * io over again
+ */
+int fat_flush_inodes(struct super_block *sb, struct inode *i1, struct inode *i2)
+{
+	int ret = 0;
+	if (!MSDOS_SB(sb)->options.flush)
+		return 0;
+	if (i1)
+		ret = writeback_inode(i1);
+	if (!ret && i2)
+		ret = writeback_inode(i2);
+	if (!ret && sb) {
+		struct address_space *mapping = sb->s_bdev->bd_inode->i_mapping;
+		ret = filemap_flush(mapping);
+	}
+	return ret;
+}
+EXPORT_SYMBOL_GPL(fat_flush_inodes);
+
 static int __init init_fat_fs(void)
 {
 	int err;
diff -r 8a50d9b6ce6f fs/msdos/namei.c
--- a/fs/msdos/namei.c	Fri Aug 04 14:02:26 2006 -0400
+++ b/fs/msdos/namei.c	Mon Aug 07 14:35:17 2006 -0400
@@ -280,7 +280,7 @@ static int msdos_create(struct inode *di
 			struct nameidata *nd)
 {
 	struct super_block *sb = dir->i_sb;
-	struct inode *inode;
+	struct inode *inode = NULL;
 	struct fat_slot_info sinfo;
 	struct timespec ts;
 	unsigned char msdos_name[MSDOS_NAME];
@@ -316,6 +316,8 @@ static int msdos_create(struct inode *di
 	d_instantiate(dentry, inode);
 out:
 	unlock_kernel();
+	if (!err)
+		err = fat_flush_inodes(sb, dir, inode);
 	return err;
 }
 
@@ -348,6 +350,8 @@ static int msdos_rmdir(struct inode *dir
 	fat_detach(inode);
 out:
 	unlock_kernel();
+	if (!err)
+		err = fat_flush_inodes(inode->i_sb, dir, inode);
 
 	return err;
 }
@@ -401,6 +405,7 @@ static int msdos_mkdir(struct inode *dir
 	d_instantiate(dentry, inode);
 
 	unlock_kernel();
+	fat_flush_inodes(sb, dir, inode);
 	return 0;
 
 out_free:
@@ -430,6 +435,8 @@ static int msdos_unlink(struct inode *di
 	fat_detach(inode);
 out:
 	unlock_kernel();
+	if (!err)
+		err = fat_flush_inodes(inode->i_sb, dir, inode);
 
 	return err;
 }
@@ -635,6 +642,8 @@ static int msdos_rename(struct inode *ol
 			      new_dir, new_msdos_name, new_dentry, is_hid);
 out:
 	unlock_kernel();
+	if (!err)
+		err = fat_flush_inodes(old_dir->i_sb, old_dir, new_dir);
 	return err;
 }
 
diff -r 8a50d9b6ce6f include/linux/msdos_fs.h
--- a/include/linux/msdos_fs.h	Fri Aug 04 14:02:26 2006 -0400
+++ b/include/linux/msdos_fs.h	Mon Aug 07 14:35:17 2006 -0400
@@ -204,6 +204,7 @@ struct fat_mount_options {
 		 unicode_xlate:1, /* create escape sequences for unhandled Unicode */
 		 numtail:1,       /* Does first alias have a numeric '~1' type tail? */
 		 atari:1,         /* Use Atari GEMDOS variation of MS-DOS fs */
+		 flush:1,	  /* write things quickly */
 		 nocase:1;	  /* Does this need case conversion? 0=need case conversion*/
 };
 
@@ -412,6 +413,8 @@ extern int fat_fill_super(struct super_b
 extern int fat_fill_super(struct super_block *sb, void *data, int silent,
 			struct inode_operations *fs_dir_inode_ops, int isvfat);
 
+extern int fat_flush_inodes(struct super_block *sb, struct inode *i1,
+		            struct inode *i2);
 /* fat/misc.c */
 extern void fat_fs_panic(struct super_block *s, const char *fmt, ...);
 extern void fat_clusters_flush(struct super_block *sb);
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux