On Wed 27-07-11 05:52:48, Christoph Hellwig wrote: > > +static void __sync_filesystem(struct super_block *sb, int emergency) > > { > > + /* In case of emergency sync we don't want to wait for locks and IO */ > > + if (unlikely(emergency)) > > writeback_inodes_sb(sb); > > + else > > + sync_inodes_sb(sb); > > > > if (sb->s_op->sync_fs) > > + sb->s_op->sync_fs(sb, 0); > > } > > This function doesn't make sense to me in the current form. Why would > be do a blocking sync_inodes_sb, but then a non-blocking ->sync_fs? I tried to explain this in the changelog but it seems I failed. wakeup_flusher_threads(0) starts completely asynchronous writeback of all bdis. We could do an iteration of sb_sync_fs with wait==0 over all superblocks just after wakeup_flusher_threads(0) returns but at this moment, almost nothing is written yet because flusher threads have just started doing their work. So what useful work can ->sync_fs(0) do in such case? E.g. ocfs2 could start processing orphan list but for example starting quota writeback as done by XFS would be really fruitless as quota is going to change heavily as a result of writeback I expect. Similarly ext3 or ext4 cannot really do useful work while most inodes are not yet written. It would be nice to call ->sync_fs with wait == 0 when flusher thread is done with the bdi but there's no easy way to do that. We could use the completion for this but we'd have to track these for every bdi which isn't very nice. That's why I chose to synchronize with flusher threads using the synchronous inode writeback - when that is finished we know flusher threads are done and so calling ->sync_fs() can be useful. So do you think calling ->sync_fs() with wait == 0 that late is a problem? > > -static void sync_filesystems(int wait) > > +static void sync_filesystems(int emergency) > > { > > - iterate_supers(sync_one_sb, &wait); > > + iterate_supers(sync_one_sb, &emergency); > > } > > > I'd just drop the sync_filesystems wrapper, and also use individual > callbacks for the two cases. OK, will do. > > SYSCALL_DEFINE0(sync) > > { > > + /* Start flushing on all devices */ > > wakeup_flusher_threads(0); > > + /* > > + * Above call queued work doing complete writeout on each filesystem. > > + * Now we queue work which guarantees data integrity of all inodes > > + * - not much should be left for it to write. The WB_SYNC_ALL inode > > + * writeback also guarantees that sync_fs() is called after inodes > > + * are written out and thus it can do meaningful work. > > + */ > > sync_filesystems(0); > > This really should be an iteration over sb_sync_fs with wait == 0 > > > sync_all_bdevs(0); > > + /* Call blocking ->sync_fs() for each filesystem */ > > + iterate_supers(sb_sync_fs, NULL); Honza -- Jan Kara <jack@xxxxxxx> SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html