Dave Chinner wrote:
> If the inode is dirty and fsync does nothing, then that filesystem
> is *broken*. If writing to the inode doesn't dirty it, then the
> filesystem is broken. Fix the broken filesystem.

*Wrong*

Very, very wrong.

You do not write totally unchanged inode bytes just for the sake of
causing a NOP transaction to make the disk write the fsync as a
side-effect of a broken paradigm.

That's _three_ pointless I/Os (one redundant barrier and two writes),
and probably a 50x slowdown in write performance due to seeking.

Now whose filesystem is broken?

> > For efficient fdatasync() you _never_ want a transaction if possible,
> > because it forces the disk head to seek between alternating regions of
> > the disk, two seeks per fsync().
>
> If there is dirty metadata that is need to be logged or flushed,
> then fdatasync() needs to do something. If it doesn't do it
> correctly, then that *filesystem is broken*. Fix the broken
> filesystem.

A series of writes over existing data followed by fdatasync() should
*never* write to the transaction log, unless you mounted something like
ext3 data=journal, which isn't usual.

There is no dirty metadata to write. It is data only.

fdatasync() *means* "do NOT write metadata that is not needed for data
retrieval". That's its whole point.

A filesystem which keeps seeking to its inode area _and_ its journal
area _and_ the data area on every fdatasync() is a poor design indeed.

> > So you can't rely on journalling transactions to flush.
>
> The VFS doesn't even know about transactions....

Whoever brought them up said they can be relied on to flush writes
during fsync/fdatasync. Just saying they can't, is all...

> > > Finally, I prefer maintainers of the filesystems themselves to
> > > decide whether their filesystem needs flushing and thus
> > > knowingly impose this performance penalty on them...
> >
> > I say it should flush be default unless a filesystem hooks an
> > alternative strategy.
> > Certainly, it's silly to have the same code
> > duplicated in nearly every filesystem
>
> So write a *generic helper* for those filesystems that do the same
> thing and hook it to their ->fsync method. Don't hard code it in the
> VFS so other filesystem dev's have to come along afterwards and turn
> it off.

Are there any at the moment which would turn it off?
If so that's a fine idea.

-- Jamie
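
For reference, the data-only case argued about above (overwriting bytes
that already exist, then calling fdatasync()) can be sketched from
userspace as follows. This is only an illustration under stated
assumptions: the helper names write_fill() and overwrite_and_fdatasync()
are hypothetical and not from this thread, and whether a given filesystem
actually avoids touching its journal in step 2 depends on that filesystem
and its mount options.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write `len` bytes of `fill` starting at offset 0. */
static int write_fill(int fd, char fill, size_t len)
{
    char buf[4096];
    off_t off = 0;

    memset(buf, fill, sizeof buf);
    while (len > 0) {
        size_t chunk = len < sizeof buf ? len : sizeof buf;
        ssize_t n = pwrite(fd, buf, chunk, off);
        if (n <= 0)
            return -1;          /* treat short-circuit/zero writes as errors */
        off += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper. Returns 0 on success, -1 on error. */
int overwrite_and_fdatasync(const char *path, size_t len)
{
    int fd;

    assert(path != NULL && len > 0);

    fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    /*
     * Step 1: the first write may extend the file. A size change is
     * metadata needed for data retrieval, so flush it with fsync().
     */
    if (write_fill(fd, 'A', len) != 0 || fsync(fd) != 0)
        goto fail;

    /*
     * Step 2: overwrite the same byte range in place. File size is
     * unchanged; only timestamps differ, and fdatasync() is allowed
     * to skip metadata that is not needed to read the data back.
     */
    if (write_fill(fd, 'B', len) != 0 || fdatasync(fd) != 0)
        goto fail;

    return close(fd);

fail:
    close(fd);
    return -1;
}
```

Calling overwrite_and_fdatasync("some-file", 8192) performs one fsync()
for the allocating write and one fdatasync() for the in-place overwrite.
Only the second call corresponds to the "series of writes over existing
data" case discussed in the message.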