On Sun, Feb 17, 2013 at 05:25:43PM -0800, Eric W. Biederman wrote:
> Dave Chinner <david@xxxxxxxxxxxxx> writes:
>
> > On Wed, Feb 13, 2013 at 10:13:16AM -0800, Eric W. Biederman wrote:
> >
> >> The crazy thing is that xfs appears to
> >> directly write their incore inode structure into their journal.
> >
> > Off topic, but it's actually a very sane thing to do. It's called
> > logical object logging, as opposed to physical logging like ext3/4
> > and ocfs2 use. XFS uses a combination of logical logging
> > (superblock, dquots, inodes) and physical logging (via buffers).
>
> Not putting your structures in disk-endian before putting them on-disk
> seems silly. As far as I can tell if you switch endianness of the
> machine accessing your xfs filesystem and have to do a log recover
> it won't work because a lot of the log entries will appear corrupted.
>
> It also seems silly to require your in-memory structure to be binary
> compatible with your log when you immediately copy that structure to
> another buffer when it comes time to queue a version of it to put into
> the log.
>
> The fact that you sometimes need to allocate memory and make a copy so
> you can stuff your data into the logvec whose only purpose is to then
> copy the data a second time seems silly and wasteful.
>
> Logical logging itself seems reasonable. I just find the implementation
> in xfs odd.
>
> It looks like with a few little changes xfs could retain backwards
> compatibility with today, remove extra memory copies, and completely
> decouple the format of the in-core structures from the format of the
> on-disk structures. Allowing scary comments to be removed.

If you think removing the copies is that easy, go right ahead - I'd
love to see patches that checkpoint changes directly from the
in-memory objects to the log without deadlocking....

Decoupling the in-memory structure from the log format could be done
at any time.
But it's just not something that is needed, and for the rare cases
where it is needed it's better to put the format detection and
conversion code into log recovery. i.e. take the conversion penalty
once when needed on the slow path rather than on every operation
through the fast path....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx