On Tuesday, January 01, 2013 10:58:35 PM Shentino wrote:
> From what I can tell on the design, tux3 is "fsync satiating" with a
> single disk write. It writes the data to the final location, updates
> the log, and at that point the data is considered committed and it can
> let userspace go on its merry way and take care of rolling up the
> changes later.

Yes, correct. I think we currently sync a small file create+write with
seven blocks and a file rewrite with four blocks, including the commit
block and only one long seek. We haven't benchmarked that yet, but it
sounds fast. There are two synchronous waits in the backend, but the
frontend only waits on the commit block completion in the task doing the
sync while other concurrent filesystem operations just keep going.

> If I understand btrfs correctly though it has to block
> until the cow logic percolates all the way up to the superblock.

A careful reading of the Btrfs design doc left me confused about that.
Perhaps Btrfs devs could clarify?

> One other thing that interests me is this "page forking" that allows
> userspace to write to a page that's already busy being written to
> disk. From what I heard it bypasses a stall caused by userspace I/O
> hitting a locked page.

Page forking is an amazing thing and should really head into core, after
being thoroughly proved out of course.

> Finally, atime handling. I personally dislike the forced default of
> "relatime" for mount options and anything that can let atime updates
> happen without being a bottleneck is a plus for me.

Atime is an odious invention indeed from a developer's perspective, but
apparently well loved by some users and has real applications. Knowing
which videos you watched recently apparently being one of them.
We have a pretty good plan for it that is actually just a small
development item, the main feature of which is avoiding polluting the
inode table btree, which would cause a lot of churn and aggravate
allocate-on-write issues that are already difficult, plus be horribly
unfriendly to flash. Instead, we churn a dedicated btree array (actually
a regular file) where the write-on-reads are densely concentrated. It
somehow feels good to quarantine this craziness at least.

Regards,

Daniel
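
For illustration only, here is a minimal Python sketch of the atime idea described above. It is not tux3 code. It assumes one fixed-size 64-bit timestamp slot per inode number, stored in an ordinary sparse file. The names `AtimeTable`, `set_atime` and `get_atime`, the record layout, and the example values are all hypothetical. The point is only that a read dirties a slot in the side file rather than a leaf of the inode table btree.

```python
import os
import struct
import tempfile

# Assumption: one little-endian signed 64-bit atime (nanoseconds) per inode.
RECORD = struct.Struct("<q")


class AtimeTable:
    """Hypothetical dense atime array kept in its own regular file."""

    def __init__(self, path):
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)

    def set_atime(self, inum, atime_ns):
        # Only the slot for this inode number is written; nothing else moves.
        os.pwrite(self.fd, RECORD.pack(atime_ns), inum * RECORD.size)

    def get_atime(self, inum):
        data = os.pread(self.fd, RECORD.size, inum * RECORD.size)
        if len(data) < RECORD.size:
            return 0  # past end of file: slot was never written
        return RECORD.unpack(data)[0]  # holes inside the file read back as 0

    def close(self):
        os.close(self.fd)


# Usage: record accesses to two neighbouring inodes.
path = os.path.join(tempfile.mkdtemp(), "atime_table")
table = AtimeTable(path)
table.set_atime(5, 1_000_000_000)
table.set_atime(6, 2_000_000_000)
```

With 8-byte slots, inodes 5 and 6 fall in the same 4 KiB block of the side file, which is the sense in which the write-on-reads end up densely concentrated in this sketch.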