On Tue, Jul 23, 2019 at 11:03:54PM -0700, harshad shirwadkar wrote: > Before I respond to your questions, I would like to explain how fast > commits differ from ijournal in a few key aspects (I will make sure to > explain it in detail in patch 00/11 and documentation): Please do; I hadn't realized there were also journal ondisk format changes, and these must be recorded in the ext4 disk format documentation. > - Instead of storing extent blocks in a fast commit block, we only > store extents that were modified in a particular fast commit > transaction in tag-length-value format. > > - Whenever the fast commit information (inode structure + changed > extents in TLV format) exceeds one block, we fall back to full commit. > Thus at this point, the number of blocks we write per fast commit > transaction is either the total number of files changed (if fast > commit was successfully performed) or the number of blocks that would > be written by a full commit transaction. > > - To reduce complexity, there is no support for per-core fast commit areas. > > Current design of fast commits is such that we try to perform fast > commits whenever possible but either if it's impossible to record file > system changes by fast commits or if we haven't yet added support for > dealing with a particular type of file system change, we fall back to > full commits. Whenever we later add more features to fast commits, we > probably would need more on-disk format changes for the fast commit > blocks and that would mean we burn feature flags. So, my guess is that > we would need to make a few judgement calls on whether we want to > exclude a few fast commit features, keep the patch series simple and > potentially burn feature flags later OR we save feature flags by > implementing those fast commit features. Every feature flag you add doubles the size of the testing matrix. If I were you I'd only want to test the (fastcommit) and (!fastcommit) scenarios. --D > On Tue, Jul 23, 2019 at 2:59 PM Andreas Dilger <adilger@xxxxxxxxx> wrote: > > > > On Jul 22, 2019, at 3:02 PM, Theodore Y. Ts'o <tytso@xxxxxxx> wrote: > > > > > > On Mon, Jul 22, 2019 at 12:15:11PM -0600, Andreas Dilger wrote: > > >> Unless I missed it, this patch series needs a 00/11 email that describes > > >> *what* "fast commit" is, and why we want it. This should include some > > >> benchmark results, since (I'd assume) that the "fast" part of the feature > > >> name implies a performance improvement? > > > > > > For background, it's a simplified version of the scheme proposed by > > > Park and Shin, in their paper, "iJournaling: Fine-Grained Journaling > > > for Improving the Latency of Fsync System Call"[1] > > > > > > [1] https://www.usenix.org/conference/atc17/technical-sessions/presentation/park > > > > > > I agree we should have a cover letter for this patch series. Also, we > > > should add documentation to Documentation/filesystems/journaling.rst > > > about this feature; what it does, how it works, its basic on-disk > > > format changes, etc. > > > > Thanks for the link, I hadn't read that paper previously. From reading the > > paper, it seems there are some things that should be addressed before the > > patch is committed to the tree in order to maintain proper disk format > > compatibility: > > - the ijournal header shows a 256-byte inode. In Lustre today (and also > > Samba and other xattr-intensive workloads) 512- or 1024-byte inodes are used > > in order to store more xattrs within the inode, so the size of the inode > > data in the ijournal header needs to match the actual inode size of the > > filesystem and not be a fixed size. What if the inode size == blocksize? > > Okay, I agree. This is one of such questions where we need to decide > whether to exclude this fast commit feature request for now or not. I > think whether or not we actually support 512 or 1024 byte inodes in > this patch series, we definitely shouldn't assume in the fast commit > header that inodes are of a fixed size. I will fix it. Supporting > bigger inodes doesn't sound like it would result in big change in the > patch series. But I would like to know whether you think it's okay to > wait or not. > > > - the ijournal header also shows a 4-byte inode number. It would be prudent > > to reserve space for 64-bit inode numbers, or at least have some mechanism > > (flag) to indicate that a 64-bit inode is stored instead of a 32-bit inode. > > Noted, will change that. > > > - if there are many cores in a system, say 96, how much space will be used > > from the journal file by the per-core ijournal? > > - what happens if multiple threads are writing to the same file with ijournal > > and per-core ijournal areas? Will the same inode information be recorded > > in multiple ijournal areas? > > As mentioned above, at least for now we keep it simple by not having > per-core fast commit areas. > > Thanks, > Harshad > > > > > Cheers, Andreas > > > > > > > > > >