Hi Ritesh, Thanks for bringing this up again. I think we should really document this in the code. Let me first try to explain what we are doing and then why we are doing it this way. During fast commit replay (even for "add range" / "delete range" tags), we always clear the bitmaps of whatever blocks we touch. For example, even if new blocks are added to an inode, during the replay phase of add range tag, we will still clear the bitmap to mark these new blocks as free and at the end of the replay (i.e. after all the tags have been replayed) we will traverse the "modified inodes list" and mark all the blocks in these inodes as allocated. We extend this logic to the replay of the "inode" tag as well. So here's the high level flow of replay path wrt block bitmap handling: - For every "add_range" / "del_range" tags in the FC log area: - Clear block bitmaps for all modified blocks in this tag. - For every "inode" tag in the FC log area: - Clear all the blocks in that inode - Record that the inode is modified - At the end of the replay phase - For each inode in the modified inode list: - Mark all the blocks in this inode as used Let me try to describe the reason why we take this approach with an example. Let's say there was an inode A of size 100 with following extents: Logical - Physical [0-49] - unmapped [50-99] - [1000 - 1049] Let's say this inode changed to following mappings: Logical - Physical [0-49] - [1000 - 1049] [50-99] - [1050 - 1099] Let's say a fast commit was performed at this point. So, this would translate to following logs in the fast commit area: [ADD_RANGE, A, 0-49, 1000-1049] [ADD_RANGE, A, 50-99, 1050-1099] After replaying the first tag, the inode's intermediate state would be as follows: Logical - Physical [0-49] - [1000-1049] [50-99] - [1000-1049] During replay of the second tag, the logical to physical mapping of 50:99 has changed. So, we need to decide what to do with the old blocks. If we mark the old physical blocks corresponding to 50-99 range as free on-disk, then that would result in block bitmap inconsistency, since these blocks are actually being used somewhere else. We can't simply leave these blocks as allocated either since these blocks may actually not be used at all. Things get even more complicated if these blocks are moved to a different file. So to protect against such cases, what we simply do is that we defer setting up the block bitmap correctly until the very end and just mark all the blocks that we encounter during replays of different tags as free. This was done mainly to protect against such cases in add range / delete range operations, but I extended it to replay inode as well just to be on the safer side. generic/311 provides a lot of good test cases for such manipulations. We can see if removing clear_bb from replay inode handling introduces any regressions. Does this make sense? - Harshad On Sun, 20 Feb 2022 at 23:59, Ritesh Harjani <riteshh@xxxxxxxxxxxxx> wrote: > > Hello Ted/Harshad, > > I think we did discuss this once before, but I am unable to recollect the exact > reasoning for this. So question is - why do we have to call ext4_ext_clear_bb() > from ext4_fc_replay_inode()? > > I was just thinking if this is suboptimal and if it can be optimized. But before > working on that problem, I wanted to again understand the right reasoning behind > choosing this approach in the first place. > > Could you please help with this again? > > ext4_fc_replay_inode() > <..> > inode = ext4_iget(sb, ino, EXT4_IGET_NORMAL); > if (!IS_ERR(inode)) { > ext4_ext_clear_bb(inode); > iput(inode); > } > <..> > > I will document it this time, so that I don't have to keep coming to this > everytime I look into fc replay code. > > Thanks again for your help!! > -ritesh