On 2022/4/7 21:55, Jan Kara wrote: > On Thu 07-04-22 16:14:24, Zhang Yi wrote: >> On 2022/4/7 1:17, Jan Kara wrote: >>> On Wed 06-04-22 16:45:03, Zhang Yi wrote: >>>> Symlink's external data block is one kind of metadata block, and now >>>> that almost all ext4 metadata block's page cache (e.g. directory blocks, >>>> quota blocks...) belongs to bdev backing inode except the symlink. It >>>> is essentially worked in data=journal mode like other regular file's >>>> data block because probably in order to make it simple for generic VFS >>>> code handling symlinks or some other historical reasons, but the logic >>>> of creating external data block in ext4_symlink() is complicated. and it >>>> also make things confused if user do not want to let the filesystem >>>> worked in data=journal mode. This patch convert the final exceptional >>>> case and make things clean, move the mapping of the symlink's external >>>> data block to bdev like any other metadata block does. >>>> >>>> Signed-off-by: Zhang Yi <yi.zhang@xxxxxxxxxx> >>>> --- >>>> This RFC patch follow the talking of whether if we could unify the >>>> journal mode of ext4 metadata blocks[1], it stop using the data=journal >>>> mode for the final exception case of symlink's external data block. Any >>>> comments are welcome, thanks. >>>> >>>> [1]. https://lore.kernel.org/linux-ext4/20220321151141.hypnhr6o4vng2sa6@xxxxxxxxxx/T/#m84b942a6bb838ba60ae8afd906ebbb987a577488 >>>> >>>> fs/ext4/inode.c | 9 +--- >>>> fs/ext4/namei.c | 123 +++++++++++++++++++++------------------------- >>>> fs/ext4/symlink.c | 44 ++++++++++++++--- >>>> 3 files changed, 93 insertions(+), 83 deletions(-) >>> >>> Hum, we don't save on code but I'd say the result is somewhat more >>> standard. So I guess this makes some sense. Let's see what Ted thinks... >>> >>> Otherwise I've found just one small bug below. >>> >>>> @@ -3270,26 +3296,8 @@ static int ext4_symlink(struct user_namespace *mnt_userns, struct inode *dir, >>>> if (err) >>>> return err; >>>> >>>> - if ((disk_link.len > EXT4_N_BLOCKS * 4)) { >>>> - /* >>>> - * For non-fast symlinks, we just allocate inode and put it on >>>> - * orphan list in the first transaction => we need bitmap, >>>> - * group descriptor, sb, inode block, quota blocks, and >>>> - * possibly selinux xattr blocks. >>>> - */ >>>> - credits = 4 + EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb) + >>>> - EXT4_XATTR_TRANS_BLOCKS; >>>> - } else { >>>> - /* >>>> - * Fast symlink. We have to add entry to directory >>>> - * (EXT4_DATA_TRANS_BLOCKS + EXT4_INDEX_EXTRA_TRANS_BLOCKS), >>>> - * allocate new inode (bitmap, group descriptor, inode block, >>>> - * quota blocks, sb is already counted in previous macros). >>>> - */ >>>> - credits = EXT4_DATA_TRANS_BLOCKS(dir->i_sb) + >>>> - EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3; >>>> - } >>>> - >>>> + credits = EXT4_DATA_TRANS_BLOCKS(dir->i_sb) + >>>> + EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3; >>> >>> This does not seem like enough credits - we may need to allocate inode, add >>> entry to directory, allocate & initialize symlink block. So I think you >>> need to add 4 for block allocation + init in case of non-fast symlink. And >>> please keep the comment explaining what is actually counted in the number >>> of credits... >>> >> >> Thanks for pointing this out, and ext4_mkdir() seems has the same problem >> too because we also need to allocate one more block to store '.' and '..' >> entries for a new created empty directory. > > OK, I was thinking a bit more about this and the comment was actually a bit > misleading AFAICT. So we have: > > EXT4_INDEX_EXTRA_TRANS_BLOCKS for addition of entry into the directory. > +3 for inode, inode bitmap, group descriptor allocation > EXT4_DATA_TRANS_BLOCKS for the data block allocation and modification. > > So things actually look OK, just the comment was wrong and in the old code > the credits were overestimated (because we've allocated the data block in a > separate transaction). > Yes,I will update the comments in my v2 iteration. >> BTW, look the credits calculation in depth, the definition of >> EXT4_DATA_TRANS_BLOCKS is weird, the '-2' subtraction looks wrong. >> >>> #define EXT4_DATA_TRANS_BLOCKS(sb) (EXT4_SINGLEDATA_TRANS_BLOCKS(sb) + \ >>> EXT4_XATTR_TRANS_BLOCKS - 2 + \ >>> EXT4_MAXQUOTAS_TRANS_BLOCKS(sb)) >> >> I see the history log, before commit[1], the '-2' subtract the 2 more duplicate >> counted super block in '3 * EXT3_SINGLEDATA_TRANS_BLOCKS', but after this commit, >> it seems buggy because we have only count the super block once. It's a long time >> ago, I'm not sure am I missing something? > > Yes, -2 looks strange but at the same time I fail to see why > EXT4_XATTR_TRANS_BLOCKS would need to be accounted for normal data > operation and why we're reserving 6 blocks there. I'll raise it on today's > ext4 call if someone remembers... > I guess the 6 blocks were: 1. Ref count update on old xattr block 2. new xattr block 3. block bitmap update for new xattr block 4. group descriptor for new xattr block 5. block bitmap update for old xattr block 6. group descriptor for old block The 5 and 6 looks like were overestimated in cases of 1) we just update the old ref count to no zero, 2) we free the old xattr block and the credits has already counted in the default revoke credits. Thanks, Yi. >> [1]. https://git.kernel.org/pub/scm/linux/kernel/git/history/history.git/commit/?id=2df2c24aa6d2cd56777570d96494b921568b4405