On Apr 22, 2020, at 6:40 PM, Josh Triplett <josh@xxxxxxxxxxxxxxxx> wrote: > > On Wed, Apr 22, 2020 at 02:15:28PM -0600, Andreas Dilger wrote: >> On Apr 22, 2020, at 10:00 AM, Jan Kara <jack@xxxxxxx> wrote: >>> On Tue 14-04-20 00:02:07, Josh Triplett wrote: >>>> Is there a fundamental reason that ext4 *can't* or *shouldn't* support >>>> inline data with 128-byte inodes? >>> >>> Well, where would we put it on disk? ext4 on-disk inode fills 128-bytes >>> with 'osd2' union... >> >> There are 60 bytes in the "i_block" field that can be used by inline_data. > > Exactly. But the Linux ext4 implementation doesn't accept inline data > unless the system.data xattr exists, even if the file's data fits in 60 > bytes (in which case system.data must exist and have 0 length). > >>> Or do you mean we should put inline data in an external xattr block? > > Definitely not, no. That seems much more complex to deal with. > > I'm only talking about the case of files or directories <= 60 bytes > fitting in the inode with 128-byte inodes. Effectively, this would mean > accepting inodes with the inline data flag set, whether they have the > system.data xattr or not. > >> Maybe there is a bigger win for small directories avoiding 4KB leaf blocks? >> >> That said, I'd be happy to see some numbers to show this is a win, and >> I'm definitely not _against_ allowing this to work if there is a use for it. > > Some statistics, for ext4 with 4k blocks and 128-byte inodes, if 60-byte > inline data worked with 128-byte inodes: > > A filesystem containing the source code of the Linux kernel would > save about 1508 disk blocks, or around 6032k. > > A filesystem containing only my /etc directory would save about 650 > blocks, or 2600k, a substantial fraction of the entire directory (which > takes up 9004k total without inline data). Hi Josh, I started looking into this, and got half-way through a patch before getting pulled into something else. If you are interested to finish it off, I've included it here, but it definitely isn't ready yet. Looking at the details, it isn't just an issue of changing a single threshold to decide whether 128-byte inodes can handle inline data or not. The code is intermingled between "have system.data xattr" as the defining factor and using "inline_off" (which points to the byte offset of the system.data contents) to determine whether there is inline data or not. It _should_ be possible to change to using EXT4_STATE_MAY_INLINE_DATA and EXT4_INLINE_DATA_FL to determine if there is any inline data in the inode, and use "inline_off" and i_extra_isize only to detect if any data is in the xattr or not. It doesn't need to check the xattrs if the inode size is EXT4_GOOD_OLD_INODE_SIZE, since we *don't* want "inline" data in an external xattr block for sanity/efficiency reasons. Hopefully this patch is useful for you as a starting point - I suspect it is about 80% finished, but I wouldn't have time to work on it for a few weeks at least, since 128-byte inodes is not a useful case for me (we are using 1024-byte inodes on our metadata filesystems these days). On the flip side, I think inline_data is very interesting, and am happy for it to gain more widespread usage. I don't think the 128-byte inode support for inline_data increases complexity, so I wouldn't be against landing it. Cheers, Andreas -- diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 61b37a0..ec6f51a 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -3260,8 +3260,7 @@ extern int ext4_inline_data_fiemap(struct inode *inode, static inline int ext4_has_inline_data(struct inode *inode) { - return ext4_test_inode_flag(inode, EXT4_INODE_INLINE_DATA) && - EXT4_I(inode)->i_inline_off; + return ext4_test_inode_flag(inode, EXT4_INODE_INLINE_DATA); } /* namei.c */ diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c index fad82d0..ea9596a 100644 --- a/fs/ext4/inline.c +++ b/fs/ext4/inline.c @@ -20,7 +20,7 @@ static int ext4_get_inline_size(struct inode *inode) { - if (EXT4_I(inode)->i_inline_off) + if (ext4_get_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA)) return EXT4_I(inode)->i_inline_size; return 0; @@ -94,7 +94,7 @@ int ext4_get_max_inline_size(struct inode *inode) struct ext4_iloc iloc; if (EXT4_I(inode)->i_extra_isize == 0) - return 0; + return EXT4_MIN_INLINE_DATA_SIZE; error = ext4_get_inode_loc(inode, &iloc); if (error) { @@ -133,6 +133,11 @@ int ext4_find_inline_data_nolock(struct inode *inode) }; int error; + if (!ext4_has_feature_inline_data(inode)) + return 0; + + ext4_set_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA); + EXT4_I(inode)->i_inline_size = EXT4_MIN_INLINE_DATA_SIZE; if (EXT4_I(inode)->i_extra_isize == 0) return 0; @@ -153,9 +158,8 @@ int ext4_find_inline_data_nolock(struct inode *inode) } EXT4_I(inode)->i_inline_off = (u16)((void *)is.s.here - (void *)ext4_raw_inode(&is.iloc)); - EXT4_I(inode)->i_inline_size = EXT4_MIN_INLINE_DATA_SIZE + + EXT4_I(inode)->i_inline_size += le32_to_cpu(is.s.here->e_value_size); - ext4_set_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA); } out: brelse(is.iloc.bh); @@ -188,6 +192,7 @@ static int ext4_read_inline_data(struct inode *inode, void *buffer, if (!len) goto out; + BUG_ON(EXT4_I(inode)->i_extra_isize == 0); header = IHDR(inode, raw_inode); entry = (struct ext4_xattr_entry *)((void *)raw_inode + EXT4_I(inode)->i_inline_off); @@ -219,7 +224,7 @@ static void ext4_write_inline_data(struct inode *inode, struct ext4_iloc *iloc, if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb)))) return; - BUG_ON(!EXT4_I(inode)->i_inline_off); + BUG_ON(!ext4_get_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA)); BUG_ON(pos + len > EXT4_I(inode)->i_inline_size); raw_inode = ext4_raw_inode(iloc); @@ -238,6 +243,7 @@ static void ext4_write_inline_data(struct inode *inode, struct ext4_iloc *iloc, if (!len) return; + BUG_ON(EXT4_I(inode)->i_extra_isize == 0); pos -= EXT4_MIN_INLINE_DATA_SIZE; header = IHDR(inode, raw_inode); entry = (struct ext4_xattr_entry *)((void *)raw_inode + @@ -269,6 +275,17 @@ static int ext4_create_inline_data(handle_t *handle, if (error) goto out; + EXT4_I(inode)->i_inline_size = EXT4_MIN_INLINE_DATA_SIZE; + memset((void *)ext4_raw_inode(&is.iloc)->i_block, + 0, EXT4_MIN_INLINE_DATA_SIZE); + ext4_clear_inode_flag(inode, EXT4_INODE_EXTENTS); + ext4_set_inode_flag(inode, EXT4_INODE_INLINE_DATA); + + if (EXT4_I(inode)->i_extra_isize == 0) { + BUG_ON(len > EXT4_MIN_INLINE_DATA_SIZE); + goto out_dirty; + } + if (len > EXT4_MIN_INLINE_DATA_SIZE) { value = EXT4_ZERO_XATTR_VALUE; len -= EXT4_MIN_INLINE_DATA_SIZE; @@ -295,14 +312,11 @@ static int ext4_create_inline_data(handle_t *handle, goto out; } - memset((void *)ext4_raw_inode(&is.iloc)->i_block, - 0, EXT4_MIN_INLINE_DATA_SIZE); - EXT4_I(inode)->i_inline_off = (u16)((void *)is.s.here - (void *)ext4_raw_inode(&is.iloc)); - EXT4_I(inode)->i_inline_size = len + EXT4_MIN_INLINE_DATA_SIZE; - ext4_clear_inode_flag(inode, EXT4_INODE_EXTENTS); - ext4_set_inode_flag(inode, EXT4_INODE_INLINE_DATA); + EXT4_I(inode)->i_inline_size += len; + +out_dirty: get_bh(is.iloc.bh); error = ext4_mark_iloc_dirty(handle, inode, &is.iloc); @@ -804,6 +818,7 @@ static int ext4_da_convert_inline_data_to_extent(struct address_space *mapping, void **fsdata) { int ret = 0, inline_size; + bool truncate = false; struct page *page; page = grab_cache_page_write_begin(mapping, 0, flags); @@ -811,10 +826,8 @@ static int ext4_da_convert_inline_data_to_extent(struct address_space *mapping, return -ENOMEM; down_read(&EXT4_I(inode)->xattr_sem); - if (!ext4_has_inline_data(inode)) { - ext4_clear_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA); - goto out; - } + if (!ext4_has_inline_data(inode)) + goto out_clear; inline_size = ext4_get_inline_size(inode); @@ -827,23 +840,25 @@ static int ext4_da_convert_inline_data_to_extent(struct address_space *mapping, ret = __block_write_begin(page, 0, inline_size, ext4_da_get_block_prep); if (ret) { - up_read(&EXT4_I(inode)->xattr_sem); - unlock_page(page); - put_page(page); - ext4_truncate_failed_write(inode); - return ret; + truncate = true; + goto out; } SetPageDirty(page); SetPageUptodate(page); - ext4_clear_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA); *fsdata = (void *)CONVERT_INLINE_DATA; +out_clear: + ext4_clear_inode_flag(inode, EXT4_INODE_INLINE_DATA); + ext4_clear_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA); out: up_read(&EXT4_I(inode)->xattr_sem); if (page) { + if (read) unlock_page(page); put_page(page); + if (truncate) + ext4_truncate_failed_write(inode); } return ret; }
Attachment:
signature.asc
Description: Message signed with OpenPGP