On Tue, Feb 4, 2025 at 9:48 PM Mateusz Guzik <mjguzik@xxxxxxxxx> wrote: > > On Tue, Feb 4, 2025 at 9:31 PM Theodore Ts'o <tytso@xxxxxxx> wrote: > > > > On Tue, Feb 04, 2025 at 05:49:48PM +0100, Mateusz Guzik wrote: > > > I'm going to restate: the original behavior can be restored by > > > replacing i_size usage with a strlen call. However, as is I have no > > > basis to think that the disparity between the two is legitimate. If an > > > ext4 person (Ted cc'ed) tells me the disparity can legally happen and > > > is the expected way, I'm going to patch it up no problem. > > > > There are two kinds of symlinks in ext4. "Fast symlinks", where the > > symlink target can fit in i_block[]. And normal, sometimes called > > "slow" symlinks, where i_block[] contains a pointer to data block > > (either i_blocks[0] for a traditional inode, or the root of an extent > > tree that will fit in i_block[] if EXTENTS_FL is set). In the case > > of a fast symlink, the maximum size of the file system is > > sizeof(i_block[])-1. For a normal/slow symlink, the maximum size is > > super->s_blocksize. > > > > We determine whether a symlink is "fast" or not by checking whether > > i_size is less than sizeof(i_blocks[]). In other words. if i_size > > less than 60 (and it's not an encrypted inode), it's a fast symlink > > and we set i_link to point to the i_block array. From __ext4_iget() > > in fs/ext4/inode.c: > > > > if (IS_ENCRYPTED(inode)) { > > inode->i_op = &ext4_encrypted_symlink_inode_operations; > > } else if (ext4_inode_is_fast_symlink(inode)) { > > inode->i_link = (char *)ei->i_data; > > inode->i_op = &ext4_fast_symlink_inode_operations; > > nd_terminate_link(ei->i_data, inode->i_size, > > sizeof(ei->i_data) - 1); > > } else { > > inode->i_op = &ext4_symlink_inode_operations; > > } > > > > > > The call to nd_terminate_link() guarantees that inode->i_link is > > NUL-terminated, although the on-disk formatshould have already been > > NUL-terminated when the symlink is set. > > > > It is nul-terminated, but inode->i_size does not line up with it -- as > in the inode size pulled from the disk image is different than what > strlen would suggest. > > My question is if that's legitimate, I'm guessing not. If not, then > ext4 should complain about it. > > On stock kernel this happens to work because strlen finds the "right" size. > So it occurred to me to check what fsck thinks about it. I ran it twice in a row, it *removed* the problematic symlink. e2fsck 1.47.0 (5-Feb-2023) One or more block group descriptor checksums are invalid. Fix<y>? yes Group descriptor 0 checksum is 0xd7c5, should be 0xa2fe. FIXED. Pass 1: Checking inodes, blocks, and sizes Pass 2: Checking directory structure Symlink /file0/file1 (inode #14) is invalid. Clear<y>? yes Pass 3: Checking directory connectivity Pass 4: Checking reference counts Pass 5: Checking group summary information [ERROR] ../../../../lib/support/quotaio_tree.c:546:check_reference: Illegal reference (4294967295 >= 6) in user quota file Update quota info for quota type 0<y>? yes [QUOTA WARNING] Usage inconsistent for ID 4294967295:actual (0, 0) != expected (458752, 8) [QUOTA WARNING] Usage inconsistent for ID 0:actual (458752, 8) != expected (0, 0) [ERROR] ../../../../lib/support/quotaio_tree.c:546:check_reference: Illegal reference (256 >= 6) in group quota file Update quota info for quota type 1<y>? yes syzkaller: ***** FILE SYSTEM WAS MODIFIED ***** syzkaller: 16/32 files (0.0% non-contiguous), 160/512 blocks So I stand by my assessment that the symlink fetching code should check for the problem of size vs strlen disparity. > > Also note that there really is no point to caching inodes where > > inode->i_link is not a NUL pointer, since in those cases we never call > > file system's get_link() function; the VFS will just fetch it straight > > from i_link. And in those cases, it's because it's a "fast symlink" > > (many file systems have this concept; not just ext[234]) and the > > symlink target was read in as part of the on-disk inode, and so there > > is no separate disk block read read the symlink. And so if you are > > attempting to cache symlinks where inode->i_link is non-NULL, you're > > just wasting a small amount of memory, and wasting the CPU time to do > > the strcpy. > > > > My code does not allocate any extra memory or do any extra copies. > > The code prior to my change would grab i_link, strlen on it and pass > that to copy_to_user every single time readlink is issued. > > My code stores the size in the inode (unioned with a field not used > for symlinks, so no again no increase in memory usage) so that the > strlen call can be avoided. Past that it's the same thing. > > -- > Mateusz Guzik <mjguzik gmail.com> -- Mateusz Guzik <mjguzik gmail.com>