On Thu 12-08-21 11:01:34, Theodore Ts'o wrote:
> On Wed, Aug 11, 2021 at 12:19:13PM +0200, Jan Kara wrote:
> > +static int ext4_orphan_file_del(handle_t *handle, struct inode *inode)
> > +{
> > +	struct ext4_orphan_info *oi = &EXT4_SB(inode->i_sb)->s_orphan_info;
> > +	__le32 *bdata;
> > +	int blk, off;
> > +	int inodes_per_ob = ext4_inodes_per_orphan_block(inode->i_sb);
> > +	int ret = 0;
> > +
> > +	if (!handle)
> > +		goto out;
> > +	blk = EXT4_I(inode)->i_orphan_idx / inodes_per_ob;
> > +	off = EXT4_I(inode)->i_orphan_idx % inodes_per_ob;
> > +	if (WARN_ON_ONCE(blk >= oi->of_blocks))
> > +		goto out;
> > +
> > +	ret = ext4_journal_get_write_access(handle, inode->i_sb,
> > +				oi->of_binfo[blk].ob_bh, EXT4_JTR_ORPHAN_FILE);
> > +	if (ret)
> > +		goto out;
>
> If ext4_journal_get_write_access() fails, we effectively drop the
> inode from the orphan list (as far as the in-memory inode is
> concerned), although the inode will still be listed in the orphan
> file. This can be really unfortunate since if the inode gets
> reallocated for some other purpose, since its inode number is left in
> the orphan block, on the next remount, this could lead to data loss.
>
> In the orphan list code, we leave the inode on the linked list, which
> is not great, since that will prevent the inode from being freed, but
> at least we're keeping the in-memory and on-disk state in sync and we
> avoid the data loss scenario when the inode gets reused.

Actually, in the orphan list code, we leave the inode in the on-disk list
but remove it from the in-memory list - see how
list_del_init(&ei->i_orphan) is called very early in ext4_orphan_del().
The reason for this unconditional deletion is that if we do not remove
the inode from the in-memory orphan list, the filesystem will complain
and corrupt memory on unmount.

Also note that leaving the inode in the on-disk orphan list actually does
no serious harm, because the orphan cleanup code just checks i_nlink and
i_disksize: it truncates the inode down to its current i_disksize, or
removes the inode completely if i_nlink is 0. So even if an inode on the
orphan list gets reused, orphan cleanup will just do nothing for it. The
worst problem that will likely happen is that the on-disk orphan linked
list becomes corrupted, but there's no data loss AFAICT.

Is it clearer now or am I missing something?

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR
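
For illustration, the cleanup decision described above can be sketched as
a small user-space model. This is only a sketch under stated assumptions,
not the ext4 code: struct model_inode, its "allocated" field, the
cleanup_action enum and orphan_cleanup_one() are all hypothetical names
made up for this example; only i_nlink and i_disksize correspond to
fields mentioned in the mail.

```c
/*
 * Illustrative user-space model only; this is NOT the ext4 kernel code.
 * All type and function names below are hypothetical.
 */
#include <stdint.h>

enum cleanup_action {
	CLEANUP_NONE,		/* inode is linked, nothing past i_disksize */
	CLEANUP_TRUNCATE,	/* space past i_disksize was released */
	CLEANUP_DELETE		/* i_nlink == 0: inode removed completely */
};

struct model_inode {
	uint32_t i_nlink;	/* link count */
	uint64_t i_disksize;	/* size recorded on disk */
	uint64_t allocated;	/* bytes of allocated data (model-only field) */
};

/* Decide what orphan cleanup would do for one inode found on the list. */
static enum cleanup_action orphan_cleanup_one(struct model_inode *inode)
{
	if (inode->i_nlink == 0) {
		/* Unlinked inode: release everything. */
		inode->allocated = 0;
		inode->i_disksize = 0;
		return CLEANUP_DELETE;
	}
	if (inode->allocated > inode->i_disksize) {
		/* Linked inode: drop only what lies beyond i_disksize. */
		inode->allocated = inode->i_disksize;
		return CLEANUP_TRUNCATE;
	}
	/* A reused, consistent inode ends up here: cleanup does nothing. */
	return CLEANUP_NONE;
}
```

In this model, a reused inode with i_nlink > 0 and no data beyond
i_disksize falls through to CLEANUP_NONE, which is the "orphan cleanup
will just do nothing for it" case.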