On Mar 16, 2008 07:05 +0800, Andreas Dilger wrote: > On Mar 15, 2008 19:07 +0300, Dmitri Monakhov wrote: > > I've found what ext3_setattr() code has some strange logic. I'm talking > > about truncate path. > > > > int ext3_setattr(struct dentry *dentry, struct iattr *attr) > > { > > ... > > if (S_ISREG(inode->i_mode) && > > attr->ia_valid & ATTR_SIZE && attr->ia_size < inode->i_size) { > > handle_t *handle; > > <<< This is shrinking case, and according to function comments: > > <<< "In particular, we want to make sure that when the VFS > > <<< * shrinks i_size, we put the inode on the orphan list and modify > > <<< * i_disksize immediately" > > <<< we about to write i_disksize. But WHY do we have to do it explicitly? > > <<< Later inode_setattr() will call ext3_truncate() which will do it > > <<< this work for us. > > The reason that i_disksize is written to disk here immediately is that the > journal is stopped. Once that is done then in case of a crash the orphan > recovery code will detect the unfinished truncate and complete it before > mounting the filesystem. > > > rc = inode_setattr(inode, attr); > > <<< Now the most interesting question. What we have to do now in > > <<< case of error? We are in tricky situation. Truncate not happened, > > <<< and blocks visible to the user, but i_disksize was already written, > > <<< so later memory reclaiming/ read_inode will result in unexpected > > <<< updating i_size. > > The only ways inode_setattr() can fail are: > - expanding vmtruncate hits EFBIG, but we checked that above > - shrinking vmtruncate on a swapfile returns ETXTBUSY. This was added > after the ext3_setattr() code was written. I forgot to add that we need to handle the IS_SWAPFILE() case properly. Granted, it probably isn't a very common problem, but the IS_SWAPFILE() check was added explicitly because of clueless users. Also, very few users run with a swapfile in any case, most use a swap partition. It would seem that if you have a swapfile, try to truncate it to 0 (which will fail with -ETXTBUSY) and then unmount the filesystem the size will be truncated to 0: [root@lin-cli1 tests]# dd if=/dev/zero of=/tmp/foo bs=4k count=100 100+0 records in 100+0 records out [root@lin-cli1 tests]# mkswap /tmp/foo Setting up swapspace version 1, size = 405 kB [root@lin-cli1 tests]# swapon /tmp/foo [root@lin-cli1 tests]# swapon -s Filename Type Size Used Priority /dev/hda5 partition 1052216 136 -1 /tmp/foo file 392 0 -2 [root@lin-cli1 tests]# dd if=/dev/zero of=/tmp/foo bs=4k count=1 seek=99 dd: advancing past 405504 bytes in output file `/tmp/foo': Text file busy [root@lin-cli1 tests]# > /tmp/foo bash: /tmp/foo: Text file busy [root@lin-cli1 tests]# ls -l /tmp/foo 404 -rw-r--r-- 1 root root 409600 Mar 15 17:12 /tmp/foo [root@lin-cli1 tests]# swapoff /tmp/foo [root@lin-cli1 tests]# df /tmp Filesystem 1K-blocks Used Available Use% Mounted on /dev/hda1 20641788 8997964 10595184 46% / [root@lin-cli1 tests]# debugfs /dev/hda1 debugfs 1.40.2.cfs4 (12-Jul-2007) debugfs: stat /tmp/foo Inode: 1346639 Type: regular Mode: 0644 Flags: 0x0 Generation: 2276953 732 User: 0 Group: 0 Size: 0 <<< *** oops *** File ACL: 0 Directory ACL: 0 Links: 1 Blockcount: 808 Fragment: Address: 0 Number: 0 Size: 0 ctime: 0x47dc57fb -- Sat Mar 15 17:12:59 2008 atime: 0x47dc57fb -- Sat Mar 15 17:12:59 2008 mtime: 0x47dc57fb -- Sat Mar 15 17:12:59 2008 BLOCKS: (0):2689410, (1-11):2689460-2689470, (IND):2689471, (12-54):2689472-2689514, (55 -59):2689516-2689520, (60-63):2689532-2689535, (64-87):2689544-2689567, (88-96): 2689592-2689600, (97-99):2689619-2689621 TOTAL: 101 debugfs: quit [root@lin-cli1 tests]# e2fsck -fn /dev/hda1 e2fsck 1.40.2.cfs4 (12-Jul-2007) Warning! /dev/hda1 is mounted. Warning: skipping journal recovery because doing a read-only filesystem check. Pass 1: Checking inodes, blocks, and sizes Inode 1346639, i_size is 0, should be 409600. Fix? no Pass 2: Checking directory structure Pass 3: Checking directory connectivity Pass 4: Checking reference counts Pass 5: Checking group summary information /: ********** WARNING: Filesystem still has errors ********** /: 337862/2626560 files (6.9% non-contiguous), 2354290/5242880 blocks Until e2fsck was actually run on the filesystem (which would do the right thing and fix the file size) the swap file would have 0 length and fail to start. It probably isn't fatal for most systems to run without swap these days, but let's see if this would cause a boot-time failure or if it is silently ignored (on a RHEL4 system): [root@lin-cli1 tests]# > /tmp/foo [root@lin-cli1 tests]# debugfs /dev/hda1 debugfs 1.40.2.cfs4 (12-Jul-2007) debugfs: stat /tmp/foo Inode: 1346639 Type: regular Mode: 0644 Flags: 0x0 Generation: 2276953 732 User: 0 Group: 0 Size: 0 File ACL: 0 Directory ACL: 0 Links: 1 Blockcount: 0 Fragment: Address: 0 Number: 0 Size: 0 ctime: 0x47dc5a45 -- Sat Mar 15 17:22:45 2008 atime: 0x47dc57fb -- Sat Mar 15 17:12:59 2008 mtime: 0x47dc5a45 -- Sat Mar 15 17:22:45 2008 BLOCKS: debugfs: quit [root@lin-cli1 tests]# swapon /tmp/foo swapon: /tmp/foo: Invalid argument [root@lin-cli1 tests]# echo $? 255 [root@lin-cli1 tests]# grep -C 3 swapon /etc/rc.sysinit # Start up swapping. update_boot_stage RCswap action $"Enabling swap space: " swapon -a -e # Set up binfmt_misc /bin/mount -t binfmt_misc none /proc/sys/fs/binfmt_misc > /dev/null 2>&1 So it appears the error would be ignored, and eventually (because e2fsck will run periodically on filesystems, unless the user turns this off) it would be repaired properly. Still, it should be fixed. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc. -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html