RE: resize2fs stuck in ext4_group_extend with 100% CPU Utilization With Small Volumes

Interesting. Thanks for the detailed breakdown! I don't mind the workaround of using a 4k "soft" block size on the filesystem, even for smaller filesystems. Now that I understand better, I think you were on target with your earlier explanation of bd_set_size(). So this means it's not an ext4 bug. I think the online resize of the loopback device (or any other block device driver) should use something like the code in check_disk_size_change() instead of bd_set_size(). I will have to test this out. Thanks again.
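
To make the difference between the two helpers concrete, here is a rough user-space model of the behaviour as described in this thread. It is only an illustration: the real bd_set_size() and check_disk_size_change() live in the kernel (fs/block_dev.c) and operate on struct block_device / struct gendisk, and the names fake_bdev and model_* below are made up for the sketch.

/* Illustrative model only -- not the actual kernel code. */
#include <stdio.h>

#define MODEL_PAGE_SIZE 4096ULL

struct fake_bdev {
	unsigned long long size;     /* device size in bytes */
	unsigned int block_size;     /* "soft" block size used by the buffer cache */
};

/* bd_set_size()-style update: besides recording the new size, it
 * recomputes the soft block size from the alignment of that size,
 * growing it up to the page size.  That is exactly what confuses a
 * mounted 1k-block ext4 filesystem that is still caching data at the
 * old block size. */
static void model_bd_set_size(struct fake_bdev *bdev, unsigned long long size)
{
	unsigned int bsize = 512;    /* start from the logical sector size */

	bdev->size = size;
	while (bsize < MODEL_PAGE_SIZE) {
		if (size & bsize)
			break;
		bsize <<= 1;
	}
	bdev->block_size = bsize;
}

/* check_disk_size_change()-style update: only the reported size
 * changes; the soft block size is left alone. */
static void model_check_disk_size_change(struct fake_bdev *bdev,
					 unsigned long long new_size)
{
	if (new_size != bdev->size)
		bdev->size = new_size;
}

int main(void)
{
	struct fake_bdev a = { .size = 100ULL << 20, .block_size = 1024 };
	struct fake_bdev b = a;

	model_bd_set_size(&a, 1ULL << 30);              /* grow 100 MiB -> 1 GiB */
	model_check_disk_size_change(&b, 1ULL << 30);

	printf("bd_set_size-style:            size=%llu bsize=%u\n", a.size, a.block_size);
	printf("check_disk_size_change-style: size=%llu bsize=%u\n", b.size, b.block_size);
	return 0;
}

With a 4k-aligned size, the first path bumps the soft block size from 1024 to 4096 (which is what "blockdev --getbsz" reports after "losetup -c"), while the second leaves it at 1024.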

Regards,
- Jamie

-----Original Message-----
From: Theodore Ts'o [mailto:tytso@xxxxxxx] 
Sent: Wednesday, September 23, 2015 11:14 AM
To: Pocas, Jamie
Cc: Eric Sandeen; linux-ext4@xxxxxxxxxxxxxxx
Subject: Re: resize2fs stuck in ext4_group_extend with 100% CPU Utilization With Small Volumes

On Wed, Sep 23, 2015 at 12:20:17AM -0400, Pocas, Jamie wrote:
> Ted, just to add another data point, with some minor adjustments to 
> the script to use xfs instead, such as using "mkfs.xfs -b size=1024"
> to force 1k blocks, I cannot reproduce the issue and the data block 
> size doesn't change from 1k.

Yes, that's not surprising, because XFS doesn't use the buffer cache layer.  Ext4 does, because that's the basis of how the jbd2 layer works.  Resizing the loop device does still change the block size as reported by the block device and used by the buffer cache layer, though.  (Internally, this is known as the "soft" block size; it's basically the unit in which data is cached in the buffer cache layer):

root@kvm-xfstests:~# truncate -s 100M /tmp/foo.img
root@kvm-xfstests:~# mkfs.xfs -b size=1024 /tmp/foo.img
meta-data=/tmp/foo.img           isize=512    agcount=4, agsize=25600 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0
data     =                       bsize=1024   blocks=102400, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=1024   blocks=2573, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
root@kvm-xfstests:~# mount -o loop /tmp/foo.img /mnt
root@kvm-xfstests:~# blockdev --getbsz /dev/loop0
1024
root@kvm-xfstests:~# losetup -c /dev/loop0
root@kvm-xfstests:~# blockdev --getbsz /dev/loop0
4096 <--------- BUG, note the change in the block size
root@kvm-xfstests:~# touch /mnt/foo
root@kvm-xfstests:~# sync
<------ The reason why we don't hang is that XFS doesn't use the
<------ buffer cache
root@kvm-xfstests:~# umount /mnt



Also feel free to try my repro, but run "blockdev --getbsz /dev/loop0" before and after the losetup -c command, and note that it does hang even though there is no resize2fs in the command sequence at all:

root@kvm-xfstests:~# cp /dev/null /tmp/foo.img
root@kvm-xfstests:~# truncate -s 100M /tmp/foo.img
root@kvm-xfstests:~# mke2fs -t ext4 /tmp/foo.img
mke2fs 1.43-WIP (18-May-2015)
Discarding device blocks: done                            
Creating filesystem with 102400 1k blocks and 25688 inodes
Filesystem UUID: 27dfdbbe-f3a9-48a7-abe8-5a52798a9849
Superblock backups stored on blocks: 
	8193, 24577, 40961, 57345, 73729

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: done 

root@kvm-xfstests:~# mount -o loop /tmp/foo.img /mnt
root@kvm-xfstests:~# blockdev --getbsz /dev/loop0
1024
root@kvm-xfstests:~# losetup -c /dev/loop0
root@kvm-xfstests:~# blockdev --getbsz /dev/loop0
4096 <------------ BUG
root@kvm-xfstests:~# touch /mnt/foo
<------- Should hang here, even though there is no resize2fs command
<------- If it doesn't hang right away, try typing the "sync" command


> Suffer this small analogy
> for me and let me know where I am wrong: say hypothetically I expand a 
> small partition (or LVM for that matter). Then I try to use resize2fs 
> to grow the ext filesystem on it. I expect that this should *not* 
> change the block size of the underlying device (of course not!) nor 
> the filesystem's block size.

The source of your confusion is that there are actually four different concepts of block/sector size (the first three can also be queried programmatically; see the short sketch after this list):

* The logical block/sector size of the underlying storage device
	- Retrieved via "blockdev --getss /dev/sdXX"
	- This is the smallest unit that can be sent to the disk from
	  the Host OS.  If the logical sector size is different from
	  the physical sector size, and the write is smaller than the
	  physical sector size (see below), then the disk will do a
	  read-modify-write.
	- The file system block size MUST be greater than or equal to
	  the logical sector size.

* The physical block/sector size of the underlying storage device
	- Retrieved via "blockdev --getpbsz /dev/sdXX"
	- This is the smallest unit that can be physically written to
	  the storage media.
	- The file system block size SHOULD be greater than or equal
	  to the physical sector size.  (To avoid read-modify-write
	  operations by the hard drive, which would be bad for performance.)

* The "soft" block size of the block device.
	- Retrieved via "blockdev --getbsz /dev/sdXX"
	- This represents the unit of storage which is used to cache
	  data in the buffer cache.  This only matters if you are
	  using the buffer cache --- for example, if you are doing
	  buffered I/O to a block device, or if you are using a file
	  system such as ext4 which uses the buffer cache.  Since data
	  is indexed in the buffer cache by the 3-tuple (block device,
	  block number, block size), Bad Things happen if you try to
	  change the block size while the file system is mounted.
	  Normally, the kernel will prevent you from changing the
	  block size under these circumstances.

* The file system block size.
	- Retrieved by some file-system dependent command.  For ext4,
	  this is "dumpe2fs -h".
	- Set at format time.  For file systems that use the buffer
	  cache, the file system driver will automatically set the
	  "soft" block size of the block device when the file system
	  is mounted.
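
Here is the short sketch referenced above: it reads the first three of these sizes directly from a block device using the standard BLKSSZGET, BLKPBSZGET and BLKBSZGET ioctls, i.e. the same values blockdev reports.  The /dev/loop0 path is just an example; the file system block size still has to come from a file-system specific tool such as "dumpe2fs -h".

/* Sketch: print the logical, physical and "soft" block sizes of a device. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>

int main(int argc, char **argv)
{
	const char *dev = argc > 1 ? argv[1] : "/dev/loop0";  /* example device */
	int fd = open(dev, O_RDONLY);
	int logical = 0, soft = 0;
	unsigned int physical = 0;

	if (fd < 0) {
		perror(dev);
		return 1;
	}
	ioctl(fd, BLKSSZGET, &logical);   /* logical sector size  (blockdev --getss)   */
	ioctl(fd, BLKPBSZGET, &physical); /* physical sector size (blockdev --getpbsz) */
	ioctl(fd, BLKBSZGET, &soft);      /* "soft" block size    (blockdev --getbsz)  */

	printf("%s: logical=%d physical=%u soft=%d\n", dev, logical, physical, soft);
	close(fd);
	return 0;
}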


Speaking of LVM, I can't reproduce the problem using LVM, at least not with a 4.3-rc2 kernel:

root@kvm-xfstests:~# pvcreate /dev/vdc
  Physical volume "/dev/vdc" successfully created
root@kvm-xfstests:~# vgcreate test /dev/vdc
  Volume group "test" successfully created
root@kvm-xfstests:~# lvcreate -L 100M -n small /dev/test
  Logical volume "small" created
root@kvm-xfstests:~# mkfs.ext4 -Fq /dev/test/small
root@kvm-xfstests:~# mount -o loop /dev/test/small /mnt
root@kvm-xfstests:~# blockdev --getbsz /dev/loop0
1024
root@kvm-xfstests:~# lvresize -L 1G /dev/test/small
  Size of logical volume test/small changed from 100.00 MiB (25 extents) to 1.00 GiB (256 extents).
  Logical volume small successfully resized
root@kvm-xfstests:~# blockdev --getbsz /dev/loop0
1024  <------ NO BUG, see the block size has not changed
root@kvm-xfstests:~# lvcreate -L 100M -n small /dev/test^C
root@kvm-xfstests:~# touch /mnt/foo ; sync
root@kvm-xfstests:~# resize2fs /dev/test/small
resize2fs 1.43-WIP (18-May-2015)
Filesystem at /dev/test/small is mounted on /mnt; on-line resizing required
old_desc_blocks = 1, new_desc_blocks = 8
The filesystem on /dev/test/small is now 1048576 (1k) blocks long.
<------ Note that resize2fs works just fine!
root@kvm-xfstests:~# touch /mnt/bar ; sync
root@kvm-xfstests:~# umount /mnt
root@kvm-xfstests:~# 

You might see if this works on CentOS; but if it doesn't, I'm pretty convinced this is a bug outside of ext4, and I've already given you a workaround --- using "-b 4096" on the command line to mkfs.ext4 or mke2fs.

Alternatively, here's another workaround: you can modify your /etc/mke2fs.conf so the "small" and "floppy" stanzas read:

[fs_types]
	small = {
		blocksize = 4096
		inode_size = 128
		inode_ratio = 4096
	}
	floppy = {
		blocksize = 4096
		inode_size = 128
		inode_ratio = 8192
	}

I'm pretty certain your failures won't reproduce if you either change how you call mke2fs for small file systems, or change your /etc/mke2fs.conf file as shown above.

Cheers,

					- Ted


