Re: ext4 shows file system is 4G, df says 3G

"Theodore Ts'o" <tytso@xxxxxxx> · Mon, 3 Jun 2019 23:06:25 -0400

On Mon, Jun 03, 2019 at 10:30:24AM -0700, Ross Boylan wrote:
> 
> I can imagine that the metadata structures for 4TB ended up eating a
> huge fraction of the space after the resize but a) it seems quite a
> coincidence that would lead to exactly the size in step 3 and b) I
> don't see it in any of the reported info, e.g., reserved blocks,
> except for the fact that the blocks available is quite low given the
> size of the files on the file system.  Actually, maybe I do see it:
> the journal size is 1024M = 1G (if the units are bytes; if the units
> are blocks then the journal would be bigger than the whole filesystem)
> so that could account for the difference.

Yes, that's correct.  The journal size is most of the difference.
Mke2fs uses a heuristic to decide how big to make the journal, with
the maximum size being a gigabyte.  This is mainly because people
don't mind an overhead of 0.1% for the journal -- but they do mind an
overhead of 25%.
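
To put rough numbers on that, here is a small sketch of the size table
and the overhead it implies.  The thresholds are an approximation
modelled on ext2fs_default_journal_size() in e2fsprogs and should be
treated as an assumption; check the source for your version.  The
function names are made up for illustration.  Sizes are in 4k blocks.

```shell
#!/bin/sh
# Approximate default journal size, in file system blocks, for a file
# system of $1 blocks.  Thresholds are an assumption modelled on
# ext2fs_default_journal_size(); verify against your e2fsprogs source.
default_journal_blocks() {
    n=$1
    if   [ "$n" -lt 2048 ];     then echo 0      # too small for a journal
    elif [ "$n" -lt 32768 ];    then echo 1024
    elif [ "$n" -lt 262144 ];   then echo 4096
    elif [ "$n" -lt 524288 ];   then echo 8192
    elif [ "$n" -lt 4194304 ];  then echo 16384
    elif [ "$n" -lt 8388608 ];  then echo 32768
    elif [ "$n" -lt 16777216 ]; then echo 65536
    elif [ "$n" -lt 33554432 ]; then echo 131072
    else                             echo 262144 # the 1G cap with 4k blocks
    fi
}

# Journal overhead as a percentage: $1 = journal blocks, $2 = fs blocks.
journal_overhead_pct() {
    awk -v j="$1" -v f="$2" 'BEGIN { printf "%.2f\n", 100 * j / f }'
}

fs_4g=1048576        # 4G in 4k blocks
fs_4t=1073741824     # 4T in 4k blocks

# Journal inherited from the 4T file system, kept after shrinking to 4G:
journal_overhead_pct "$(default_journal_blocks "$fs_4t")" "$fs_4g"   # 25.00
# Journal sized for a 4G file system in the first place:
journal_overhead_pct "$(default_journal_blocks "$fs_4g")" "$fs_4g"
```

With 4k blocks, 16384 blocks is 64M and 262144 blocks is 1G, which
matches the two "Journal size" values reported by dumpe2fs in the
transcript in this message.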

(In fact, older versions of e2fsprogs used a max journal size of
128M.  But when a graduate student and I were running benchmarks to
demonstrate the benefits of an unrelated file system feature he was
working on, we discovered that for certain metadata-heavy workloads,
it's possible for the journal checkpointing overhead to be the
bottleneck.  Increasing the journal to 1G avoids this, although with a
sufficiently fast storage device, it's possible that the journal could
become the bottleneck again.)

The simplest way to reset the journal to the default size is to
unmount the file system, then remove the journal, and then recreate it:

# mke2fs -t ext4 -q /tmp/foo.img 4T
# resize2fs /tmp/foo.img 4G
resize2fs 1.45.2 (27-May-2019)
Resizing the filesystem on /tmp/foo.img to 1048576 (4k) blocks.
The filesystem on /tmp/foo.img is now 1048576 (4k) blocks long.

# dumpe2fs -h /tmp/foo.img | grep "Journal size"
dumpe2fs 1.45.2 (27-May-2019)
Journal size:             1024M

# tune2fs -O ^has_journal /tmp/foo.img
tune2fs 1.45.2 (27-May-2019)
# tune2fs -O has_journal /tmp/foo.img
tune2fs 1.45.2 (27-May-2019)
Creating journal inode: done

# dumpe2fs -h /tmp/foo.img | grep "Journal size"
dumpe2fs 1.45.2 (27-May-2019)
Journal size:             64M
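
If you'd rather not depend on the default, tune2fs also takes an
explicit journal size in megabytes via -J size=.  A minimal sketch
that wraps the steps above; the function names and the value 64 are
only illustrative, and it assumes the file system is unmounted and
was cleanly unmounted (tune2fs will refuse to remove a journal that
still needs recovery).

```shell
#!/bin/sh
# Print the "Journal size" field from `dumpe2fs -h` output read on stdin.
journal_size() {
    sed -n 's/^Journal size:[[:space:]]*//p'
}

# Remove the journal from an unmounted file system ($1) and recreate
# it with an explicit size of $2 megabytes, then report the new size.
# Hypothetical helper; run as root where needed.
recreate_journal() {
    dev=$1
    mb=$2
    tune2fs -O '^has_journal' "$dev" &&
    tune2fs -J size="$mb" "$dev" &&
    dumpe2fs -h "$dev" 2>/dev/null | journal_size
}

# Example, with the file system unmounted:
#   recreate_journal /tmp/foo.img 64
```

The caret is quoted because some shells treat an unquoted ^ specially.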

(There are some other differences; the number of reserved gdt blocks,
used for online resizing, will be larger if you start with a file
system which is 4T and then resize it down to 4G.  But that only makes
a difference of about 16M for the 4G file system.  There would also be
fewer inodes in the resulting file system.  But the big difference is
the size of the journal.)

						- Ted