Re: Insane file system overhead on large volume

On 1/28/12 8:55 AM, Martin Steigerwald wrote:
> Am Freitag, 27. Januar 2012 schrieb Eric Sandeen:
>> On 1/27/12 1:50 AM, Manny wrote:
>>> Hi there,
>>>
>>> I'm not sure if this is intended behavior, but I was a bit stumped
>>> when I formatted a 30TB volume (12x3TB minus 2x3TB for parity in RAID
>>> 6) with XFS and noticed that there were only 22 TB left. I just
>>> called mkfs.xfs with default parameters - except for swidth and sunit
>>> which match the RAID setup.
>>>
>>> Is it normal that I lost 8TB just for the file system? That's almost
>>> 30% of the volume. Should I set the block size higher? Or should I
>>> increase the number of allocation groups? Would that make a
>>> difference? What's the preferred method for handling such large
>>> volumes?
>>
>> If it was 12x3TB I imagine you're confusing TB with TiB, so
>> perhaps your 30T is really only 27TiB to start with.
>>
>> Anyway, fs metadata should not eat much space:
>>
>> # mkfs.xfs -dfile,name=fsfile,size=30t
>> # ls -lh fsfile
>> -rw-r--r-- 1 root root 30T Jan 27 12:18 fsfile
>> # mount -o loop fsfile  mnt/
>> # df -h mnt
>> Filesystem            Size  Used Avail Use% Mounted on
>> /tmp/fsfile            30T  5.0M   30T   1% /tmp/mnt
>>
>> So Christoph's question was a good one; where are you getting
>> your sizes?

To solve your original problem, can you answer the above question?
Adding your actual raid config output (/proc/mdstat maybe) would help
too.
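
For what it's worth, 12x3TB in RAID6 leaves 10 data disks, i.e.
10 x 3TB = 30TB, which is only about 27.3TiB (30 * 10^12 / 2^40), so a
"27T" from df would already be expected before any metadata overhead.
Something along these lines would show where the numbers come from
(the md device name is just an example, adjust to your setup):

# cat /proc/mdstat
# mdadm --detail /dev/md0
# df -h /path/to/mountpoint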

> An academic question:
> 
> Why is it that I get
> 
> merkaba:/tmp> mkfs.xfs -dfile,name=fsfile,size=30t
> meta-data=fsfile                 isize=256    agcount=30, agsize=268435455 blks
>          =                       sectsz=512   attr=2, projid32bit=0
> data     =                       bsize=4096   blocks=8053063650, imaxpct=5
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0
> log      =internal log           bsize=4096   blocks=521728, version=2
>          =                       sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> 
> merkaba:/tmp> mount -o loop fsfile /mnt/zeit
> merkaba:/tmp> df -hT /mnt/zeit
> Dateisystem    Typ  Größe Benutzt Verf. Verw% Eingehängt auf
> /dev/loop0     xfs    30T     33M   30T    1% /mnt/zeit
> merkaba:/tmp> LANG=C df -hT /mnt/zeit
> Filesystem     Type  Size  Used Avail Use% Mounted on
> /dev/loop0     xfs    30T   33M   30T   1% /mnt/zeit
> 
> 
> 33MiB used on first mount instead of 5?

Not sure offhand; perhaps differences in mkfs defaults between xfsprogs versions.
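
If you want to chase it down, comparing the xfsprogs version and the
resulting filesystem geometry on both setups should show which default
changed (the mount point below is just taken from your example):

# mkfs.xfs -V
# xfs_info /mnt/zeit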

...

> Hmmm, but creating the file on Ext4 does not work:

ext4 wasn't designed for files that large; with 4k blocks the format
caps individual files at 16TiB, so anything bigger will fail.

> fallocate instead of sparse file?

No, you just ran into the file offset limit on ext4.
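
A quick way to see it, if you're curious (GNU truncate; the size is
just an example):

# truncate -s 30T fsfile

On ext4 with 4k blocks that fails with EFBIG ("File too large") because
30T is past that per-file limit; on XFS the same command succeeds and
just leaves you with a sparse 30T file.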
 
> And on BTRFS as well as XFS it appears to try to create a 30T file for 
> real, i.e. by writing data - I stopped it before it could do too much 
> harm.

Why do you say that it appears to create a 30T file for real?  It
should not...
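
If in doubt, compare the apparent size with what's actually allocated
(same file name as in your test):

# ls -lh fsfile
# du -h fsfile

ls shows the 30T apparent size, du shows the blocks actually written,
which should be tiny by comparison.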
 
> Where did you create that huge-ish XFS file?

On XFS.  Of course.  :)

> Ciao,

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs


