On Saturday, 28 January 2012, Eric Sandeen wrote:
> On 1/28/12 8:55 AM, Martin Steigerwald wrote:
> > On Friday, 27 January 2012, Eric Sandeen wrote:
> >> On 1/27/12 1:50 AM, Manny wrote:
> >>> Hi there,
> >>>
> >>> I'm not sure if this is intended behavior, but I was a bit stumped
> >>> when I formatted a 30TB volume (12x3TB minus 2x3TB for parity in
> >>> RAID 6) with XFS and noticed that there were only 22 TB left. I
> >>> just called mkfs.xfs with default parameters - except for swidth
> >>> and sunit, which match the RAID setup.
> >>>
> >>> Is it normal that I lost 8TB just for the file system? That's
> >>> almost 30% of the volume. Should I set the block size higher? Or
> >>> should I increase the number of allocation groups? Would that make
> >>> a difference? What's the preferred method for handling such large
> >>> volumes?
> >>
> >> If it was 12x3TB I imagine you're confusing TB with TiB, so
> >> perhaps your 30T is really only 27TiB to start with.
> >>
> >> Anyway, fs metadata should not eat much space:
> >>
> >> # mkfs.xfs -dfile,name=fsfile,size=30t
> >> # ls -lh fsfile
> >> -rw-r--r-- 1 root root 30T Jan 27 12:18 fsfile
> >> # mount -o loop fsfile mnt/
> >> # df -h mnt
> >> Filesystem    Size  Used Avail Use% Mounted on
> >> /tmp/fsfile    30T  5.0M   30T   1% /tmp/mnt
> >>
> >> So Christoph's question was a good one; where are you getting
> >> your sizes?
> 
> To solve your original problem, can you answer the above question?
> Adding your actual raid config output (/proc/mdstat maybe) would help
> too.

Eric, I wrote

> > An academic question:

to make clear that it was just something I was curious about. I was not
the reporter of the problem anyway. I have no problem, and the reporter
has no problem either - see his answer - so all is good ;)

With your hint and some thinking and testing I was able to resolve most
of my other questions. Thanks. For the gory details:

> > Why is it that I get […]
> > 
> > merkaba:/tmp> LANG=C df -hT /mnt/zeit
> > Filesystem   Type  Size  Used Avail Use% Mounted on
> > /dev/loop0   xfs    30T   33M   30T   1% /mnt/zeit
> > 
> > 33MiB used on first mount instead of 5?
> 
> Not sure offhand, differences in xfsprogs version mkfs defaults
> perhaps.

Okay, that's fine with me. I was just curious. It doesn't matter much.

> > Hmmm, but creating the file on Ext4 does not work:
> 
> ext4 is not designed to handle very large files, so anything
> above 16T will fail.

> > fallocate instead of sparse file?
> 
> no, you just ran into file offset limits on ext4.

Oh, yes. Completely forgot about these Ext4 limits. Sorry.
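For the archives, that limit is easy to see without writing any data.
A sketch, not pasted output - /mnt/ext4 here is just a made-up mount
point for an ext4 filesystem with the usual 4 KiB block size:

# hypothetical ext4 mount point, 4 KiB blocks assumed
truncate -s 15T /mnt/ext4/fsfile   # fine: creates a sparse file, writes no data
truncate -s 30T /mnt/ext4/fsfile   # should fail with EFBIG ("File too large"):
                                   # ext4 with 4 KiB blocks tops out around
                                   # 16 TiB per file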
> > And on BTRFS as well as XFS it appears to try to create a 30T file
> > for real, i.e. by writing data - I stopped it before it could do too
> > much harm.
> 
> Why do you say that it appears to create a 30T file for real? It
> should not...

I jumped to a conclusion too quickly. It did do an I/O storm onto the
Intel SSD 320:

martin@merkaba:~> vmstat -S M 1        (-S M not applied to bi/bo)
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi     bo    in    cs us sy id wa
 0  0   1630   4365     87   1087    0    0   101     53     7    81  5  2 93  0
 1  0   1630   4365     87   1087    0    0     0      0   428   769  1  0 99  0
 2  0   1630   4365     87   1087    0    0     0      0   426   740  1  1 99  0
 0  0   1630   4358     87   1088    0    0     0      0  1165  2297  4  7 89  0
 0  0   1630   4357     87   1088    0    0     0     40  1736  3434  8  6 86  0
 0  0   1630   4357     87   1088    0    0     0      0   614  1121  3  1 96  0
 0  0   1630   4357     87   1088    0    0     0     32   359   636  0  0 100 0
 1  1   1630   3852     87   1585    0    0    13  81540   529  1045  1  7 91  1
 0  3   1630   3398     87   2027    0    0     0 227940  1357  2764  0  9 54 37
 4  3   1630   3225     87   2188    0    0     0 212004  2346  4796  5  6 41 49
 1  3   1630   2992     87   2415    0    0     0 215608  1825  3821  1  6 42 50
 0  2   1630   2820     87   2582    0    0     0 200492  1476  3089  3  6 49 41
 1  1   1630   2569     87   2832    0    0     0 198156  1250  2508  0  6 59 34
 0  2   1630   2386     87   3009    0    0     0 229896  1301  2611  1  6 56 37
 0  2   1630   2266     87   3126    0    0     0 302876  1067  2093  0  5 62 33
 1  3   1630   2266     87   3126    0    0     0 176092   723  1321  0  3 71 26
 0  3   1630   2266     87   3126    0    0     0 163840   706  1351  0  1 74 25
 0  1   1630   2266     87   3126    0    0     0  80104  3137  6228  1  4 69 26
 0  0   1630   2267     87   3126    0    0     0      3  3505  7035  6  3 86  5
 0  0   1630   2266     87   3126    0    0     0      0   631  1203  4  1 95  0
 0  0   1630   2259     87   3127    0    0     0      0   715  1398  4  2 94  0
 2  0   1630   2259     87   3127    0    0     0      0  1501  3087 10  3 86  0
 0  0   1630   2259     87   3127    0    0     0     27   945  1883  5  2 93  0
 0  0   1630   2259     87   3127    0    0     0      0   399   713  1  0 99  0
^C

But then it stopped. So it seems mkfs.xfs was just writing metadata, and
I simply didn't see this in the tmpfs. And when I think it through,
creating a 30TB XFS filesystem should involve writing some metadata at
different places in the file.
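To cross-check that, the allocation group geometry can be read from the
superblock of the image. A sketch, not pasted output - xfs_db should be
able to do this read-only on the regular file:

merkaba:/mnt/zeit> xfs_db -f -r -c "sb 0" -c "p agcount" -c "p agblocks" -c "p blocksize" fsfile

agcount * agblocks * blocksize should add up to the 30 TiB, and mkfs
writes a small cluster of header blocks at the start of every AG - which
would explain small extents spaced roughly 1 TiB apart in the mapping
below.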
And indeed, here is what I get:

merkaba:/mnt/zeit> LANG=C xfs_bmap fsfile
fsfile:
        0: [0..255]: 96..351
        1: [256..2147483639]: hole
        2: [2147483640..2147483671]: 3400032..3400063
        3: [2147483672..4294967279]: hole
        4: [4294967280..4294967311]: 3400064..3400095
        5: [4294967312..6442450919]: hole
        6: [6442450920..6442450951]: 3400096..3400127
        7: [6442450952..8589934559]: hole
        8: [8589934560..8589934591]: 3400128..3400159
        9: [8589934592..10737418199]: hole
        10: [10737418200..10737418231]: 3400160..3400191
        11: [10737418232..12884901839]: hole
        12: [12884901840..12884901871]: 3400192..3400223
        13: [12884901872..15032385479]: hole
        14: [15032385480..15032385511]: 3400224..3400255
        15: [15032385512..17179869119]: hole
        16: [17179869120..17179869151]: 3400256..3400287
        17: [17179869152..19327352759]: hole
        18: [19327352760..19327352791]: 3400296..3400327
        19: [19327352792..21474836399]: hole
        20: [21474836400..21474836431]: 3400328..3400359
        21: [21474836432..23622320039]: hole
        22: [23622320040..23622320071]: 3400360..3400391
        23: [23622320072..25769803679]: hole
        24: [25769803680..25769803711]: 3400392..3400423
        25: [25769803712..27917287319]: hole
        26: [27917287320..27917287351]: 3400424..3400455
        27: [27917287352..30064770959]: hole
        28: [30064770960..30064770991]: 3400456..3400487
        29: [30064770992..32212254599]: hole
        30: [32212254600..32212254631]: 3400488..3400519
        31: [32212254632..32215654311]: 352..3400031
        32: [32215654312..32216428455]: 3400520..4174663
        33: [32216428456..34359738239]: hole
        34: [34359738240..34359738271]: 4174664..4174695
        35: [34359738272..36507221879]: hole
        36: [36507221880..36507221911]: 4174696..4174727
        37: [36507221912..38654705519]: hole
        38: [38654705520..38654705551]: 4174728..4174759
        39: [38654705552..40802189159]: hole
        40: [40802189160..40802189191]: 4174760..4174791
        41: [40802189192..42949672799]: hole
        42: [42949672800..42949672831]: 4174792..4174823
        43: [42949672832..45097156439]: hole
        44: [45097156440..45097156471]: 4174824..4174855
        45: [45097156472..47244640079]: hole
        46: [47244640080..47244640111]: 4174856..4174887
        47: [47244640112..49392123719]: hole
        48: [49392123720..49392123751]: 4174888..4174919
        49: [49392123752..51539607359]: hole
        50: [51539607360..51539607391]: 4174920..4174951
        51: [51539607392..53687090999]: hole
        52: [53687091000..53687091031]: 4174952..4174983
        53: [53687091032..55834574639]: hole
        54: [55834574640..55834574671]: 4174984..4175015
        55: [55834574672..57982058279]: hole
        56: [57982058280..57982058311]: 4175016..4175047
        57: [57982058312..60129541919]: hole
        58: [60129541920..60129541951]: 4175048..4175079
        59: [60129541952..62277025559]: hole
        60: [62277025560..62277025591]: 4175080..4175111
        61: [62277025592..64424509191]: hole
        62: [64424509192..64424509199]: 4175112..4175119

Okay, it needed to write 2 GB:

merkaba:/mnt/zeit> du -h fsfile
2,0G    fsfile
merkaba:/mnt/zeit> du --apparent-size -h fsfile
30T     fsfile
merkaba:/mnt/zeit>

I didn't expect mkfs.xfs to write 2 GB, but when thinking it through for
a 30 TB filesystem I find this reasonable - most of it is presumably the
internal log, which mkfs zeroes out and which would be the one large
extent in the middle of the mapping above. Still it has 33 MiB for
metadata:

merkaba:/mnt/zeit> mkdir bigfilefs
merkaba:/mnt/zeit> mount -o loop fsfile bigfilefs
merkaba:/mnt/zeit> LANG=C df -hT bigfilefs
Filesystem   Type  Size  Used Avail Use% Mounted on
/dev/loop0   xfs    30T   33M   30T   1% /mnt/zeit/bigfilefs

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7