Re: fallocate bug?

Zhu Han <schumi.han@xxxxxxxxx> · Tue, 8 May 2012 13:10:55 +0800

On Tue, May 8, 2012 at 12:40 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:

On Tue, May 08, 2012 at 11:24:52AM +0800, Zhu Han wrote:

> On Tue, May 8, 2012 at 7:59 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:

>

> > On Mon, May 07, 2012 at 08:44:17PM +0800, Zhu Han wrote:

> > > It seems that xfs on CentOS 6.X occupies much more storage space than desired

> > > when fallocate is used on the file. Here are the steps to reproduce it:

> >

> > Your test case is not doing what you think it is doing.

>

> Thanks for pointing it out.

>

> > > By the way, is it normal for the file to be moved around after the

> > > preallocated region is filled with data?

> > >

> > > $ uname -r

> > > 2.6.32-220.7.1.el6.x86_64

> > >

> > > $ fallocate -n --offset 0 -l 1G file    ----> Write a little more data than the preallocated size

> > >

> > > $ xfs_bmap -p -vv file

> > > file:

> > >  EXT: FILE-OFFSET      BLOCK-RANGE            AG AG-OFFSET             TOTAL   FLAGS

> > >    0: [0..2097151]:    2593408088..2595505239 21 (29420144..31517295) 2097152 10000

> > >

> > > $ dd if=/dev/zero of=/tmp/file bs=1M count=1026 conv=fsync

> >

> > That does a truncate first, removing all the preallocated space. Use

> > conv=notrunc to avoid this. Hence the space allocated by this

> > new write is different to the space allocated by the above

> > preallocation. The file has not been moved, the filesystem just did

> > what you asked it to do.

> >

> > >

> > > $ xfs_bmap -p -vv file

> > > file:

> > >  EXT: FILE-OFFSET      BLOCK-RANGE            AG AG-OFFSET             TOTAL   FLAGS

> > >    0: [0..4194303]:    2709184016..2713378319 22 (23101408..27295711) 4194304 00000

> >

> > And so now you've triggered the speculative delayed allocation

> > beyond EOF, which is normal behaviour. Hence there are currently

> > unused blocks beyond EOF which will get removed either when the next

> > close(fd) occurs on the file or the inode is removed from the cache.

> >

>

> close(fd) should be invoked before dd exits. So why are the extra blocks

> beyond EOF not freed?

The removal is conditional on how many times the fd has been closed

with dirty data on the inode.

> The only way I found to remove the extra blocks is to truncate the file to its

> real size.

If the close() didn't remove them, they will be removed when the

inode ages out of the cache. Why do you even care about them?

Our distributed system relies on the real length of files to account for space usage. This behavior makes the accounting inaccurate.
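For the accounting case described here, a minimal sketch of how the apparent length of a file could be compared with the space actually allocated to it. It assumes GNU coreutils stat (format sequences %s, %b and %B); space_report is a hypothetical helper name, not something from this thread.

```shell
#!/bin/sh
# Sketch only: report a file's apparent length next to its allocated space.
# Assumes GNU coreutils stat; space_report is a hypothetical helper name.

space_report() {
    file=$1
    # %s = apparent size in bytes (the file length an application sees)
    apparent=$(stat -c %s "$file") || return 1
    # %b = number of blocks allocated to the file
    blocks=$(stat -c %b "$file") || return 1
    # %B = size in bytes of each block counted by %b
    blksize=$(stat -c %B "$file") || return 1
    allocated=$((blocks * blksize))
    echo "$file apparent=$apparent allocated=$allocated"
}
```

Running space_report /tmp/file right after the dd step would be expected to show allocated well above apparent while blocks beyond EOF are still attached to the inode. A sparse file shows the opposite, so neither number alone is a reliable measure of usage.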

Cheers,

Dave.

--

Dave Chinner

david@xxxxxxxxxxxxx
