Re: Two questions regarding ext4_fallocate()

On Sat, 4 May 2013 13:33:26 -0400, Theodore Ts'o <tytso@xxxxxxx> wrote:
> On Sat, May 04, 2013 at 10:58:50PM +0800, Ji Wu wrote:
> > Hi,
> >    I have two questions regarding ext4_fallocate(),
> > 
> >    (1) The first is about FALLOC_FL_PUNCH_HOLE support: what is it
> > used for? The only use case that comes to mind is ext4 storing
> > virtual machine image files. When the VMM is aware of a file
> > deletion in the guest OS, it can invoke the host file system's
> > fallocate() on the image file to punch a hole and free host
> > storage. But how can the VMM become aware of a guest file deletion?
> > By presenting a virtual SSD-like block device to the guest OS and
> > capturing the TRIM commands issued by the guest file system? That
> > seems too tricky.  So basically, where and how does one benefit
> > from hole punching?
> 
> It's not too tricky; all of the hypervisors, whether it's KVM, or Xen,
> or VMWare, are already simulating a SATA device to the guest OS.
> Implementing support for the TRIM request is not that hard, and most
> of the hypervisors are doing this already.  Implementing the punch
> hole functionality was indeed primarily motivated by this use case.
> 
> The other historical use of this was for digital video recorders, but
> that's a much more specialized use case.
> 
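For reference, here is a minimal userspace sketch of the call a hypervisor could issue against an image file once it sees a guest TRIM. It assumes Linux with glibc; the helper names punch_hole() and punch_hole_demo() are made up for illustration and do not come from any hypervisor or from ext4. FALLOC_FL_PUNCH_HOLE has to be combined with FALLOC_FL_KEEP_SIZE, so the file size stays the same and the punched range reads back as zeros.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: deallocate [offset, offset + length) of fd without
 * changing the file size. Returns 0 on success, -1 with errno set. */
static int punch_hole(int fd, off_t offset, off_t length)
{
    return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                     offset, length);
}

/* Hypothetical demo: write 1 MiB of 0xAB, punch out the middle 512 KiB,
 * then check that the size is unchanged and the hole reads back as zeros.
 * Returns 0 on success, 1 if the filesystem does not support hole
 * punching, -1 on any other failure. The file is removed afterwards. */
static int punch_hole_demo(const char *path)
{
    enum { FILE_SIZE = 1 << 20, HOLE_OFF = 256 * 1024, HOLE_LEN = 512 * 1024 };
    unsigned char *buf = NULL;
    struct stat st;
    int rc = -1;
    int fd;

    assert(HOLE_OFF + HOLE_LEN <= FILE_SIZE);

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    buf = malloc(FILE_SIZE);
    if (buf == NULL)
        goto out;
    memset(buf, 0xAB, FILE_SIZE);
    if (pwrite(fd, buf, FILE_SIZE, 0) != FILE_SIZE)
        goto out;

    if (punch_hole(fd, HOLE_OFF, HOLE_LEN) != 0) {
        /* Not every filesystem implements hole punching. */
        rc = (errno == EOPNOTSUPP || errno == ENOSYS) ? 1 : -1;
        goto out;
    }

    /* KEEP_SIZE means the file length must be unchanged. */
    if (fstat(fd, &st) != 0 || st.st_size != FILE_SIZE)
        goto out;

    if (pread(fd, buf, FILE_SIZE, 0) != FILE_SIZE)
        goto out;
    for (long i = 0; i < FILE_SIZE; i++) {
        int in_hole = (i >= HOLE_OFF && i < HOLE_OFF + HOLE_LEN);
        if (buf[i] != (in_hole ? 0x00 : 0xAB))
            goto out;
    }
    rc = 0;

out:
    free(buf);
    close(fd);
    unlink(path);
    return rc;
}
```

Calling punch_hole_demo() with a scratch path on the filesystem under test exercises the whole sequence. Note that the pages written just before the punch may still be dirty in the page cache when fallocate() runs, which is the situation question (2) asks about.
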
> >    (2) At the beginning of the function ext4_ext_punch_hole(), the
> > code is as follows:
> > 
> >         /* write out all dirty pages to avoid race condition */
> >         filemap_write_and_wait_range(mapping, offset, offset+length-1);
> >         mutex_lock(&inode->i_mutex);
> >         truncate_page_cache_range();
> > 
> >     Why does it need to synchronously write back the dirty pages
> > that fall within the hole? The on-disk data corresponding to those
> > pages is about to be deleted, so why not release those pages
> > directly, whether they are dirty or not?  Furthermore, this is done
> > before the inode lock is taken, so it seems that after the pages are
> > written back, and before the lock is taken, those pages could be
> > dirtied again.  So basically, why does it need to call
> > filemap_write_and_wait_range() before releasing those pages?
> 
> That's a good question.  Looking at it, I'm not sure we do.  I
> suspect this was put in originally to avoid races with setting the
> EOFBLOCKS_FL flag, but as you point out, there's no way we can prevent
> writes from sneaking in before we grab the i_mutex.  As a result, we ended
> up dropping the need for EOFBLOCKS_FL entirely.
> 
> Maybe one of the ext4 developers will see something that I'm missing,
> but I think we can drop this, which indeed will be a significant
> performance improvement for systems that use the punch hole
> functionality.
Yes, there is room for optimization here, but the data=ordered case is
special: we have to call an analog of ext4_begin_ordered_truncate() with
two arguments.
> 
> Cheers,
> 
> 						- Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html