On (Tue) Feb 24 2009 [11:58:31], Daniel P. Berrange wrote:
> On Tue, Feb 24, 2009 at 05:09:31PM +0530, Amit Shah wrote:
...
> > The best case to get a non-fragmented VM image is to have it allocated
> > completely at create-time with fallocate().
>
> The main problem with this change is that it'll make it harder for
> us to provide incremental feedback. As per the comment in the code,
> it is our intention to make the volume creation API run as a background
> job which provides feedback on progress of allocation, and the ability
> to cancel the job. Since posix_fallocate() is an all-or-nothing kind of
> API it wouldn't be very helpful.
>
> What sort of performance boost does this give you ? Would we perhaps
> be able to get close to it by writing in bigger chunks than 4k, or
> mmap'ing the file and then doing a memset across it ?

I have a program up at [1] that gives me the following data.

[1] http://fedorapeople.org/gitweb?p=amitshah/public_git/alloc-perf.git;a=blob_plain;f=test-file-zero-alloc-speed.c;hb=HEAD

I compiled results for ext3, ext4, xfs and btrfs. I used the following
methods to allocate a file (1 GB in size) and zero it:

- posix_fallocate()
- mmap() and memset()
- write chunks, sized 4k and 8k.
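
For readers who do not want to open the test program, the three methods
listed above can be sketched roughly as below. This is a minimal
illustration, not the code from [1]: the function names are made up for
this sketch, the file descriptor is assumed to be opened O_RDWR on a new,
empty file, and timing and most error reporting are left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Method 1: ask the file system to reserve the blocks in a single call.
 * posix_fallocate() returns 0 on success or an error number on failure;
 * it does not set errno. */
static int alloc_posix_fallocate(int fd, off_t len)
{
    return posix_fallocate(fd, 0, len);
}

/* Method 2: extend the file, map it, and memset() the whole mapping.
 * Needs fd opened O_RDWR because the mapping is shared and writable.
 * Returns 0 on success, -1 on failure. */
static int alloc_mmap(int fd, off_t len)
{
    if (ftruncate(fd, len) < 0)
        return -1;
    void *p = mmap(NULL, (size_t)len, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return -1;
    memset(p, 0, (size_t)len);
    return munmap(p, (size_t)len);
}

/* Method 3: write zero-filled chunks of `chunk` bytes (4096 or 8192 in
 * the runs above), starting at the current file offset.
 * Returns 0 on success, -1 on failure. */
static int alloc_chunks(int fd, off_t len, size_t chunk)
{
    char buf[8192] = { 0 };

    assert(chunk > 0 && chunk <= sizeof(buf));
    while (len > 0) {
        size_t want = (off_t)chunk < len ? chunk : (size_t)len;
        ssize_t n = write(fd, buf, want);
        if (n < 0)
            return -1;
        len -= n;   /* a short write just means another loop iteration */
    }
    return 0;
}
```

Only the chunked variant has a natural place to report progress or check
for cancellation (once per loop iteration); the other two hand the whole
range over in one step.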

Results:

---
ext4:
posix-fallocate run time: (approx 0s)
mmap run time: (approx 13s)
4096-sized chunk run time: (approx 15s)
8192-sized chunk run time: (approx 18s)

$ sudo filefrag /mnt/ext4/*
/mnt/ext4/file-chunk4: 29 extents found
/mnt/ext4/file-chunk8: 20 extents found
/mnt/ext4/file-mmap: 38 extents found
/mnt/ext4/file-pf: 1 extent found

---
xfs:
posix-fallocate run time: (approx 0s)
mmap run time: (approx 14s)
4096-sized chunk run time: (approx 18s)
8192-sized chunk run time: (approx 19s)

$ sudo filefrag /mnt/xfs/*
/mnt/xfs/file-chunk4: 3 extents found
/mnt/xfs/file-chunk8: 4 extents found
/mnt/xfs/file-mmap: 2 extents found
/mnt/xfs/file-pf: 1 extent found

---
ext3:
posix-fallocate run time: (approx 18s)
mmap run time: (approx 20s)
4096-sized chunk run time: (approx 22s)
8192-sized chunk run time: (approx 24s)

$ sudo filefrag /mnt/ext3/*
/mnt/ext3/file-chunk4: 38 extents found, perfection would be 9 extents
/mnt/ext3/file-chunk8: 9 extents found
/mnt/ext3/file-mmap: 44 extents found, perfection would be 9 extents
/mnt/ext3/file-pf: 9 extents found

---
btrfs:
posix-fallocate run time: (approx 0s)
mmap run time: (approx 18s)
4096-sized chunk run time: (approx 17s)
8192-sized chunk run time: (approx 19s)

$ sudo filefrag /mnt/btrfs/*
FIBMAP: Invalid argument
---

I have detailed results up at

http://fedorapeople.org/gitweb?p=amitshah/public_git/alloc-perf.git;a=blob_plain;f=results.txt;hb=HEAD

The link to the git tree is

http://fedorapeople.org/gitweb?p=amitshah/public_git/alloc-perf.git

Clearly, extent-based file systems provide a very fast fallocate()
implementation that allocates a new file and zeroes it. Since F11 is
going to have ext4 by default, I strongly suggest we switch to
posix_fallocate() for Linux hosts.

The feedback should not matter on the newer file systems, as the
allocation is really fast, and we don't currently have a feedback
implementation for non-extent-based file systems anyway. It really won't
be missed on newer hosts.

In spite of this, if some feedback is needed for a non-extent-based file
system, a run-time probe for the underlying file system can be made, and
we could default to a chunk-based allocation in that case.

For systems that do not implement posix_fallocate(), a configure-time
check is needed.

Amit
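
One possible shape for the run-time probe and chunk-based fallback
described above is sketched here. It is an assumption-laden illustration,
not libvirt code: alloc_volume(), alloc_zero_chunked() and progress_cb
are hypothetical names. The probe uses the Linux-specific fallocate()
call (declared by glibc when _GNU_SOURCE is defined), which fails with
EOPNOTSUPP when the file system has no native support, whereas glibc's
posix_fallocate() quietly falls back to writing the blocks itself. The
sketch assumes fd refers to a new, empty file positioned at offset 0.

```c
#define _GNU_SOURCE             /* for fallocate() in <fcntl.h> (glibc) */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical progress hook: called with bytes done and total bytes.
 * Returning nonzero asks the allocator to cancel. */
typedef int (*progress_cb)(off_t done, off_t total, void *opaque);

/* Fallback path: write zeros in chunks, reporting progress after each
 * write. Returns 0 on success, -1 with errno set on failure or cancel. */
static int alloc_zero_chunked(int fd, off_t len, progress_cb cb, void *opaque)
{
    static const char zeros[64 * 1024];
    off_t done = 0;

    while (done < len) {
        off_t left = len - done;
        size_t want = left < (off_t)sizeof(zeros) ? (size_t)left
                                                  : sizeof(zeros);
        ssize_t n = write(fd, zeros, want);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        done += n;
        if (cb && cb(done, len, opaque) != 0) {
            errno = ECANCELED;
            return -1;
        }
    }
    return 0;
}

/* Probe by trying the native call first; only fall back to chunked
 * writes when the file system (or kernel) reports no support. */
static int alloc_volume(int fd, off_t len, progress_cb cb, void *opaque)
{
    assert(len >= 0);
    if (fallocate(fd, 0, 0, len) == 0) {
        if (cb)
            cb(len, len, opaque);   /* finished in one step */
        return 0;
    }
    if (errno != EOPNOTSUPP && errno != ENOSYS)
        return -1;                  /* real error, e.g. ENOSPC */
    return alloc_zero_chunked(fd, len, cb, opaque);
}
```

Trying the call and checking errno is used here instead of comparing the
statfs() f_type value, because ext3 and ext4 report the same magic number
(0xEF53) and so cannot be told apart that way.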