Re: [PATCH 0/1] qcow2: Skip copy-on-write when allocating a zero cluster

Alberto Garcia <berto@xxxxxxxxxx> · Mon, 24 Aug 2020 22:14:07 +0200

On Sun 23 Aug 2020 11:59:07 PM CEST, Dave Chinner wrote:
>> >> Option 4 is described above as initial file preallocation whereas
>> >> option 1 is per 64k cluster prealloc. Prealloc mode mixup aside, Berto
>> >> is reporting that the initial file preallocation mode is slower than
>> >> the per cluster prealloc mode. Berto, am I following that right?
>> 
>> After looking more closely at the data I can see that there is a peak of
>> ~30K IOPS during the first 5 or 6 seconds and then it suddenly drops to
>> ~7K for the rest of the test.
>
> How big is the filesystem, how big is the log? (xfs_info output,
> please!)

The size of the filesystem is 126GB and here's the output of xfs_info:

meta-data=/dev/vg/test           isize=512    agcount=4, agsize=8248576 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=32994304, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=16110, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

>> I was running fio with --ramp_time=5 which ignores the first 5 seconds
>> of data in order to let performance settle, but if I remove that I can
>> see the effect more clearly. I can observe it with raw files (in 'off'
>> and 'prealloc' modes) and qcow2 files in 'prealloc' mode. With qcow2 and
>> preallocation=off the performance is stable during the whole test.
>
> What does "preallocation=off" mean again? Is that using
> fallocate(ZERO_RANGE) prior to the data write rather than
> preallocating the metadata/entire file?

Exactly, it means that. One fallocate() call before each data write
(unless the area has been allocated by a previous write).

> If so, I would expect the limiting factor is the rate at which IO can
> be issued because of the fallocate() triggered pipeline bubbles. That
> leaves idle device time so you're not pushing the limits of the
> hardware and hence none of the behaviours above will be evident...

The thing is that with raw (i.e. non-qcow2) images the number of IOPS is
similar, but in that case there are no fallocate() calls, only the data
writes.

Berto