On Fri 21 Aug 2020 02:59:44 PM CEST, Brian Foster wrote:
>> > Option 4 is described above as initial file preallocation whereas
>> > option 1 is per 64k cluster prealloc. Prealloc mode mixup aside,
>> > Berto is reporting that the initial file preallocation mode is
>> > slower than the per cluster prealloc mode. Berto, am I following
>> > that right?
>>
>> Option (1) means that no qcow2 cluster is allocated at the beginning
>> of the test, so apart from updating the relevant qcow2 metadata, each
>> write request first zeroes the cluster (with fallocate(ZERO_RANGE))
>> and then writes the requested 4KB of data. Further writes to the same
>> cluster don't need changes to the qcow2 metadata, so they go directly
>> to the area that was cleared with fallocate().
>>
>> Option (4) means that all clusters are allocated when the image is
>> created and they are initialized with fallocate() (actually with
>> posix_fallocate(), now that I read the code; I suppose it's the same
>> for xfs?). Only after that does the test start. All write requests
>> are simply forwarded to the disk; there is no need to touch any qcow2
>> metadata or do anything else.
>>
>
> Ok, I think that's consistent with what I described above (sorry, I
> find the preallocation mode names rather confusing so I was trying to
> avoid using them). Have you confirmed that posix_fallocate() in this
> case translates directly to fallocate()? I suppose that's most likely
> the case, otherwise you'd see numbers more like with
> preallocation=full (file preallocated via writing zeroes).

Yes, it seems to be:

https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/posix_fallocate.c;h=7238b000383af2f3878a9daf8528819645b6aa31;hb=HEAD

And that's also what the posix_fallocate() manual page says.

>> And yes, (4) is a bit slower than (1) in my tests: on ext4 I get 10%
>> more IOPS with (1).
>>
>> I just ran the tests with aio=native and with a raw image instead of
>> qcow2. Here are the results (in IOPS):
>>
>> qcow2:
>> |----------------------+-------------+------------|
>> | preallocation        | aio=threads | aio=native |
>> |----------------------+-------------+------------|
>> | off                  |        8139 |       7649 |
>> | off (w/o ZERO_RANGE) |        2965 |       2779 |
>> | metadata             |        7768 |       8265 |
>> | falloc               |        7742 |       7956 |
>> | full                 |       41389 |      56668 |
>> |----------------------+-------------+------------|
>>
>
> So it seems like Dave's suggestion to use native aio produced more
> predictable results, with full file prealloc being a bit faster than
> per cluster prealloc. Not sure why that isn't the case with
> aio=threads. I was wondering if perhaps the threading affects
> something indirectly, like the qcow2 metadata allocation itself, but
> I guess that would be inconsistent with ext4 showing a notable jump
> from (1) to (4) (assuming the previous ext4 numbers were with
> aio=threads).

Yes, I took the ext4 numbers with aio=threads.

>> raw:
>> |---------------+-------------+------------|
>> | preallocation | aio=threads | aio=native |
>> |---------------+-------------+------------|
>> | off           |        7647 |       7928 |
>> | falloc        |        7662 |       7856 |
>> | full          |       45224 |      58627 |
>> |---------------+-------------+------------|
>>
>> A qcow2 file with preallocation=metadata is more or less similar to a
>> sparse raw file (and the numbers are indeed similar).
>>
>> preallocation=off on qcow2 does not have an equivalent on raw files.
>
> It sounds like preallocation=off for qcow2 would be roughly equivalent
> to a raw file with a 64k extent size hint (on XFS).
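For reference, such a hint can be set on a still-empty file with
"xfs_io -c 'extsize 64k' <file>", or programmatically through the
generic fsxattr ioctl. A minimal, untested sketch (the
set_extsize_hint() helper name is made up for the example):

    /* Set an extent size hint on an XFS file; roughly equivalent to
     * xfs_io -c "extsize 64k" <file>. XFS only allows changing the
     * hint while the file has no extents allocated yet. */
    #include <sys/ioctl.h>
    #include <linux/fs.h>

    static int set_extsize_hint(int fd, unsigned int extsize)
    {
        struct fsxattr fsx;

        if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) < 0)
            return -1;
        fsx.fsx_xflags |= FS_XFLAG_EXTSIZE;
        fsx.fsx_extsize = extsize;  /* e.g. 64 * 1024 */
        return ioctl(fd, FS_IOC_FSSETXATTR, &fsx);
    }

I haven't benchmarked that raw configuration, though.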
There's the overhead of handling the qcow2 metadata, but QEMU keeps a
cache of it in memory, so that overhead should not be too big.

Berto
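P.S.: For completeness, here is roughly what the two write paths
compared above boil down to. This is a simplified, untested sketch of
the pattern, not the actual QEMU code; the cluster size macro and
function names are made up for the example:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <unistd.h>
    #include <linux/falloc.h>

    #define CLUSTER_SIZE (64 * 1024)

    /* Option (1): the first write to an unallocated cluster zeroes the
     * whole 64k cluster and then writes the 4KB of data. Later writes
     * to the same cluster skip the fallocate() step. */
    static int write_allocating_cluster(int fd, const void *buf,
                                        size_t len, off_t off)
    {
        off_t cluster = off & ~(off_t)(CLUSTER_SIZE - 1);

        if (fallocate(fd, FALLOC_FL_ZERO_RANGE, cluster, CLUSTER_SIZE) < 0)
            return -1;
        if (pwrite(fd, buf, len, off) != (ssize_t)len)
            return -1;
        return 0;
    }

    /* Option (4): the whole file is preallocated once at creation
     * time; every write then goes straight to the allocated blocks.
     * On Linux, glibc's posix_fallocate() is a thin wrapper around
     * fallocate(fd, 0, ...) and only falls back to writing zeroes if
     * the filesystem does not support fallocate(). */
    static int preallocate_image(int fd, off_t image_size)
    {
        return posix_fallocate(fd, 0, image_size);
    }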