On Fri 21 Aug 2020 02:59:44 PM CEST, Brian Foster wrote:
>> > Option 4 is described above as initial file preallocation whereas
>> > option 1 is per 64k cluster prealloc. Prealloc mode mixup aside,
>> > Berto is reporting that the initial file preallocation mode is
>> > slower than the per cluster prealloc mode. Berto, am I following
>> > that right?
>>
>> Option (1) means that no qcow2 cluster is allocated at the beginning
>> of the test, so apart from updating the relevant qcow2 metadata, each
>> write request first zeroes the cluster (with fallocate(ZERO_RANGE))
>> and then writes the requested 4KB of data. Further writes to the same
>> cluster don't need changes to the qcow2 metadata, so they go directly
>> to the area that was cleared with fallocate().
>>
>> Option (4) means that all clusters are allocated when the image is
>> created and they are initialized with fallocate() (actually with
>> posix_fallocate(), now that I read the code; I suppose it's the same
>> for xfs?). Only after that does the test start. All write requests
>> are simply forwarded to the disk; there is no need to touch any qcow2
>> metadata or do anything else.
>>
>
> Ok, I think that's consistent with what I described above (sorry, I
> find the preallocation mode names rather confusing so I was trying to
> avoid using them). Have you confirmed that posix_fallocate() in this
> case translates directly to fallocate()? I suppose that's most likely
> the case, otherwise you'd see numbers more like with
> preallocation=full (file preallocated via writing zeroes).

Yes, it seems to be:

https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/posix_fallocate.c;h=7238b000383af2f3878a9daf8528819645b6aa31;hb=HEAD

And that's also what the posix_fallocate() manual page says.

>> And yes, (4) is a bit slower than (1) in my tests: on ext4 I get 10%
>> more IOPS with (1).
>>
>> I just ran the tests with aio=native and with a raw image instead of
>> qcow2. Here are the results (in IOPS):
>>
>> qcow2:
>> |----------------------+-------------+------------|
>> | preallocation        | aio=threads | aio=native |
>> |----------------------+-------------+------------|
>> | off                  |        8139 |       7649 |
>> | off (w/o ZERO_RANGE) |        2965 |       2779 |
>> | metadata             |        7768 |       8265 |
>> | falloc               |        7742 |       7956 |
>> | full                 |       41389 |      56668 |
>> |----------------------+-------------+------------|
>>
>
> So it seems like Dave's suggestion to use native aio produced more
> predictable results, with full file prealloc being a bit faster than
> per cluster prealloc. Not sure why that isn't the case with
> aio=threads. I was wondering if perhaps the threading affects
> something indirectly, like the qcow2 metadata allocation itself, but
> I guess that would be inconsistent with ext4 showing a notable jump
> from (1) to (4) (assuming the previous ext4 numbers were with
> aio=threads).

Yes, I took the ext4 numbers with aio=threads.

>> raw:
>> |---------------+-------------+------------|
>> | preallocation | aio=threads | aio=native |
>> |---------------+-------------+------------|
>> | off           |        7647 |       7928 |
>> | falloc        |        7662 |       7856 |
>> | full          |       45224 |      58627 |
>> |---------------+-------------+------------|
>>
>> A qcow2 file with preallocation=metadata is more or less similar to a
>> sparse raw file (and the numbers are indeed similar).
>>
>> preallocation=off on qcow2 does not have an equivalent on raw files.
>
> It sounds like preallocation=off for qcow2 would be roughly equivalent
> to a raw file with a 64k extent size hint (on XFS).
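For reference, such a hint can be set on a still-empty file with
"xfs_io -c 'extsize 64k' <file>", or programmatically through the
generic fsxattr ioctl. A minimal, untested sketch (the
set_extsize_hint() helper name is made up for the example):

    /* Set an extent size hint on an XFS file; roughly equivalent to
     * xfs_io -c "extsize 64k" <file>. XFS only allows changing the
     * hint while the file has no extents allocated yet. */
    #include <sys/ioctl.h>
    #include <linux/fs.h>

    static int set_extsize_hint(int fd, unsigned int extsize)
    {
        struct fsxattr fsx;

        if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) < 0)
            return -1;
        fsx.fsx_xflags |= FS_XFLAG_EXTSIZE;
        fsx.fsx_extsize = extsize;  /* e.g. 64 * 1024 */
        return ioctl(fd, FS_IOC_FSSETXATTR, &fsx);
    }

I haven't benchmarked that raw configuration, though.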
There's the overhead of handling the qcow2 metadata, but QEMU keeps a
cache of it in memory, so that overhead should not be too big.

Berto
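P.S.: For completeness, here is roughly what the two write paths
compared above boil down to. This is a simplified, untested sketch of
the pattern, not the actual QEMU code; the cluster size macro and
function names are made up for the example:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <unistd.h>
    #include <linux/falloc.h>

    #define CLUSTER_SIZE (64 * 1024)

    /* Option (1): the first write to an unallocated cluster zeroes the
     * whole 64k cluster and then writes the 4KB of data. Later writes
     * to the same cluster skip the fallocate() step. */
    static int write_allocating_cluster(int fd, const void *buf,
                                        size_t len, off_t off)
    {
        off_t cluster = off & ~(off_t)(CLUSTER_SIZE - 1);

        if (fallocate(fd, FALLOC_FL_ZERO_RANGE, cluster, CLUSTER_SIZE) < 0)
            return -1;
        if (pwrite(fd, buf, len, off) != (ssize_t)len)
            return -1;
        return 0;
    }

    /* Option (4): the whole file is preallocated once at creation
     * time; every write then goes straight to the allocated blocks.
     * On Linux, glibc's posix_fallocate() is a thin wrapper around
     * fallocate(fd, 0, ...) and only falls back to writing zeroes if
     * the filesystem does not support fallocate(). */
    static int preallocate_image(int fd, off_t image_size)
    {
        return posix_fallocate(fd, 0, image_size);
    }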