On Fri, Sep 01, 2023 at 01:10:16PM +0200, Kevin Wolf wrote:
> And nbdkit seems to get worse instead of better with larger cluster
> size, no matter whether zlib or zstd is used.

It's caused by nbdcopy's default request size being 256k.  Increasing
it to 2M cures the scaling problem - see updated results below.

(Note nbdkit & nbdcopy are being used together so we're copying
between two programs.  The request size is the size of NBD requests
between the two.)

> If you think using more threads is the key for the remaining difference
> at 64k, would increasing QCOW2_MAX_THREADS (currently only 4) help on
> the qemu-img side?

Results:

  qemu800  = qemu-img-8.0.0-4.fc39.x86_64 [previous results]
  qemugit  = qemu @ 17780edd81d
  qemuthr  = qemu @ 17780edd81d with QCOW2_MAX_THREADS changed from 4 to 16
  nbdkit   = nbdkit-1.35.11-2.fc40.x86_64 [previous results]
  nbdkit2M = nbdkit with nbdcopy --request-size=$((2*1024*1024))

  Cluster  Compression  Compressed size  Prog      Decompression speed

  4k       zlib         3228811264       qemu800   5.921 s ± 0.074 s
  4k       zstd         3258097664       qemu800   5.189 s ± 0.158 s
  4k       zlib         3228811264       qemugit   7.021 s ± 0.234 s
  4k       zstd         3258097664       qemugit   6.594 s ± 0.170 s
  4k       zlib         3228811264       qemuthr   6.744 s ± 0.111 s
  4k       zstd         3258097664       qemuthr   6.428 s ± 0.206 s
  4k       zlib         3228811264       nbdkit    1.390 s ± 0.094 s
  4k       zstd         3258097664       nbdkit    1.328 s ± 0.055 s

  64k      zlib         3164667904       qemu800   3.579 s ± 0.094 s
  64k      zstd         3132686336       qemu800   1.770 s ± 0.060 s
  64k      zlib         3164667904       qemugit   3.644 s ± 0.018 s
  64k      zstd         3132686336       qemugit   1.814 s ± 0.098 s
  64k      zlib         3164667904       qemuthr   1.356 s ± 0.058 s
  64k      zstd         3132686336       qemuthr   1.266 s ± 0.064 s
  64k      zlib         3164667904       nbdkit    1.254 s ± 0.065 s
  64k      zstd         3132686336       nbdkit    1.315 s ± 0.037 s

  512k     zlib         3158744576       qemu800   4.008 s ± 0.058 s
  512k     zstd         3032697344       qemu800   1.503 s ± 0.072 s
  512k     zlib         3158744576       qemugit   4.015 s ± 0.040 s
  512k     zstd         3032697344       qemugit   1.557 s ± 0.025 s
  512k     zlib         3158744576       qemuthr   1.233 s ± 0.050 s
  512k     zstd         3032697344       qemuthr   1.149 s ± 0.032 s
  512k     zlib         3158744576       nbdkit    1.702 s ± 0.026 s
  512k     zstd         3032697344       nbdkit    1.593 s ± 0.039 s

  2048k    zlib         3197569024       qemu800   4.327 s ± 0.051 s
  2048k    zstd         2995143168       qemu800   1.465 s ± 0.085 s
  2048k    zlib         3197569024       qemugit   4.323 s ± 0.031 s
  2048k    zstd         2995143168       qemugit   1.484 s ± 0.067 s
  2048k    zlib         3197569024       qemuthr   1.299 s ± 0.055 s
  2048k    zstd         2995143168       qemuthr   1.229 s ± 0.046 s
  2048k    zlib         3197569024       nbdkit2M  1.636 s ± 0.071 s
  2048k    zstd         2995143168       nbdkit2M  1.644 s ± 0.040 s

Increasing the number of threads makes a big difference, so I think
changing the default (or making it run-time adjustable somehow) is a
good idea, also an easy win.

Increased qcow2 threads + zlib-ng would be _very_ interesting.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
nbdkit - Flexible, fast NBD server with plugins
https://gitlab.com/nbdkit/nbdkit
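
For anyone wanting to put a number on "a big difference", here is a rough sketch of the arithmetic, using only the mean times from the results above (the ± spread is ignored). The names TIMES, speedup and thread_speedups are made up for this illustration and are not part of qemu or nbdkit.

```python
# Mean decompression times in seconds, copied from the results in this
# message.  Key: (cluster size, compression); value: times for qemu at
# 17780edd81d with QCOW2_MAX_THREADS = 4 (qemugit) and = 16 (qemuthr).
TIMES = {
    ("4k", "zlib"): {"qemugit": 7.021, "qemuthr": 6.744},
    ("4k", "zstd"): {"qemugit": 6.594, "qemuthr": 6.428},
    ("64k", "zlib"): {"qemugit": 3.644, "qemuthr": 1.356},
    ("64k", "zstd"): {"qemugit": 1.814, "qemuthr": 1.266},
    ("512k", "zlib"): {"qemugit": 4.015, "qemuthr": 1.233},
    ("512k", "zstd"): {"qemugit": 1.557, "qemuthr": 1.149},
    ("2048k", "zlib"): {"qemugit": 4.323, "qemuthr": 1.299},
    ("2048k", "zstd"): {"qemugit": 1.484, "qemuthr": 1.229},
}


def speedup(baseline_seconds, candidate_seconds):
    """Return how many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


def thread_speedups(times=TIMES):
    """Map each (cluster, compression) pair to the qemugit/qemuthr ratio."""
    return {
        key: speedup(row["qemugit"], row["qemuthr"])
        for key, row in times.items()
    }


if __name__ == "__main__":
    for (cluster, compression), ratio in thread_speedups().items():
        print(f"{cluster:>6} {compression:<5} {ratio:.2f}x")
```

On these means the ratio is close to 1 at 4k clusters and largest for zlib at 64k and above, which is consistent with the conclusion drawn above; it says nothing about thread counts other than 4 and 16.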