From: Xiao Guangrong <xiaoguangrong@xxxxxxxxxxx>

Changelog in v2:
These changes are based on Paolo's suggestions:
1) rename the lockless multithreads model to "threaded workqueue"
2) substantially rework the internal design: all requests now live in
   one large array, which is partitioned so that each thread is assigned
   its own set of requests, and bitmaps are used to synchronize the
   threads with the submitter; as a result, ptr_ring and the spinlock
   are dropped
3) introduce an event wait for the submitter

These changes are based on Emilio's review:
4) add a more detailed description of the threaded workqueue
5) add a benchmark for the threaded workqueue

The previous version can be found at
https://marc.info/?l=kvm&m=153968821910007&w=2

Here is a simple performance measurement comparing the two versions. The
environment is the same as described in the previous version.

Use 8 threads to compress the data in the source QEMU:

- with compress-wait-thread = off

        total time    busy-ratio
--------------------------------------------------
v1      125066        0.38
v2      120444        0.35

- with compress-wait-thread = on

        total time    busy-ratio
--------------------------------------------------
v1      164426        0
v2      142609        0

v2 wins in both cases: total time drops by about 3.7% with
compress-wait-thread = off and by about 13.3% with it on.

Xiao Guangrong (5):
  bitops: introduce change_bit_atomic
  util: introduce threaded workqueue
  migration: use threaded workqueue for compression
  migration: use threaded workqueue for decompression
  tests: add threaded-workqueue-bench

 include/qemu/bitops.h             |  13 +
 include/qemu/threaded-workqueue.h |  94 +++++++
 migration/ram.c                   | 538 ++++++++++++++------------------------
 tests/Makefile.include            |   5 +-
 tests/threaded-workqueue-bench.c  | 256 ++++++++++++++++++
 util/Makefile.objs                |   1 +
 util/threaded-workqueue.c         | 466 +++++++++++++++++++++++++++++++++
 7 files changed, 1030 insertions(+), 343 deletions(-)
 create mode 100644 include/qemu/threaded-workqueue.h
 create mode 100644 tests/threaded-workqueue-bench.c
 create mode 100644 util/threaded-workqueue.c

--
2.14.5