Hello everyone,

Progress in virtualization and cloud technologies has made it popular to share file sets on host machines via the Plan 9 File Protocol (a sharing setup also known as VirtFS). The alternative setup based on the NFS protocol is less popular for a number of reasons.

Unfortunately, the performance of the default VirtFS setup is poor. We analyzed the reasons in our labs at Huawei Technologies and found that the typical bottleneck is that transferring a portion of data in many small 9P messages is slower than transferring the same amount of data in a single 9P message.

The number of 9P messages can be reduced (and performance improved) in a number of ways (*); however, some "hardcoded" bottlenecks remain in the v9fs driver of the guest kernel. Specifically, the read-ahead and write-behind paths of v9fs are implemented poorly: with the current code there is no chance that more than PAGE_SIZE bytes of data will be transmitted at a time.

To improve the situation we have introduced a special layer that coalesces a specified number of adjacent pages (if any) into a single 9P message. This layer is implemented as private ->readpages() and ->writepages() address space operations for v9fs.

To merge adjacent pages we use a special buffer whose size depends on the (per mount session) msize. On read-ahead paths such buffers are allocated on demand. On writeback paths we use a single buffer pre-allocated at mount time, because writeback is usually a response to memory pressure, so we cannot afford on-demand allocation there. All pages to be merged are copied into the buffer at their respective offsets, and then a single long read (write) 9P message is constructed and transmitted (**).

Thus, only one writeback thread is sped up at a time; other concurrent threads that fail to obtain the buffer fall back to the usual (slow) per-page path. If there is interest, I will implement a solution with N pre-allocated buffers (where N is the number of CPUs).
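To make the mechanism easier to picture, a rough sketch of the writeback side is given below. This is an illustration only, not the code being submitted: struct v9fs_wb_ctx and the helpers v9fs_grab_mount_buffer(), v9fs_release_mount_buffer() and v9fs_send_write_rpc() are names invented for the sketch, and writeback-state and error handling are simplified.

#include <linux/fs.h>
#include <linux/highmem.h>
#include <linux/pagemap.h>
#include <linux/string.h>
#include <linux/writeback.h>

/* Merge state for one writeback pass over a file. */
struct v9fs_wb_ctx {
        char   *buf;      /* buffer pre-allocated at mount time (~msize bytes) */
        size_t  capacity; /* how many bytes fit into one 9P write message */
        size_t  used;     /* bytes accumulated so far */
        loff_t  start;    /* file offset of the first merged page */
};

/* Hypothetical helpers, invented for this sketch:
 *  - trylock/release the single per-mount merge buffer;
 *  - transmit ctx->used bytes at offset ctx->start as one long write message. */
static struct v9fs_wb_ctx *v9fs_grab_mount_buffer(struct address_space *mapping);
static void v9fs_release_mount_buffer(struct v9fs_wb_ctx *ctx);
static int v9fs_send_write_rpc(struct address_space *mapping, struct v9fs_wb_ctx *ctx);

static int v9fs_flush_merged(struct address_space *mapping, struct v9fs_wb_ctx *ctx)
{
        int err = 0;

        if (ctx->used) {
                err = v9fs_send_write_rpc(mapping, ctx);
                ctx->used = 0;
        }
        return err;
}

/* Called by write_cache_pages() for every dirty page in the range. */
static int v9fs_coalesce_one(struct page *page, struct writeback_control *wbc,
                             void *data)
{
        struct v9fs_wb_ctx *ctx = data;
        struct address_space *mapping = page->mapping;
        loff_t off = page_offset(page);
        void *kaddr;
        int err = 0;

        /* Buffer full, or page not adjacent to what has been merged so far:
         * transmit the accumulated chunk before starting a new one. */
        if (ctx->used + PAGE_SIZE > ctx->capacity ||
            (ctx->used && off != ctx->start + ctx->used))
                err = v9fs_flush_merged(mapping, ctx);

        if (!ctx->used)
                ctx->start = off;

        /* Copy the page into the merge buffer at its offset.  A real
         * implementation would keep the page under writeback until the
         * corresponding RPC has completed; this is simplified here. */
        set_page_writeback(page);
        kaddr = kmap_atomic(page);
        memcpy(ctx->buf + ctx->used, kaddr, PAGE_SIZE);
        kunmap_atomic(kaddr);
        ctx->used += PAGE_SIZE;
        end_page_writeback(page);
        unlock_page(page);

        return err;
}

static int v9fs_writepages(struct address_space *mapping,
                           struct writeback_control *wbc)
{
        struct v9fs_wb_ctx *ctx;
        int err, err2;

        /* Another thread owns the single pre-allocated buffer:
         * fall back to the usual slow one-page-at-a-time path. */
        ctx = v9fs_grab_mount_buffer(mapping);
        if (!ctx)
                return generic_writepages(mapping, wbc);

        err = write_cache_pages(mapping, wbc, v9fs_coalesce_one, ctx);
        err2 = v9fs_flush_merged(mapping, ctx); /* send whatever is pending */
        v9fs_release_mount_buffer(ctx);

        return err ? err : err2;
}

Since only one mount-wide buffer exists, concurrent flushers that cannot obtain it go through generic_writepages(), i.e. the usual per-page path.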
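The read-ahead side can be sketched in a similar way. Again, this is an illustration only, not the code being submitted: v9fs_alloc_merge_buffer(), v9fs_free_merge_buffer() and v9fs_send_read_rpc() are invented names, the whole read-ahead window is assumed to fit into a single 9P message (the real code would split by msize), and short-read/EOF handling is omitted.

#include <linux/fs.h>
#include <linux/gfp.h>
#include <linux/highmem.h>
#include <linux/list.h>
#include <linux/mm.h>
#include <linux/pagemap.h>
#include <linux/string.h>

/* Hypothetical helpers, invented for this sketch:
 *  - allocate/free an on-demand merge buffer of "len" bytes;
 *  - issue one long read of "len" bytes at offset "pos" into "buf". */
static char *v9fs_alloc_merge_buffer(struct address_space *mapping, size_t len);
static void v9fs_free_merge_buffer(char *buf);
static ssize_t v9fs_send_read_rpc(struct address_space *mapping, char *buf,
                                  loff_t pos, size_t len);

static int v9fs_readpages(struct file *filp, struct address_space *mapping,
                          struct list_head *pages, unsigned nr_pages)
{
        /* The read-ahead list is ordered so that its tail holds the lowest
         * file index and its head the highest one. */
        pgoff_t first = list_entry(pages->prev, struct page, lru)->index;
        pgoff_t last  = list_entry(pages->next, struct page, lru)->index;
        loff_t pos = (loff_t)first << PAGE_SHIFT;
        size_t len = (size_t)(last - first + 1) << PAGE_SHIFT;
        char *buf;
        ssize_t got;
        int err = 0;

        buf = v9fs_alloc_merge_buffer(mapping, len);
        if (!buf)
                return -ENOMEM; /* read-ahead is best effort: missing pages
                                 * will be read later via ->readpage() */

        /* One long 9P read covering the whole read-ahead window. */
        got = v9fs_send_read_rpc(mapping, buf, pos, len);
        if (got < 0) {
                err = got;
                goto out;
        }

        /* Distribute the received data over the individual pages. */
        while (!list_empty(pages)) {
                struct page *page = list_entry(pages->prev, struct page, lru);
                void *kaddr;

                list_del(&page->lru);
                if (add_to_page_cache_lru(page, mapping, page->index,
                                          GFP_KERNEL)) {
                        put_page(page);
                        continue;
                }
                kaddr = kmap_atomic(page);
                memcpy(kaddr, buf + ((page->index - first) << PAGE_SHIFT),
                       PAGE_SIZE);
                kunmap_atomic(kaddr);
                SetPageUptodate(page);
                unlock_page(page);
                put_page(page);
        }
out:
        v9fs_free_merge_buffer(buf);
        return err;
}

Here the buffer is allocated on demand, since read-ahead, unlike writeback, is not normally triggered by memory pressure.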
The coalescing approach described above increases VirtFS bandwidth by up to 3 times and thus brings it close to the bandwidth of VirtIO-blk (see the numbers in Appendix A below).

Comment. Our patches improve only asynchronous operations, which go through the page cache; direct reads and writes are unaffected, for obvious reasons. NOTE that by default v9fs works in direct mode, so in order to see an effect you should specify a respective v9fs mount option (e.g. "fscache").

----
(*) Specifying a larger msize (the maximal possible size of a 9P message) reduces the number of 9P messages in direct operations performed in large chunks. Disabling v9fs ACL and security labels in the guest kernel (if they are not needed) avoids extra messages.

(**) 9P, Plan 9 File Protocol specifications: https://swtch.com/plan9port/man/man9/intro.html

Appendix A.

iozone -e -r chunk_size -s 6G -w -f

Throughput in MBytes/sec:

  operation       chunk_size   (1)   (2)   (3)
  write           1M           391   127   330
  read             1M           469   221   432
  write           4K           410   128   297
  read             4K           465   192   327
  random write    1M           403   145   313
  random read     1M           347   161   195
  random write    4K           344   119   131
  random read     4K            44    41    64

Legend:
  (1): VirtIO-blk
  (2): VirtIO-9p
  (3): VirtIO-9p, guest kernel patched with our stuff

Hardware & Software:

Host:  8 CPU, Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz, 16G RAM,
       SSD: noname (throughput: write 410 MB/sec, read 470 MB/sec),
       Fedora 24, kernel 4.7.4-200.fc24.x86_64, kvm+qemu-2.7.0, fs: ext4

Guest: 2 CPU, GenuineIntel @ 3.4GHz, 2G RAM, network model: VirtIO,
       Fedora 21, kernel 4.7.6

Settings:
  VirtIO-blk: guest FS: ext4
  VirtIO-9p:  mount options: "trans=virtio,version=9p2000.L,msize=131096,fscache"

Caches of the host and guest were dropped before every iozone phase.

CC-ing the QEMU developers list for possible comments and ACKs.

Please consider for inclusion.

Thanks,
Edward.