On Wed, 3 Aug 2022 16:27:54 -0400 Matthew Rosato <mjrosato@xxxxxxxxxxxxx> wrote:

> On 7/20/22 1:00 PM, Tony Lu wrote:
> > Hi all,
> >
> > # Background
> >
> > We (Alibaba Cloud) already use SMC in a cloud environment to
> > transparently accelerate TCP applications with ERDMA [1]. Nowadays, a
> > common scenario is to deploy containers (whose runtime is based on
> > lightweight virtual machines) on ECS (Elastic Compute Service), and the
> > containers may be scheduled on the same host in order to get better
> > network performance, e.g. for AI, big data or other workloads that are
> > sensitive to bandwidth and latency. Currently, inter-VM performance is
> > poor and CPU resources are wasted (see the virtio numbers in
> > #Benchmark). This scenario has been discussed many times, but a general
> > solution for applications is still missing [2] [3] [4].
> >
> > # Design
> >
> > For the inter-VM scenario, we use ivshmem (Inter-VM shared memory
> > device), which is modeled by QEMU [5]. With it, multiple VMs can access
> > one shared memory region. This shared memory device is statically
> > created on the host and shared with the desired guests. The device is
> > exposed as a PCI BAR and can interrupt its peers (ivshmem-doorbell).
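
As a side note for readers who have not used ivshmem: below is a rough,
untested sketch of how a guest driver such as ivpci might map the device,
based only on the register layout in the ivshmem spec [5]. The ivpci_*
names are illustrative, not the actual patch #1 code.

  #include <linux/pci.h>
  #include <linux/io.h>

  /* Per the ivshmem spec: BAR0 holds the register region (IVPosition at
   * 0x08, Doorbell at 0x0c), BAR2 is the shared memory visible to peers.
   */
  #define IVSHMEM_IVPOSITION  0x08
  #define IVSHMEM_DOORBELL    0x0c

  struct ivpci_dev {
      void __iomem *regs;        /* BAR0: doorbell/interrupt registers */
      void __iomem *shmem;       /* BAR2: shared memory */
      resource_size_t shmem_len;
      u32 vm_id;                 /* our peer id, read from IVPosition */
  };

  static int ivpci_map_bars(struct pci_dev *pdev, struct ivpci_dev *ivpci)
  {
      int ret = pcim_enable_device(pdev);

      if (ret)
          return ret;
      ivpci->regs  = pcim_iomap(pdev, 0, 0);
      ivpci->shmem = pcim_iomap(pdev, 2, 0);
      if (!ivpci->regs || !ivpci->shmem)
          return -ENOMEM;
      ivpci->shmem_len = pci_resource_len(pdev, 2);
      ivpci->vm_id = readl(ivpci->regs + IVSHMEM_IVPOSITION);
      return 0;
  }

  /* Interrupt a peer: write (peer id << 16) | MSI-X vector to Doorbell. */
  static void ivpci_ring_doorbell(struct ivpci_dev *ivpci, u16 peer, u16 vec)
  {
      writel((u32)peer << 16 | vec, ivpci->regs + IVSHMEM_DOORBELL);
  }
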
> > In order to use ivshmem in SMC, we wrote a draft device driver as a
> > bridge between SMC and the ivshmem PCI device. To make this easier, the
> > driver, named ivpci (see patch #1), acts like an SMC-D device so that it
> > fits into SMC without modifying the SMC code.
> >
> > ┌───────────────────────────────────────┐
> > │ ┌───────────────┐   ┌───────────────┐ │
> > │ │      VM1      │   │      VM2      │ │
> > │ │┌─────────────┐│   │┌─────────────┐│ │
> > │ ││ Application ││   ││ Application ││ │
> > │ │├─────────────┤│   │├─────────────┤│ │
> > │ ││     SMC     ││   ││     SMC     ││ │
> > │ │├─────────────┤│   │├─────────────┤│ │
> > │ ││    ivpci    ││   ││    ivpci    ││ │
> > │ └└─────────────┘┘   └└─────────────┘┘ │
> > │      x   *               x   *       │
> > │      x   ****************x*  *       │
> > │      x   xxxxxxxxxxxxxxxxx*  *       │
> > │      x   x                *  *       │
> > │ ┌───────────────┐   ┌───────────────┐ │
> > │ │shared memories│   │ivshmem-server │ │
> > │ └───────────────┘   └───────────────┘ │
> > │                HOST A                 │
> > └───────────────────────────────────────┘
> > *********** Control flow (interrupt)
> > xxxxxxxxxxx Data flow (memory access)
> >
> > The ivpci driver implements almost all of the SMC-D device operations.
> > It can be divided into two parts:
> >
> > - control flow: mostly the same as SMC-D; ivshmem interrupts are used
> >   to trigger ivpci and process the CDC flow.
> >
> > - data flow: the shared memory of each connection is one large region
> >   divided into two parts, the local and the remote RMB. Every write
> >   syscall copies data into the sndbuf and calls the ISM move_data() to
> >   move it into the remote RMB in ivshmem and interrupt the remote side.
> >   The reader then takes the interrupt, checks the CDC message, and
> >   consumes the data if the cursor has been updated.
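
Again as an illustration only: the s390 ISM driver plugs into SMC-D via
smcd_alloc_dev()/smcd_register_dev() and a struct smcd_ops, so an ivpci
driver could presumably hook in the same way. The move_data() prototype
below follows include/net/smc.h at the time of writing and may not match
what patch #1 actually does; the DMB-token handling and the reuse of the
ivpci_* helpers from the sketch above are made up for brevity.

  #include <net/smc.h>

  /* Sketch of the data path: copy from the local sndbuf into the peer's
   * RMB inside the ivshmem region, then ring the doorbell so the peer
   * processes the CDC message and consumes the data.
   */
  static int ivpci_move_data(struct smcd_dev *smcd, u64 dmb_tok,
                             unsigned int idx, bool sf, unsigned int offset,
                             void *data, unsigned int size)
  {
      struct ivpci_dev *ivpci = smcd->priv;

      /* Here dmb_tok is simply the RMB offset inside the shared memory;
       * a real driver would keep a proper DMB table.
       */
      memcpy_toio(ivpci->shmem + dmb_tok + offset, data, size);
      if (sf)  /* signal flag: interrupt the peer after the copy */
          ivpci_ring_doorbell(ivpci, ivpci->vm_id ^ 1, 0);
      return 0;
  }

  static const struct smcd_ops ivpci_smcd_ops = {
      .move_data = ivpci_move_data,
      /* .register_dmb, .unregister_dmb, .signal_event, ... would be
       * backed by the same ivshmem region and doorbell.
       */
  };

  static struct smcd_dev *ivpci_smcd;

  static int ivpci_register_smcd(struct pci_dev *pdev, struct ivpci_dev *ivpci)
  {
      ivpci_smcd = smcd_alloc_dev(&pdev->dev, dev_name(&pdev->dev),
                                  &ivpci_smcd_ops, 64 /* max DMBs */);
      if (!ivpci_smcd)
          return -ENOMEM;
      ivpci_smcd->priv = ivpci;
      return smcd_register_dev(ivpci_smcd);
  }
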
> >
> > # Benchmark
> >
> > The current PoC of ivpci is unstable and only works for a single SMC
> > connection. Here is some brief data:
> >
> >   Items           Latency (pingpong)    Throughput (64KB)
> >   TCP (virtio)    19.3 us               3794.185 MBps
> >   TCP (SR-IOV)    13.2 us               3948.792 MBps
> >   SMC (ivshmem)   6.3 us                11900.269 MBps
> >
> > Test environment:
> >
> > - CPU: Intel Xeon Platinum, 8 cores; memory: 32 GiB
> > - NIC: Mellanox CX4 with 2 VFs in two different guests
> > - virtio-net + vhost set up with virsh
> > - sockperf with a single connection
> > - the SMC + ivshmem throughput number uses one copy (userspace ->
> >   kernel) with an intrusive modification of SMC (see patch #1); the
> >   latency (pingpong) number uses two copies (user -> kernel plus the
> >   move_data() copy, as in this patch version)
> >
> > In this comparison, SMC with ivshmem achieves 3-4x the bandwidth at
> > half the latency.
> >
> > TCP + virtio is the most common solution for guests, but its
> > performance is lower. Moreover, it needs an extra thread on the host,
> > occupying a full CPU core, to transfer data, which wastes CPU
> > resources. If the host is very busy, the performance gets even worse.
> >
> 
> Hi Tony,
> 
> Quite interesting! FWIW for s390x we are also looking at passthrough of
> host ISM devices to enable SMC-D in QEMU guests:
> https://lore.kernel.org/kvm/20220606203325.110625-1-mjrosato@xxxxxxxxxxxxx/
> https://lore.kernel.org/kvm/20220606203614.110928-1-mjrosato@xxxxxxxxxxxxx/
> 
> But it seems to me an 'emulated ISM' of sorts could still be interesting
> even on s390x, e.g. for scenarios where host device passthrough is not
> possible/desired.
> 
> Out of curiosity I tried this ivpci module on s390x but the device won't
> probe -- this is possibly an issue with the s390x PCI emulation layer in
> QEMU; I'll have to look into that.
> 
> > # Discussion
> >
> > This RFC and its solution are still at an early stage, so we want to
> > bring them up as soon as possible and fully discuss them with IBM and
> > the community. We have some topics to put on the table:
> >
> > 1. Official SMC support for this scenario.
> >
> > SMC + ivshmem shows a huge improvement for inter-VM communication.
> > SMC-D with a mocked ISM device might not be the official solution;
> > maybe this should be another SMC extension besides SMC-R and SMC-D. So
> > we are wondering whether SMC would accept this idea for this scenario.
> > Are there any other possibilities?
> 
> I am curious about ivshmem and its current state though -- e.g. looking
> around I see mention of v2, which you also referenced, but don't see any
> activity on it for a few years? And as far as v1 ivshmem goes -- the
> server is "not for production use", etc.
> 
> Thanks,
> Matt
> 
> >
> > 2. Implementation of SMC for inter-VM.
> >
> > SMC is used in container and cloud environments; maybe we can propose
> > a new device and a new protocol, if possible, to solve this problem in
> > these new scenarios.
> >
> > 3. Standardize this new protocol and device.
> >
> > SMC-R has an open RFC (RFC 7609), so perhaps this new device or
> > protocol could be standardized like SMC-D. One possible option is to
> > propose a new device model in the QEMU + virtio ecosystem and have SMC
> > support this standard virtio device, as in [6].
> >
> > If there are any problems, please point them out.
> >
> > Hope to hear from you, thank you.
> >
> > [1] https://lwn.net/Articles/879373/
> > [2] https://projectacrn.github.io/latest/tutorials/enable_ivshmem.html
> > [3] https://dl.acm.org/doi/10.1145/2847562
> > [4] https://hal.archives-ouvertes.fr/hal-00368622/document
> > [5] https://github.com/qemu/qemu/blob/master/docs/specs/ivshmem-spec.txt
> > [6] https://github.com/siemens/jailhouse/blob/master/Documentation/ivshmem-v2-specification.md
> >
> > Signed-off-by: Tony Lu <tonylu@xxxxxxxxxxxxxxxxx>

This also looks a lot like the existing VSOCK, which already has transports
for virtio, Hyper-V and VMware.
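
For reference, the existing interface that remark points at is just the
AF_VSOCK socket family; a minimal, illustrative guest-side client (with an
arbitrary port number) would look something like this:

  #include <stdio.h>
  #include <unistd.h>
  #include <sys/socket.h>
  #include <linux/vm_sockets.h>

  int main(void)
  {
      /* CID 2 is the well-known host address; the port is arbitrary. */
      struct sockaddr_vm addr = {
          .svm_family = AF_VSOCK,
          .svm_cid    = VMADDR_CID_HOST,
          .svm_port   = 1234,
      };
      int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

      if (fd < 0 || connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
          perror("vsock");
          return 1;
      }
      write(fd, "ping", 4);   /* stream semantics, just like TCP */
      close(fd);
      return 0;
  }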