Re: [RFC net-next 1/1] net/smc: SMC for inter-VM communication

On 7/20/22 1:00 PM, Tony Lu wrote:
Hi all,

# Background

We (Alibaba Cloud) already use SMC in our cloud environment to
transparently accelerate TCP applications with ERDMA [1]. A common
scenario nowadays is to deploy containers (whose runtime is based on
lightweight virtual machines) on ECS (Elastic Compute Service) and to
schedule those containers on the same host to get higher network
performance, e.g. for AI, big data or other workloads that are
sensitive to bandwidth and latency. Currently, inter-VM performance is
poor and CPU resources are wasted (see #Benchmark, virtio). This
scenario has been discussed many times, but a general solution for
applications is still missing [2] [3] [4].

# Design

For the inter-VM scenario we use ivshmem (the Inter-VM shared memory
device) modeled by QEMU [5]. With it, multiple VMs can access one
shared memory region. The shared memory device is statically created
on the host and shared with the desired guests. It is exposed to the
guest as a PCI BAR and can interrupt its peers (ivshmem-doorbell).
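
As a rough illustration of what the bridge driver does on the PCI
side, here is a minimal probe sketch. The register offsets follow the
QEMU ivshmem spec [5] (BAR0 carries the control/doorbell registers,
BAR2 the shared memory); the struct and identifier names (ivpci_dev,
IVSHMEM_REG_*, the "ivpci" resource name) are made up for this example
and are not taken from the posted patch.

/*
 * Minimal sketch (not the posted patch): map the ivshmem device.
 * BAR0 carries the control registers, BAR2 the shared memory, as
 * described in the ivshmem spec [5].
 */
#include <linux/module.h>
#include <linux/pci.h>
#include <linux/io.h>
#include <linux/bits.h>

#define IVSHMEM_REG_INTR_MASK    0x00  /* interrupt mask (revision 0 only) */
#define IVSHMEM_REG_INTR_STATUS  0x04  /* interrupt status */
#define IVSHMEM_REG_IVPOSITION   0x08  /* our peer ID, assigned by ivshmem-server */
#define IVSHMEM_REG_DOORBELL     0x0c  /* write (peer_id << 16) | vector */

struct ivpci_dev {
	void __iomem    *regs;       /* BAR0: control registers */
	void __iomem    *shmem;      /* BAR2: shared memory */
	resource_size_t  shmem_len;
	u32              my_id;      /* peer ID read from IVPosition */
};

/* Interrupt a peer: upper 16 bits select the peer, lower 16 bits the MSI-X vector. */
static void ivpci_ring_doorbell(struct ivpci_dev *iv, u16 peer_id, u16 vector)
{
	writel((u32)peer_id << 16 | vector, iv->regs + IVSHMEM_REG_DOORBELL);
}

static int ivpci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
	struct ivpci_dev *iv;
	int rc;

	iv = devm_kzalloc(&pdev->dev, sizeof(*iv), GFP_KERNEL);
	if (!iv)
		return -ENOMEM;

	rc = pcim_enable_device(pdev);
	if (rc)
		return rc;

	/* Map BAR0 (registers) and BAR2 (shared memory). */
	rc = pcim_iomap_regions(pdev, BIT(0) | BIT(2), "ivpci");
	if (rc)
		return rc;

	iv->regs      = pcim_iomap_table(pdev)[0];
	iv->shmem     = pcim_iomap_table(pdev)[2];
	iv->shmem_len = pci_resource_len(pdev, 2);
	iv->my_id     = readl(iv->regs + IVSHMEM_REG_IVPOSITION);

	pci_set_drvdata(pdev, iv);
	dev_info(&pdev->dev, "ivshmem peer %u, %pa bytes of shared memory\n",
		 iv->my_id, &iv->shmem_len);
	return 0;
}

A real driver would additionally set up MSI-X vectors for the doorbell
interrupts and register itself with SMC, which this sketch omits.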

To use ivshmem with SMC, we wrote a draft device driver, named ivpci
(see patch #1), that acts as a bridge between SMC and the ivshmem PCI
device. To keep things simple, the driver presents itself as an SMC-D
device, so that it fits into SMC without modifying the SMC code.

   ┌───────────────────────────────────────┐
   │  ┌───────────────┐ ┌───────────────┐  │
   │  │      VM1      │ │      VM2      │  │
   │  │┌─────────────┐│ │┌─────────────┐│  │
   │  ││ Application ││ ││ Application ││  │
   │  │├─────────────┤│ │├─────────────┤│  │
   │  ││     SMC     ││ ││     SMC     ││  │
   │  │├─────────────┤│ │├─────────────┤│  │
   │  ││    ivpci    ││ ││    ivpci    ││  │
   │  └└─────────────┘┘ └└─────────────┘┘  │
   │        x  *               x  *        │
   │        x  ****************x* *        │
   │        x  xxxxxxxxxxxxxxxxx* *        │
   │        x  x                * *        │
   │  ┌───────────────┐ ┌───────────────┐  │
   │  │shared memories│ │ivshmem-server │  │
   │  └───────────────┘ └───────────────┘  │
   │                HOST A                 │
   └───────────────────────────────────────┘
    *********** Control flow (interrupt)
    xxxxxxxxxxx Data flow (memory access)

The ivpci driver implements almost all the operations of an SMC-D
device. It can be divided into two parts:

- Control flow: mostly the same as SMC-D; ivshmem triggers interrupts
   in ivpci, which then processes the CDC flow.

- Data flow: the shared memory of each connection is one large region,
   divided into two parts for the local and the remote RMB. Every write
   syscall copies data into the sndbuf and then calls the ISM
   move_data() operation to move it to the remote RMB in ivshmem and
   interrupt the peer. The reader receives the interrupt, checks the
   CDC message and consumes the data if the cursor has been updated (a
   sketch of this path follows below).
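
To make the data-flow description above concrete, here is an
illustrative sketch of how the move_data path could look. In the real
driver this logic sits behind the SMC-D device operations (struct
smcd_ops in include/net/smc.h) so that SMC can drive ivpci like an ISM
device; the per-connection layout, field names and the helper below
are assumptions for this example, not code from the posted patch.

#include <linux/errno.h>
#include <linux/io.h>
#include <linux/types.h>

/*
 * Illustrative only: per-connection view of the shared region.  The
 * region is split into two halves, one RMB per peer; field names and
 * the layout are assumptions made for this sketch.
 */
struct ivpci_conn {
	void __iomem *remote_rmb;   /* the peer's half of the shared region */
	size_t        rmb_len;
	void __iomem *doorbell;     /* BAR0 doorbell register (see probe sketch) */
	u16           peer_id;      /* ivshmem peer to interrupt */
	u16           cdc_vector;   /* MSI-X vector used for CDC signalling */
};

/*
 * Copy @len bytes of sndbuf data into the remote RMB at @offset.  When
 * @signal is set (a CDC message was queued), ring the doorbell so the
 * peer re-reads the cursors and consumes the new data.
 */
static int ivpci_move_data(struct ivpci_conn *conn, unsigned int offset,
			   void *data, unsigned int len, bool signal)
{
	if (offset + len > conn->rmb_len)
		return -EINVAL;

	memcpy_toio(conn->remote_rmb + offset, data, len);

	if (signal)
		writel((u32)conn->peer_id << 16 | conn->cdc_vector,
		       conn->doorbell);
	return 0;
}

On the receive side, the interrupt handler would then check the CDC
message in its own half of the region and consume the data if the
cursors moved, as described above.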

# Benchmark

Current POC of ivpci is unstable and only works for single SMC
connection. Here is the brief data:

Transport       Latency (pingpong)    Throughput (64 KB)
TCP (virtio)     19.3 us                3794.185 MBps
TCP (SR-IOV)     13.2 us                3948.792 MBps
SMC (ivshmem)     6.3 us               11900.269 MBps

Test environment:

- CPU: Intel Xeon Platinum, 8 cores; memory: 32 GiB
- NIC: Mellanox CX4 with 2 VFs assigned to two different guests
- virtio-net + vhost set up with virsh
- sockperf, single connection
- The SMC + ivshmem throughput numbers use one copy (userspace ->
   kernel) with an intrusive modification of SMC (see patch #1); the
   latency (pingpong) numbers use two copies (userspace -> kernel plus
   the move_data() copy, i.e. the version in the posted patch).

In this comparison, SMC with ivshmem achieves 3-4x the bandwidth at
half the latency.

TCP + virtio is the most widely used solution for guests, but its
performance is lower. Moreover, it needs an extra thread on the host,
occupying a full CPU core, to transfer data, which wastes CPU
resources. If the host is busy, performance gets even worse.


Hi Tony,

Quite interesting! FWIW for s390x we are also looking at passthrough of host ISM devices to enable SMC-D in QEMU guests:
https://lore.kernel.org/kvm/20220606203325.110625-1-mjrosato@xxxxxxxxxxxxx/
https://lore.kernel.org/kvm/20220606203614.110928-1-mjrosato@xxxxxxxxxxxxx/

But it seems to me that an 'emulated ISM' of sorts could still be interesting even on s390x, e.g. for scenarios where host device passthrough is not possible or desired.

Out of curiosity I tried this ivpci module on s390x, but the device won't probe -- this is possibly an issue with the s390x PCI emulation layer in QEMU; I'll have to look into that.

# Discussion

This RFC and solution are still at an early stage, so we want to bring
them up as soon as possible and discuss them fully with IBM and the
community. We would like to put some topics on the table:

1. Official SMC support for this scenario.

SMC + ivshmem shows a huge improvement for inter-VM communication.
SMC-D with a mocked ISM device might not be the right official
solution; it could instead be another SMC extension besides SMC-R and
SMC-D. So we are wondering whether SMC would accept this idea to cover
this scenario. Are there any other possibilities?

I am curious about ivshmem and its current state though -- e.g. looking around I see mention of v2, which you also referenced, but I don't see any activity on it for a few years? And as far as v1 ivshmem goes -- the server is "not for production use", etc.

Thanks,
Matt


2. Implementation of SMC for inter-VM.

SMC is already used in container and cloud environments; perhaps we
can propose a new device and a new protocol for these scenarios to
solve this problem.

3. Standardize this new protocol and device.

SMC-R has an open RFC, RFC 7609, so could a new device or protocol
like SMC-D be standardized as well? One possible option is to propose
a new device model in the QEMU + virtio ecosystem and have SMC support
that standard virtio device, like [6].

If there are any problems, please point them out.

Hope to hear from you, thank you.

[1] https://lwn.net/Articles/879373/
[2] https://projectacrn.github.io/latest/tutorials/enable_ivshmem.html
[3] https://dl.acm.org/doi/10.1145/2847562
[4] https://hal.archives-ouvertes.fr/hal-00368622/document
[5] https://github.com/qemu/qemu/blob/master/docs/specs/ivshmem-spec.txt
[6] https://github.com/siemens/jailhouse/blob/master/Documentation/ivshmem-v2-specification.md

Signed-off-by: Tony Lu <tonylu@xxxxxxxxxxxxxxxxx>
