Re: [RFC net-next 0/7] Provide an ism layer

On 2025-01-17 21:29:09, Andrew Lunn wrote:
>On Fri, Jan 17, 2025 at 05:57:10PM +0100, Niklas Schnelle wrote:
>> On Fri, 2025-01-17 at 17:33 +0100, Andrew Lunn wrote:
>> > > Conceptually kind of but the existing s390 specific ISM device is a bit
>> > > special. But let me start with some background. On s390 aka Mainframes
>> > > OSs including Linux run in so-called logical partitions (LPARs) which
>> > > are machine hypervisor VMs which use partitioned non-paging memory. The
>> > > fact that memory is partitioned is important because this means LPARs
>> > > can not share physical memory by mapping it.
>> > > 
>> > > Now at a high level an ISM device allows communication between two such
>> > > Linux LPARs on the same machine. The device is discovered as a PCI
>> > > device and allows Linux to take a buffer called a DMB, map that in the
>> > > IOMMU and generate a token specific to another LPAR which also sees an
>> > > ISM device sharing the same virtual channel identifier (VCHID). This
>> > > token can then be transferred out of band (e.g. as part of an extended
>> > > TCP handshake in SMC-D) to that other system. With the token the other
>> > > system can use its ISM device to securely (authenticated by the token,
>> > > LPAR identity and the IOMMU mapping) write into the original system's
>> > > DMB at throughput and latency similar to doing a memcpy() via a
>> > > syscall.
>> > > 
>> > > On the implementation level the ISM device is actually a piece of
>> > > firmware and the write to a remote DMB is a special case of our PCI
>> > > Store Block instruction (no real MMIO on s390, instead there are
>> > > special instructions). Sadly there are a few more quirks but in
>> > > principle you can think of it as redirecting writes to a part of the
>> > > ISM PCI device's BAR to the DMB in the peer system if that makes sense.
>> > > There's of course also a mechanism to cause an interrupt on the
>> > > receiver as the write completes.
>> > 
>> > So the s390 details are interesting, but as you say, it is
>> > special. Ideally, all the special should be hidden away inside the
>> > driver.
>> 
>> Yes and it will be. There are some exceptions e.g. for vfio-pci pass-
>> through but that's not unusual and why there is already the concept of
>> vfio-pci extension module.
>> 
>> > 
>> > So please take a step back. What is the abstract model?
>> 
>> I think my high level description may be a good start. The abstract
>> model is the ability to share a memory buffer (DMB) for writing by a
>> communication partner, authenticated by a DMB Token. Plus stuff like
>> triggering an interrupt on write or explicit trigger. Then Alibaba
>> added optional support for what they called attaching the buffer which
>> means it becomes truly shared between the peers but which IBM's ISM
>> can't support. Plus a few more optional pieces such as VLANs, PNETIDs
>> don't ask. The idea for the new layer then is to define this interface
>> with operations and documentation.
>> 
>> > 
>> > Can the abstract model be mapped onto CXL? Could it be used with a GPU
>> > vRAM? SoC with real shared memory between a pool of CPUs.
>> > 
>> > 	Andrew
>> 
>> I'd think that yes, one could implement such a mechanism on top of CXL
>> as well as on SoC. Or even with no special hardware between a host and
>> a DPU (e.g. via PCIe endpoint framework). Basically anything that can
>> do DMA and IRQs between two OS instances.
>
>Is DMA part of the abstract model? That would suggest a true shared
>memory system is excluded, since that would not require DMA.
>
>Maybe take a look at subsystems like USB, I2C.
>
>usb_submit_urb(struct urb *urb, gfp_t mem_flags)
>
>An URB is a data structure with a block of memory associated with it,
>contains the detail to pass to the USB device.
>
>i2c_transfer(struct i2c_adapter *adap, struct i2c_msg *msgs, int num)
>
>*msgs points to num of messages which get transferred to/from the I2C
>device.
>
>Could the high level API look like this? No DMA, no IRQ, no concept of
>a somewhat shared memory. Just an API which asks for a message to be
>sent to the other end? struct urb has some USB concepts in it, struct
>i2c_msg has some I2C concepts in it. A struct ism_msg would follow the
>same pattern, but does it need to care about the DMA, the IRQ, the
>memory which is semi shared?

I don’t have a clear picture of what the API should look like yet, but I
believe it’s possible to avoid DMA and IRQ. In fact, the current data
transfer API, ops->move_data() in include/linux/ism.h, already abstracts
away the DMA and IRQ details.

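One thing we cannot hide, however, is whether the operation is zero-copy
or copy. The distinction matters because it determines when the sender
may reuse its data buffer: in copy mode the buffer can be reused as soon
as the call returns, while in zero-copy mode it has to stay untouched
until the peer is done with it.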
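
To make the copy vs. zero-copy point concrete, here is a rough userspace
sketch. Every name in it (struct ism_msg, ism_send_msg(), ism_peer_buf,
the src_busy flag) is hypothetical and made up for illustration; none of
it exists in include/linux/ism.h, and it is not a proposal for the real
interface. It only models when the sender's buffer becomes reusable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* All names below are hypothetical; this is a userspace model only. */
enum ism_xfer_mode { ISM_XFER_COPY, ISM_XFER_ZEROCOPY };

struct ism_msg {
	const void *data;         /* sender's payload */
	size_t len;
	enum ism_xfer_mode mode;
	bool src_busy;            /* true while the sender must not touch data */
};

struct ism_peer_buf {
	unsigned char mem[64];    /* stands in for the peer's DMB */
	const void *attached;     /* zero-copy: peer reads the sender's memory */
	size_t len;
};

/* Hand a message to the peer; returns 0 on success, -1 if it does not fit. */
static int ism_send_msg(struct ism_peer_buf *peer, struct ism_msg *msg)
{
	assert(peer && msg);
	if (msg->len > sizeof(peer->mem))
		return -1;

	if (msg->mode == ISM_XFER_COPY) {
		/* Data is duplicated, so the source is free on return. */
		memcpy(peer->mem, msg->data, msg->len);
		peer->attached = NULL;
		msg->src_busy = false;
	} else {
		/* Peer sees the sender's memory; source stays pinned. */
		peer->attached = msg->data;
		msg->src_busy = true;
	}
	peer->len = msg->len;
	return 0;
}

/* Peer signals it is done; only now is a zero-copy source reusable. */
static void ism_peer_consume(struct ism_peer_buf *peer, struct ism_msg *msg)
{
	assert(peer && msg);
	peer->attached = NULL;
	msg->src_busy = false;
}
```

In this model a caller in copy mode can overwrite its payload right
after ism_send_msg() returns, while a zero-copy caller has to wait for
something like ism_peer_consume(). However the real API ends up looking,
that difference in buffer lifetime is what the caller would still need
to know about.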

Best regards,
Dust




