Re: [RFC 00/13] vfio: introduce vfio-cxl to support CXL type-2 accelerator passthrough

On 25/09/2024 15.05, Jonathan Cameron wrote:
> On Tue, 24 Sep 2024 08:30:17 +0000
> Zhi Wang <zhiw@xxxxxxxxxx> wrote:
> 
>> On 23/09/2024 11.00, Tian, Kevin wrote:
>>>> From: Zhi Wang <zhiw@xxxxxxxxxx>
>>>> Sent: Saturday, September 21, 2024 6:35 AM
>>>>
>>> [...]
>>>> - Create a CXL region and map it to the VM. A mapping between HPA and DPA
>>>> (Device PA) needs to be created to access the device memory directly. HDM
>>>> decoders in the CXL topology need to be configured level by level to
>>>> manage the mapping. After the region is created, it needs to be mapped to
>>>> GPA in the virtual HDM decoders configured by the VM.
>>>
>>> Any time a new address space is introduced, it's worth giving more
>>> context to help people who have no CXL background better understand
>>> the mechanism and think through any potential holes.
>>>
>>> At a glance looks we are talking about a mapping tier:
>>>
>>>     GPA->HPA->DPA
>>>
>>> The location/size of HPA/DPA for a cxl region are decided and mapped
>>> at @open_device and the HPA range is mapped to GPA at @mmap.
>>>
>>> In addition the guest also manages a virtual HDM decoder:
>>>
>>>     GPA->vDPA
>>>
>>> Ideally the vDPA range selected by the guest is a subset of the physical
>>> cxl region, so based on the offset and the vHDM the VMM may figure out
>>> which offset in the cxl region to mmap for the corresponding
>>> GPA (which in the end maps to the desired DPA).
>>>
>>> Is this understanding correct?
>>>
>>
>> Yes. Many thanks for summarizing this. It is a design decision from a
>> discussion in the CXL Discord channel.
>>
>>> btw is one cxl device only allowed to create one region? If multiple
>>> regions are possible how will they be exposed to the guest?
>>>
>>
>> It is not (and shouldn't be) an enforced requirement from the VFIO CXL
>> core. It is really requirement-driven. I am waiting to see what kinds of
>> use cases in reality need multiple CXL regions in the host and then pass
>> multiple regions to the guest.
> 
> Mix of back invalidate and non back invalidate supporting device memory
> maybe?  A bounce region for p2p traffic would be the obvious reason to do
> this without paying the cost of large snoop filters. If anyone puts PMEM
> on the device, then maybe a mix of that and volatile. In theory you might
> do separate regions for QoS reasons, but that seems unlikely to me...
> 
> Anyhow, not an immediate problem as I don't know of any
> BI-capable hosts yet and doubt anyone (other than Dan) cares about PMEM :)
> 

Got it.
> 
>>
>> Presumably, the host creates one large CXL region that covers the entire
>> DPA, while QEMU can virtually partition it into different regions and
>> map them to different virtual CXL regions if QEMU presents multiple HDM
>> decoders to the guest.
> 
> I'm not sure why it would do that. Can't think why you'd break up
> a host region - maybe I'm missing something.
> 

It is mostly a concern about a device having multiple HDM decoders.
In the current design, a large physical CXL (pCXL) region covering the
whole DPA will be passed to userspace. Given that the guest will see
multiple virtual HDM decoders, which guest software usually asks for,
the guest SW might create multiple virtual CXL regions. In that case
QEMU needs to map them onto different portions of the pCXL region.
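
Roughly, the QEMU side I have in mind looks like this (a sketch with
hypothetical names; the real code would sit behind the virtual HDM
decoder commit path):

/*
 * Hypothetical sketch: QEMU carves the single pCXL region into one
 * slice per committed virtual HDM decoder, using a trivial bump
 * allocator over the region's space.
 */
#include <stdbool.h>
#include <stdint.h>

#define MAX_VHDM_DECODERS 4

struct vcxl_slice {
	uint64_t pcxl_offset;  /* offset into the one big pCXL region */
	uint64_t size;
	uint64_t gpa_base;     /* where the guest decoder maps it */
	bool committed;
};

struct vcxl_state {
	uint64_t pcxl_size;    /* whole-DPA physical region size */
	uint64_t next_free;    /* next unused offset in the region */
	struct vcxl_slice slice[MAX_VHDM_DECODERS];
};

/* Called when the guest commits virtual decoder @idx. */
static int vcxl_commit_decoder(struct vcxl_state *s, int idx,
			       uint64_t gpa_base, uint64_t size)
{
	struct vcxl_slice *sl;

	if (idx < 0 || idx >= MAX_VHDM_DECODERS)
		return -1;
	if (size > s->pcxl_size - s->next_free)
		return -1;  /* no room left in the physical region */

	sl = &s->slice[idx];
	sl->pcxl_offset = s->next_free;
	sl->size = size;
	sl->gpa_base = gpa_base;
	sl->committed = true;
	s->next_free += size;

	/*
	 * QEMU would then create a memory region over
	 * mmap_base + sl->pcxl_offset and map it at gpa_base.
	 */
	return 0;
}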

> ...
> 
>>>> In the L2 guest, a dummy CXL device driver is provided to attach to the
>>>> virtual pass-thru device.
>>>>
>>>> The dummy CXL type-2 device driver can successfully be loaded with the
>>>> kernel cxl core type2 support, create a CXL region by requesting the CXL
>>>> core to allocate HPA and DPA, and configure the HDM decoders.
>>>
>>> It'd be good to see a real cxl device working to add confidence in
>>> the core design.
>>
>> To leverage the opportunity for F2F discussion at LPC, I proposed this
>> patchset to start the discussion and meanwhile offer an environment
>> for people to try and hack on. Also, patches are a good base for
>> discussion. We'll see what we get. :)
>>
>> There are devices already out there and on-going. AMD's SFC patches are
>> under review, and I think they are going to be the first variant driver
>> that uses the core. NVIDIA's device is also coming, and NVIDIA's variant
>> driver is going upstream for sure. Plus this emulated device, so I assume
>> we will have three in-tree variant drivers that talk to the CXL core.
> Nice.
>>
>> Thanks,
>> Zhi.
> 
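
P.S. For anyone wanting a mental model of the dummy driver flow quoted
above, here is a rough skeleton. None of the cxl2_* names below are the
real proposed API (that is still under review); they are placeholders
mirroring the sequence in the cover letter: allocate DPA, have the core
allocate HPA and create a region, then commit the HDM decoders.

/*
 * Hypothetical skeleton of a CXL type-2 accelerator driver probe path.
 * All cxl2_* names are placeholders, not the in-flight core API.
 */
struct cxl2_dev;
struct cxl2_region;

/* Placeholder prototypes for the (hypothetical) type-2 core helpers. */
struct cxl2_dev *cxl2_find_device(void *pdev);
long cxl2_request_dpa(struct cxl2_dev *cdev, unsigned long size);
struct cxl2_region *cxl2_create_region(struct cxl2_dev *cdev,
				       long dpa_base, unsigned long size);
int cxl2_commit_decoders(struct cxl2_region *region);

static int dummy_probe(void *pdev, unsigned long mem_size)
{
	struct cxl2_dev *cdev = cxl2_find_device(pdev);
	struct cxl2_region *region;
	long dpa;

	if (!cdev)
		return -1;

	/* Reserve device-side address space for the accelerator memory. */
	dpa = cxl2_request_dpa(cdev, mem_size);
	if (dpa < 0)
		return dpa;

	/* The core allocates HPA and builds the HPA<->DPA mapping. */
	region = cxl2_create_region(cdev, dpa, mem_size);
	if (!region)
		return -1;

	/* Program HDM decoders level by level along the topology. */
	return cxl2_commit_decoders(region);
}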




