Re: [PATCH v2 0/4] vfio-pci support pasid attach/detach

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2024/4/28 14:19, Tian, Kevin wrote:
From: Alex Williamson <alex.williamson@xxxxxxxxxx>
Sent: Saturday, April 27, 2024 4:14 AM

On Fri, 26 Apr 2024 11:11:17 -0300
Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:

On Wed, Apr 24, 2024 at 02:13:49PM -0600, Alex Williamson wrote:

This is kind of an absurd example to portray as a ubiquitous problem.
Typically the config space layout is a reflection of hardware whether
the device supports migration or not.

Er, all our HW has FW constructed config space. It changes with FW
upgrades. We change it during the life of the product. This has to be
considered..

So as I understand it, the concern is that you have firmware that
supports migration, but it also openly hostile to the fundamental
aspects of exposing a stable device ABI in support of migration.

If a driver were to insert a
virtual capability, then yes it would want to be consistent about it if
it also cares about migration.  If the driver needs to change the
location of a virtual capability, problems will arise, but that's also
not something that every driver needs to do.

Well, mlx5 has to cope with this. It supports so many devices with so
many config space layouts :( I don't know if we can just hard wire an
offset to stick in a PASID cap and expect that to work...

Are those config space layout differences usually also coming with
mmio-side interface change? If yes there are more to handle for
running V1 instance on V2 device and it'd make sense to manage
everything about compatibility in one place.

If we pursue the direction deciding the vconfig layout in VMM, does
it imply that anything related to mmio layout would also be put in
VMM too?

e.g. it's not unusual to see a device mmio layout as:

	REG_BASE:  base addr of a register block in a BAR
	REG_FEAT1_OFF: feature1 regs offset in the register block
	REG_FEAT2_OFF: feature2 regs offset in the register block
	...

Should we also consider the possibility of vBAR address difference between
the src and dst? It should be impossible that the guest driver mmaps vBAR
again if it has already been loaded.

Driver accesses registers according to those read-only offsets.

A FW upgrade may lead to change of offsets but functions stay the
same. An instance created on an old version can be migrated to
the new version as long as accesses to old offsets are trapped
and routed to the new offsets.

Do we envision things like above in the variant driver or in VMM?


What are you actually proposing?

Okay, what I'm thinking about is a text file that describes the vPCI
function configuration space to create. The community will standardize
this and VMMs will have to implement to get PASID/etc. Maybe the
community will provide a BSD licensed library to do this job.

The text file allows the operator to specify exactly the configuration
space the VFIO function should have. It would not be derived
automatically from physical. AFAIK qemu does not have this capability
currently.

This reflects my observation and discussions around the live migration
standardization. I belive we are fast reaching a point where this is
required.

Consider standards based migration between wildly different
devices. The devices will not standardize their physical config space,
but an operator could generate a consistent vPCI config space that
works with all the devices in their fleet.

It's hard to believe that 'wildly different' devices only have difference
in the layout of vPCI config space.


Consider the usual working model of the large operators - they define
instance types with some regularity. But an instance type is fixed in
concrete once it is specified, things like the vPCI config space are
fixed.

Running Instance A on newer hardware with a changed physical config
space should continue to present Instance A's vPCI config layout
regardless. Ie Instance A might not support PASID but Instance B can
run on newer HW that does. The config space layout depends on the
requested Instance Type, not the physical layout.

The auto-configuration of the config layout from physical is a nice
feature and is excellent for development/small scale, but it shouldn't
be the only way to work.

So - if we accept that text file configuration should be something the
VMM supports then let's reconsider how to solve the PASID problem.

I'd say the way to solve it should be via a text file specifying a
full config space layout that includes the PASID cap. From the VMM
perspective this works fine, and it ports to every VMM directly via
processing the text file.

The autoconfiguration use case can be done by making a tool build the
text file by deriving it from physical, much like today. The single
instance of that tool could have device specific knowledge to avoid
quirks. This way the smarts can still be shared by all the VMMs
without going into the kernel. Special devices with hidden config
space could get special quirks or special reference text files into
the tool repo.

Serious operators doing production SRIOV/etc would negotiate the text
file with the HW vendors when they define their Instance Type. Ideally
these reference text files would be contributed to the tool repo
above. I think there would be some nice idea to define fully open
source Instance Types that include VFIO devices too.

Regarding "if we accept that text file configuration should be
something the VMM supports", I'm not on board with this yet, so
applying it to PASID discussion seems premature.

We've developed variant drivers specifically to host the device specific
aspects of migration support.  The requirement of a consistent config
space layout is a problem that only exists relative to migration.  This
is an issue that I would have considered the responsibility of the
variant driver, which would likely expect a consistent interface from
the hardware/firmware.  Why does hostile firmware suddenly make it the
VMM's problem to provide a consistent ABI to the config space of the
device rather than the variant driver?

Obviously config maps are something that a VMM could do, but it also
seems to impose a non-trivial burden that every VMM requires an
implementation of a config space map and integration for each device
rather than simply expecting the exposed config space of the device to
be part of the migration ABI.  Also this solution specifically only
addresses config space compatibility without considering the more
generic issue that a variant driver can expose different device
personas.  A versioned persona and config space virtualization in the
variant driver is a much more flexible solution.  Thanks,


and looks this community lacks of a clear criteria on what burden
should be put in the kernel vs. in the VMM.

e.g. in earlier nvgrace-gpu discussion a major open was whether
the PCI bar emulation should be done by the variant driver or
by the VMM (with variant driver providing a device feature).

It ends up to be in the variant driver with one major argument
that doing so avoids the burden in various VMMs.

But now seems the 'text-file' proposal heads the opposite direction?

btw while this discussion may continue some time, I wonder whether
this vPASID reporting open can be handled separately from the
pasid attach/detach series so we can move the ball and merge
something already in agreement. anyway it's just a read-only cap so
won't affect how VFIO/IOMMUFD handles the pasid related requests.

Thanks
Kevin


--
Regards,
Yi Liu




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux