Re: [PATCH v2] vfio/pci: Add OpRegion 2.0 Extended VBT support.

On Fri, 3 Sep 2021 10:23:44 +0800 (CST)
Colin Xu <colin.xu@xxxxxxxxx> wrote:

> On Thu, 2 Sep 2021, Alex Williamson wrote:
> 
> > On Thu, 2 Sep 2021 15:11:11 +0800 (CST)
> > Colin Xu <colin.xu@xxxxxxxxx> wrote:
> >  
> >> On Mon, 30 Aug 2021, Alex Williamson wrote:
> >>
> >> Thanks, Alex, for your detailed comments. I've replied to them inline.
> >>
> >> A general question after these replies:
> >> which way of handling the read-only OpRegion is preferred?
> >> 1) Shadow (modify the RVDA location and OpRegion version for the
> >> special version, 2.0).
> >> 2) On-the-fly modification on reads.
> >>
> >> The former doesn't need extra fields to avoid remapping on every read;
> >> the latter leaves flexibility for a future write operation.
> >
> > I'm in favor of the simplest, most consistent solution.  In retrospect,
> > that probably should have been exposing the VBT as a separate device
> > specific region from the OpRegion and we'd just rely on userspace to do
> > any necessary virtualization before exposing it to a guest.  However,
> > v2.1 support already expanded the region to include the VBT, so we'd
> > have a compatibility problem changing that at this point.
> >
> > Therefore, since we have no plans to enable write support, the simplest
> > solution is probably to shadow all versions.  There's only one instance
> > of this device and firmware tables on the host, so we can probably
> > afford to waste a few pages of memory to simplify.
> >  
> 
> 
> >>> On Fri, 27 Aug 2021 10:37:16 +0800
> >>> Colin Xu <colin.xu@xxxxxxxxx> wrote:
> >>>  
> >>>> Due to historical reasons, some legacy shipped systems don't follow the
> >>>> OpRegion 2.1 spec and still stick to OpRegion 2.0, in which the extended
> >>>> VBT is not contiguous after the OpRegion in physical address, but at any
> >>>> location pointed to by RVDA via an absolute address. Thus it's impossible
> >>>> to map a contiguous range holding both OpRegion and extended VBT as in
> >>>> 2.1.
> >>>>
> >>>> Since the only difference between OpRegion 2.0 and 2.1 is where the
> >>>> extended VBT is stored (for 2.0, RVDA is the absolute address of the
> >>>> extended VBT, while for 2.1, RVDA is the address of the extended VBT
> >>>> relative to the OpRegion base), it's feasible to amend OpRegion support
> >>>> for these legacy systems (until the system firmware is upgraded) by
> >>>> kzalloc'ing a range, shadowing the OpRegion at its beginning, stitching
> >>>> the VBT closely after it, patching the shadow OpRegion version from 2.0
> >>>> to 2.1, and patching the shadow RVDA to a relative address. From the
> >>>> vfio igd OpRegion r/w ops view, only OpRegion 2.1 is then exposed,
> >>>> regardless of whether the underlying host OpRegion is 2.0 or 2.1, when
> >>>> the extended VBT exists. The vfio igd OpRegion r/w ops will return
> >>>> either the shadowed data (OpRegion 2.0) or read directly from the
> >>>> physical address (OpRegion 2.1+), based on the host OpRegion version and
> >>>> RVDA/RVDS. The shadow mechanism makes it possible to support legacy
> >>>> systems on the market.
> >>>>
> >>>> V2:
> >>>> Validate RVDA for 2.1+ before increasing total size. (Alex)
> >>>>
> >>>> Cc: Zhenyu Wang <zhenyuw@xxxxxxxxxxxxxxx>
> >>>> Cc: Hang Yuan <hang.yuan@xxxxxxxxxxxxxxx>
> >>>> Cc: Swee Yee Fonn <swee.yee.fonn@xxxxxxxxx>
> >>>> Cc: Fred Gao <fred.gao@xxxxxxxxx>
> >>>> Signed-off-by: Colin Xu <colin.xu@xxxxxxxxx>
> >>>> ---
> >>>>  drivers/vfio/pci/vfio_pci_igd.c | 117 ++++++++++++++++++++------------
> >>>>  1 file changed, 75 insertions(+), 42 deletions(-)
> >>>>
> >>>> diff --git a/drivers/vfio/pci/vfio_pci_igd.c b/drivers/vfio/pci/vfio_pci_igd.c
> >>>> index 228df565e9bc..9cd44498b378 100644
> >>>> --- a/drivers/vfio/pci/vfio_pci_igd.c
> >>>> +++ b/drivers/vfio/pci/vfio_pci_igd.c
> >>>> @@ -48,7 +48,10 @@ static size_t vfio_pci_igd_rw(struct vfio_pci_device *vdev, char __user *buf,
> >>>>  static void vfio_pci_igd_release(struct vfio_pci_device *vdev,
> >>>>  				 struct vfio_pci_region *region)
> >>>>  {
> >>>> -	memunmap(region->data);
> >>>> +	if (is_ioremap_addr(region->data))
> >>>> +		memunmap(region->data);
> >>>> +	else
> >>>> +		kfree(region->data);  
> >>>
> >>>
> >>> Since we don't have write support to the OpRegion, should we always
> >>> allocate a shadow copy to simplify?  Or rather than a shadow copy,
> >>> since we don't support mmap of the region, our read handler could
> >>> virtualize version and rvda on the fly and shift accesses so that the
> >>> VBT appears contiguous.  That might also leave us better positioned for
> >>> handling dynamic changes (ex. does the data change when a monitor is
> >>> plugged/unplugged) and perhaps eventually write support.
> >>>  
> >> Always shadowing sounds like the simpler solution. On-the-fly offset
> >> shifting may need some extra code:
> >> - A field to store the remapped RVDA, otherwise we have to remap on every
> >> read. Should I remap every time, or add the remapped RVDA to
> >> vfio_pci_device?
> >> - Some fields to store extra information, like the old and modified
> >> OpRegion version. Currently it's parsed in init since that's a one-time
> >> run. To support on-the-fly modification, we'd need to save them somewhere
> >> instead of parsing on every read.
> >> - Address shift calculation. Reads can start at any offset with any size,
> >> so we'd need to add some address-shifting code.
> >
> > I think it's a bit easier than made out here.  RVDA is either zero or
> > OPREGION_SIZE when it's virtualized, so the existence of a separate
> > mapping for the VBT is enough to know the value, where I think we'd
> > hold that mapping for the life of the region.  We also don't need to
> > store the version; the transformation is static.  If the VBT mapping
> > exists and the read version is 2.0, it's replaced with 2.1, otherwise
> > we leave it alone.  I expect we can also chunk accesses to aligned
> > 1/2/4 byte reads (QEMU is already doing this).  That simplifies both
> > the transition between OpRegion and VBT as well as the field
> > virtualization.
> >  
> Hmm, the version doesn't need to be stored since the host version isn't
> changed.

It'd be changed for the guest if we're virtualizing a v2.0 OpRegion
into v2.1 to make RVDA relative, but it's still not strictly necessary
to store the host or virtual version to do that.
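
For instance, the read handler could fix up the version field on the fly;
an untested sketch, assuming pos/count/buf as in vfio_pci_igd_rw() and a
hypothetical "igd" structure holding both mappings (sketched further
below):

	/* Virtualize a v2.0 version field as v2.1 on an aligned read */
	if (igd->vbt && pos == OPREGION_VERSION && count == 2) {
		u16 version = le16_to_cpu(*(__le16 *)(igd->opregion +
						      OPREGION_VERSION));
		__le16 val = cpu_to_le16(version < 0x0201 ? 0x0201 : version);

		if (copy_to_user(buf, &val, count))
			return -EFAULT;
		return count;
	}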

> But we need a place to store the mapped virtual address so that we can
> unmap on release. In the shadow case we have the shadow address, but we
> don't save the OpRegion and VBT virtual addresses.

Yes, we're using the void* data field of struct vfio_pci_region for
storing the opregion mapping; this could easily point to a structure
containing both the opregion and vbt mappings and the size of each.
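
For example (an illustrative sketch only, the names are made up here):

	struct vfio_pci_igd_opregion {
		void *opregion;		/* memremap() of the OpRegion */
		size_t opregion_size;
		void *vbt;		/* separate v2.0 VBT mapping, or NULL */
		size_t vbt_size;
	};

region->data would then point at this, and a non-NULL vbt mapping is what
tells the r/w handler to virtualize.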

> > I could almost convince myself that this is viable, but I'd like to see
> > an answer to the question above, is any of the OpRegion or VBT volatile
> > such that we can't rely on a shadow copy exclusively?
> >  
> Most of the fields in the OpRegion and VBT are written by the BIOS and
> only read by the driver as static information. Some fields are used for
> communication between the BIOS and the driver, written by one and read by
> the other: for example, the driver can notify the BIOS that it's ready to
> process ACPI video extension calls, or when the panel backlight changes
> the BIOS notifies the driver via ACPI and the driver reads the PWM duty
> cycle, etc.
> So strictly speaking, there are some cases where the data is volatile and
> we can't fully rely on the shadow copy. To handle them accurately, every
> field would need to be processed according to the actual function it
> supports. As you mentioned above, two separate regions for the OpRegion
> and VBT could be better. However, currently there is only one region, so
> the shadow keeps it as a single region; but then the read ops shouldn't
> fully rely on the shadow and would need to always read the host data.
> That could also make future write-ops support easier. The read/write ops
> could parse and filter out functions that the host doesn't want to expose
> for virtualization.
> This brings up a small question: we need to save the mapped OpRegion and
> VBT virtual addresses so that we don't remap every time, and also so we
> can unmap on release. Which structure is the right place to add these
> saved virtual addresses?

See above, struct vfio_pci_region.data
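
i.e., with the hypothetical structure sketched above, the release callback
might look roughly like:

	static void vfio_pci_igd_release(struct vfio_pci_device *vdev,
					 struct vfio_pci_region *region)
	{
		struct vfio_pci_igd_opregion *igd = region->data;

		memunmap(igd->opregion);
		if (igd->vbt)
			memunmap(igd->vbt);
		kfree(igd);
	}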

> >>>>  }
> >>>>
> >>>>  static const struct vfio_pci_regops vfio_pci_igd_regops = {
> >>>> @@ -59,10 +62,11 @@ static const struct vfio_pci_regops vfio_pci_igd_regops = {
> >>>>  static int vfio_pci_igd_opregion_init(struct vfio_pci_device *vdev)
> >>>>  {
> >>>>  	__le32 *dwordp = (__le32 *)(vdev->vconfig + OPREGION_PCI_ADDR);
> >>>> -	u32 addr, size;
> >>>> -	void *base;
> >>>> +	u32 addr, size, rvds = 0;
> >>>> +	void *base, *opregionvbt;  
> >>>
> >>>
> >>> opregionvbt could be scoped within the branch where it's used.
> >>>  
> >> The previous revision didn't move it into that scope. I'll amend it in
> >> the next version.
> >>>>  	int ret;
> >>>>  	u16 version;
> >>>> +	u64 rvda = 0;
> >>>>
> >>>>  	ret = pci_read_config_dword(vdev->pdev, OPREGION_PCI_ADDR, &addr);
> >>>>  	if (ret)
> >>>> @@ -89,66 +93,95 @@ static int vfio_pci_igd_opregion_init(struct vfio_pci_device *vdev)
> >>>>  	size *= 1024; /* In KB */
> >>>>
> >>>>  	/*
> >>>> -	 * Support opregion v2.1+
> >>>> -	 * When VBT data exceeds 6KB size and cannot be within mailbox #4, then
> >>>> -	 * the Extended VBT region next to opregion is used to hold the VBT data.
> >>>> -	 * RVDA (Relative Address of VBT Data from Opregion Base) and RVDS
> >>>> -	 * (Raw VBT Data Size) from opregion structure member are used to hold the
> >>>> -	 * address from region base and size of VBT data. RVDA/RVDS are not
> >>>> -	 * defined before opregion 2.0.
> >>>> +	 * OpRegion and VBT:
> >>>> +	 * When VBT data doesn't exceed 6KB, it's stored in Mailbox #4.
> >>>> +	 * When VBT data exceeds 6KB, Mailbox #4 is no longer large enough to
> >>>> +	 * hold it, so since OpRegion 2.0 the Extended VBT region is introduced
> >>>> +	 * to hold the VBT data, with RVDA/RVDS introduced to define the
> >>>> +	 * extended VBT data location and size.
> >>>> +	 * OpRegion 2.0: RVDA defines the absolute physical address of the
> >>>> +	 *   extended VBT data, RVDS defines the VBT data size.
> >>>> +	 * OpRegion 2.1 and above: RVDA defines the address of the extended
> >>>> +	 *   VBT data relative to the OpRegion base, RVDS defines the VBT data
> >>>> +	 *   size.
> >>>>  	 *
> >>>> -	 * opregion 2.1+: RVDA is unsigned, relative offset from
> >>>> -	 * opregion base, and should point to the end of opregion.
> >>>> -	 * otherwise, exposing to userspace to allow read access to everything between
> >>>> -	 * the OpRegion and VBT is not safe.
> >>>> -	 * RVDS is defined as size in bytes.
> >>>> -	 *
> >>>> -	 * opregion 2.0: rvda is the physical VBT address.
> >>>> -	 * Since rvda is HPA it cannot be directly used in guest.
> >>>> -	 * And it should not be practically available for end user,so it is not supported.
> >>>> +	 * Due to the RVDA difference (also the only difference between 2.0
> >>>> +	 * and 2.1), while for OpRegion 2.1 and above it's possible to map a
> >>>> +	 * contiguous memory range to expose OpRegion and VBT r/w via the vfio
> >>>> +	 * region, for OpRegion 2.0 a shadow-and-amend mechanism is used to
> >>>> +	 * expose OpRegion and VBT r/w properly. From the r/w ops view, only
> >>>> +	 * OpRegion 2.1 is exposed, regardless of whether the underlying
> >>>> +	 * OpRegion is 2.0 or 2.1.
> >>>>  	 */
> >>>>  	version = le16_to_cpu(*(__le16 *)(base + OPREGION_VERSION));
> >>>> -	if (version >= 0x0200) {
> >>>> -		u64 rvda;
> >>>> -		u32 rvds;
> >>>>
> >>>> +	if (version >= 0x0200) {
> >>>>  		rvda = le64_to_cpu(*(__le64 *)(base + OPREGION_RVDA));
> >>>>  		rvds = le32_to_cpu(*(__le32 *)(base + OPREGION_RVDS));
> >>>> +
> >>>> +		/* The extended VBT must follow the OpRegion for OpRegion 2.1+ */
> >>>
> >>>
> >>> Why?  If we're going to make our own OpRegion to account for v2.0, why
> >>> should it not apply to the same scenario for >2.0?  
> >> The check below validates correctness for >2.0. According to the spec,
> >> RVDA must equal the OpRegion size. If RVDA doesn't follow the spec, the
> >> OpRegion and VBT may already be corrupted, so we return an error here.
> >> For 2.0, RVDA is an absolute address and the VBT may or may not follow
> >> the OpRegion, so there is no such check for 2.0.
> >> If by "not apply to the same scenario for >2.0" you mean "only shadow for
> >> 2.0 and return as 2.1, while not using shadow for >2.0", that's because I
> >> expect to keep the old logic as it is and only change the behavior for
> >> 2.0. Both 2.0 and >2.0 can use the shadow mechanism.
> >
> > I was under the impression that the difference in RVDA between 2.0 and
> > 2.1 was simply the absolute versus relative addressing and we made a
> > conscious decision here to only support implementations where the VBT
> > is contiguous with the OpRegion, but the spec supported that
> > possibility.  Of course I don't have access to the spec to verify, but
> > if my interpretation is correct then the v2.0 support here could easily
> > handle a non-contiguous v2.1+ VBT as well.
> >  
> The team hasn't released the spec to the public, so I can't paste it
> here. What it says about RVDA for 2.1+ is that RVDA will equal the
> OpRegion size only when the VBT exceeds 6K (if <6K, mailbox #4 is large
> enough to hold the VBT, so there's no need for RVDA).
> Technically it's correct that even a non-contiguous v2.1+ VBT can still
> be handled.
> Current i915 handles it even when a v2.1+ VBT is not contiguous, so it's
> probably better to deal with it the way i915 does.
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/gpu/drm/i915/display/intel_opregion.c?h=v5.13.13#n935

So even a v2.1+ OpRegion where (RVDA > OPREGION_SIZE) should be made
contiguous within this vendor region.
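
Roughly, an untested sketch of that shadow-and-stitch init path, with
error handling and the RVDA sanity checks omitted:

	if (version >= 0x0200 && rvda && rvds) {
		/* v2.0 RVDA is an absolute address, v2.1+ is relative */
		u64 vbt_pa = version >= 0x0201 ? addr + rvda : rvda;
		void *vbt = memremap(vbt_pa, rvds, MEMREMAP_WB);
		void *shadow = kzalloc(size + rvds, GFP_KERNEL);

		/* Stitch the VBT contiguously after the OpRegion copy */
		memcpy(shadow, base, size);
		memcpy(shadow + size, vbt, rvds);
		memunmap(vbt);
		memunmap(base);

		/* Expose v2.1 with a relative RVDA pointing past OpRegion */
		if (version < 0x0201)
			*(__le16 *)(shadow + OPREGION_VERSION) =
							cpu_to_le16(0x0201);
		*(__le64 *)(shadow + OPREGION_RVDA) = cpu_to_le64(size);

		size += rvds;
		base = shadow;
	}

Thanks,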

Alex



