On 14.11.2023 14:22, Vivi, Rodrigo wrote: > On Tue, 2023-11-14 at 12:37 +0000, Tvrtko Ursulin wrote: >> >> On 10/11/2023 18:22, Michal Wajdeczko wrote: >>> The Single Root I/O Virtualization (SR-IOV) extension to the PCI >>> Express (PCIe) specification suite is supported starting from 12th >>> generation of Intel Graphics processors. >>> >>> This RFC aims to explain how do we want to add support for SR-IOV >>> to the new Xe driver and to propose related additions to the sysfs. >>> >>> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@xxxxxxxxx> >>> Cc: Oded Gabbay <ogabbay@xxxxxxxxxx> >>> Cc: Rodrigo Vivi <rodrigo.vivi@xxxxxxxxx> >>> Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx> >>> Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx> >>> Cc: Daniel Vetter <daniel@xxxxxxxx> >>> --- >>> Documentation/gpu/rfc/index.rst | 5 + >>> Documentation/gpu/rfc/sysfs-driver-xe-sriov | 501 >>> ++++++++++++++++++++ >>> Documentation/gpu/rfc/xe_sriov.rst | 192 ++++++++ >>> 3 files changed, 698 insertions(+) >>> create mode 100644 Documentation/gpu/rfc/sysfs-driver-xe-sriov >>> create mode 100644 Documentation/gpu/rfc/xe_sriov.rst >>> >>> diff --git a/Documentation/gpu/rfc/index.rst >>> b/Documentation/gpu/rfc/index.rst >>> index e4f7b005138d..fc5bc447f30d 100644 >>> --- a/Documentation/gpu/rfc/index.rst >>> +++ b/Documentation/gpu/rfc/index.rst >>> @@ -35,3 +35,8 @@ host such documentation: >>> .. toctree:: >>> >>> xe.rst >>> + >>> +.. toctree:: >>> + :maxdepth: 1 >>> + >>> + xe_sriov.rst >>> diff --git a/Documentation/gpu/rfc/sysfs-driver-xe-sriov >>> b/Documentation/gpu/rfc/sysfs-driver-xe-sriov >>> new file mode 100644 >>> index 000000000000..77748204dd83 >>> --- /dev/null >>> +++ b/Documentation/gpu/rfc/sysfs-driver-xe-sriov >>> @@ -0,0 +1,501 @@ >>> +.. Documentation/ABI/testing/sysfs-driver-xe-sriov >>> +.. >>> +.. Intel Xe driver ABI (SR-IOV extensions) >>> +.. >>> + The Single Root I/O Virtualization (SR-IOV) extension to >>> + the PCI Express (PCIe) specification suite is supported >>> + starting from 12th generation of Intel Graphics processors. >>> + >>> + This document describes Xe driver specific additions. >>> + >>> + For description of generic SR-IOV sysfs attributes see >>> + "Documentation/ABI/testing/sysfs-bus-pci" document. >>> + >>> + /sys/bus/pci/drivers/xe/BDF/ >>> + ├── sriov_auto_provisioning >>> + │ ├── admin_mode >>> + │ ├── enabled >>> + │ ├── reset_defaults >>> + │ ├── resources >>> + │ │ ├── default_contexts_quota >>> + │ │ ├── default_doorbells_quota >>> + │ │ ├── default_ggtt_quota >>> + │ │ └── default_lmem_quota >>> + │ ├── scheduling >>> + │ │ ├── default_exec_quantum_ms >>> + │ │ └── default_preempt_timeout_us >>> + │ └── monitoring >>> + │ ├── default_cat_error_count >>> + │ ├── default_doorbell_time_us >>> + │ ├── default_engine_reset_count >>> + │ ├── default_h2g_time_us >>> + │ ├── default_irq_time_us >>> + │ └── default_page_fault_count >> >> From the department of bike-shedding, one alternative could be to >> have >> a directory called defaults which avoids having to have the default_ >> prefix on everything under it. > > good idea. probably with a '.' prefix to make it hidden like we have > for other stuff already. > > '.defaults' but then we will be inconsistent as in this 'other stuff' we use ".defaults" directory to hold RO attributes with min/default/max values, while here we wanted to define RW attributes that will be applied to VFs maybe ".template" instead ? > >> >>> + >>> + /sys/bus/pci/drivers/xe/BDF/ >>> + ├── sriov_extensions >> >> Should this be xe_sriov_extensions or if not doesn't it need >> agreement >> to reserve the keyword in Documentation/ABI/testing/sysfs-bus-pci? >> Sriov_auto_provisioning too I guess. >> >>> + │ ├── monitoring_period_ms >>> + │ ├── strict_scheduling_enabled >>> + │ ├── pf >>> + │ │ ├── device -> ../../../BDF >>> + │ │ ├── priority >>> + │ │ ├── tile0 >>> + │ │ │ ├── gt0 >>> + │ │ │ │ ├── exec_quantum_ms >>> + │ │ │ │ ├── preempt_timeout_us >>> + │ │ │ │ └── thresholds >>> + │ │ │ │ ├── cat_error_count >>> + │ │ │ │ ├── doorbell_time_us >>> + │ │ │ │ ├── engine_reset_count >>> + │ │ │ │ ├── h2g_time_us >>> + │ │ │ │ ├── irq_time_us >>> + │ │ │ │ └── page_fault_count >>> + │ │ │ └── gtX >>> + │ │ └── tileT >>> + │ ├── vf1 >>> + │ │ ├── device -> ../../../BDF+1 >>> + │ │ ├── stop >>> + │ │ ├── tile0 >>> + │ │ │ ├── ggtt_quota >>> + │ │ │ ├── lmem_quota >>> + │ │ │ ├── gt0 >>> + │ │ │ │ ├── contexts_quota >>> + │ │ │ │ ├── doorbells_quota >>> + │ │ │ │ ├── exec_quantum_ms >>> + │ │ │ │ ├── preempt_timeout_us >>> + │ │ │ │ └── thresholds >>> + │ │ │ │ ├── cat_error_count >>> + │ │ │ │ ├── doorbell_time_us >>> + │ │ │ │ ├── engine_reset_count >>> + │ │ │ │ ├── h2g_time_us >>> + │ │ │ │ ├── irq_time_us >>> + │ │ │ │ └── page_fault_count >>> + │ │ │ └── gtX >>> + │ │ └── tileT >>> + │ └── vfN >>> +.. >>> + >>> + >>> +What: /sys/bus/pci/drivers/xe/.../sriov_auto_provisioning >>> / >>> +Date: 2024 >>> +KernelVersion: TBD >>> +Contact: intel-xe@xxxxxxxxxxxxxxxxxxxxx >>> +Description: >>> + This directory appears on the device when: >>> + >>> + - device supports SR-IOV, and >>> + - device is a Physical Function (PF), and >>> + - xe driver supports SR-IOV PF on given device, >>> and >>> + - xe driver supports automatic VFs provisioning. >>> + >>> + This directory is used as a root for all attributes >>> related to >>> + automatic provisioning of SR-IOV Physical Function >>> (PF) and/or >>> + Virtual Functions (VFs). >>> + >>> + >>> +What: /sys/bus/pci/drivers/xe/.../sriov_auto_provisioning >>> /enabled >>> +Date: 2024 >>> +KernelVersion: TBD >>> +Contact: intel-xe@xxxxxxxxxxxxxxxxxxxxx >>> +Description: >>> + (RW) bool (0, 1) >>> + >>> + This file represents configuration flag for the >>> automatic VFs >>> + (un)provisioning that could be performed by the PF. >>> + >>> + The default value is 1 (true). >>> + >>> + This flag can be set to false, unless manual >>> provisioning is not >>> + applicable for given platform or it is not >>> supported by current >>> + PF implementation. In such cases -EPERM will be >>> returned. >>> + >>> + This flag will be automatically set to false when >>> there will be >>> + other attempts to change any of VF's resource >>> provisioning. >>> + See "sriov_extensions" section for details. >>> + >>> + This flag can be set back to true if and only if >>> all VFs are >>> + fully unprovisioned, otherwise -EEXIST error will >>> be returned. >>> + >>> + false = "disabled" >>> + When disabled, then PF will not attempt to >>> do automatic >>> + VFs provisioning when VFs are being enabled >>> and will not >>> + perform automatic unprovisioning of the VFs >>> when VFs will >>> + be disabled. >>> + >>> + true = "enabled" >>> + When enabled, then on VFs enabling PF will >>> do automatic >>> + VFs provisioning based on the default >>> settings described >>> + below. >>> + >>> + If automatic VFs provisioning fails due to >>> some reasons, >>> + then VFs will not be enabled. >>> + >>> + If enabled, all resources allocated during >>> VFs enabling >>> + will be released during VFs disabling >>> (automatic unprovisioning). >>> + >>> + >>> +What: /sys/bus/pci/drivers/xe/.../sriov_auto_provisioning >>> /admin_mode >>> +Date: 2024 >>> +KernelVersion: TBD >>> +Contact: intel-xe@xxxxxxxxxxxxxxxxxxxxx >>> +Description: >>> + (RW) bool (0, 1) >>> + >>> + This file represents configuration flag for the >>> automatic VFs >>> + provisioning that could be performed by the PF. >>> + >>> + The default value depends on the platform type. >>> + >>> + This flag can be changed any time, but will have no >>> effect if >>> + VFs are already provisioned. >>> + >>> + If enabled (default on discrete platforms) then the >>> PF will >>> + retain only minimum hardcoded resources for its own >>> use when >>> + doing VFs automatic provisioning and will not use >>> any default >>> + values described below for its own configuration. >>> + >>> + If disabled (default on integrated platforms) then >>> the PF will >>> + treat itself like yet another additional VF in all >>> fair resource >>> + allocations and will also try to apply default >>> provisioning >>> + values described below for its own configuration. >>> + >> >> One alternative could be to expose two sets of defaults, the PF and >> VF >> ones. With the advantage of allowing the "admin mode" / "minimal PF" >> to >> be explicitly configurable instead of hardcoded. Should be more >> flexible. >> >> If the discrete vs integrated distinction is wanted it could simply >> be >> made by initialy populating (driver init) the respective defaults >> based >> on the platform type. >> >>> + >>> +What: /sys/bus/pci/drivers/xe/.../sriov_auto_provisioning >>> /reset_defaults >>> +Date: 2024 >>> +KernelVersion: TBD >>> +Contact: intel-xe@xxxxxxxxxxxxxxxxxxxxx >>> +Description: >>> + (WO) bool (1) >>> + >>> + Writing to this file will reset all default >>> provisioning parameters >>> + listed below to the default values. >> >> Maybe this isn't required if you can say it is the responsibility of >> whoever changes the defaults to either know what they are doing, or >> to >> save and restore themselves if they. It is not a major concern but if >> writing kernel code can be avoided perhaps it can be considered. >> >>> + >>> + >>> +What: /sys/bus/pci/drivers/xe/.../sriov_auto_provisioning >>> /resources/default_contexts_quota >>> +What: /sys/bus/pci/drivers/xe/.../sriov_auto_provisioning >>> /resources/default_doorbells_quota >>> +What: /sys/bus/pci/drivers/xe/.../sriov_auto_provisioning >>> /resources/default_ggtt_quota >>> +What: /sys/bus/pci/drivers/xe/.../sriov_auto_provisioning >>> /resources/default_lmem_quota >>> +What: /sys/bus/pci/drivers/xe/.../sriov_auto_provisioning >>> /scheduling/default_exec_quantum_ms >>> +What: /sys/bus/pci/drivers/xe/.../sriov_auto_provisioning >>> /scheduling/default_preempt_timeout_us >>> +What: /sys/bus/pci/drivers/xe/.../sriov_auto_provisioning >>> /monitoring/default_cat_error_count >>> +What: /sys/bus/pci/drivers/xe/.../sriov_auto_provisioning >>> /monitoring/default_doorbell_time_us >>> +What: /sys/bus/pci/drivers/xe/.../sriov_auto_provisioning >>> /monitoring/default_engine_reset_count >>> +What: /sys/bus/pci/drivers/xe/.../sriov_auto_provisioning >>> /monitoring/default_h2g_time_us >>> +What: /sys/bus/pci/drivers/xe/.../sriov_auto_provisioning >>> /monitoring/default_irq_time_us >>> +What: /sys/bus/pci/drivers/xe/.../sriov_auto_provisioning >>> /monitoring/default_page_fault_count >>> +Date: 2024 >>> +KernelVersion: TBD >>> +Contact: intel-xe@xxxxxxxxxxxxxxxxxxxxx >>> +Description: >>> + These files represent default provisioning that >>> should be used >>> + for VFs automatic provisioning. >>> + >>> + These values can be changed any time, but will have >>> no effect if >>> + VFs are already provisioned. >>> + >>> + default_contexts_quota: (RW) integer 0..U32_MAX >>> + The number of GuC context IDs to provide to >>> the VF. >>> + The default value is 0 (use fair >>> allocations). >>> + See >>> "sriov_extensions/vfN/tileT/gtX/contexts_quota" for details. >>> + >>> + default_doorbells_quota: (RW) integer 0..U32_MAX >>> + The number of GuC doorbells to provide to >>> the VF. >>> + The default value is 0 (use fair >>> allocations). >>> + See >>> "sriov_extensions/vfN/tileT/gtX/doorbells_quota" for details. >>> + >>> + default_ggtt_quota: (RW) integer 0..U32_MAX >>> + The size of the GGTT address space (in >>> bytes) to provide to the VF. >>> + The default value is 0 (use fair >>> allocations). >>> + See "sriov_extensions/vfN/tileT/ggtt_quota" >>> for details. >>> + >>> + default_lmem_quota: (RW) integer 0..U32_MAX >>> + The size of the LMEM (in bytes) to provide >>> to the VF. >>> + The default value is 0 (use fair >>> allocations). >>> + See "sriov_extensions/vfN/tileT/lmem_quota" >>> for details. >>> + >>> + default_exec_quantum_ms: (RW) integer 0..U32_MAX >>> + The GT execution quantum (in millisecs) >>> assigned to the function. >>> + The default value is 0 (infinify). >>> + See >>> "sriov_extensions/vfN/tileT/gtX/exec_quantum_ms" for details. >>> + >>> + default_preempt_timeout_us: (RW) integer 0..U32_MAX >>> + The GT preemption timeout (in microsecs) >>> assigned to the function. >>> + The default value is 0 (infinity). >>> + See >>> "sriov_extensions/vfN/tileT/gtX/preempt_timeout_us" for details. >> >> I have a slight concern here on the usability of GuC specific >> tunables. >> >> Whereas one can imagine an external entity (some admin, somewhere) to >> probably pretty much understand what it means to partition the local >> memory, address space, and set the scheduling timeouts (all are >> intuitve >> and obvious concepts), how are they suppose to approach the GuC >> doorbells and contexts? >> >> It could be a matter of adding more documentation for those two, or >> it >> even could make sense to shove them under a guc prefix (or >> subdirectory?) to signify the fact they are implementation details >> and >> not a fundamental concept. >> >>> + >>> + default_cat_error_count: (RW) integer 0..U32_MAX >>> + default_doorbell_time_us: (RW) integer 0..U32_MAX >>> + default_engine_reset_count: (RW) integer 0..U32_MAX >>> + default_h2g_time_us: (RW) integer 0..U32_MAX >>> + default_irq_time_us: (RW) integer 0..U32_MAX >>> + default_page_fault_count: (RW) integer 0..U32_MAX >>> + The monitoring threshold to be set for the >>> function. >>> + The default value is 0 (don't monitor). >>> + See >>> "sriov_extensions/vfN/tileT/gtX/thresholds" for details. >>> + >>> + >>> +What: /sys/bus/pci/drivers/xe/.../sriov_extensions/ >>> +Date: 2024 >>> +KernelVersion: TBD >>> +Contact: intel-xe@xxxxxxxxxxxxxxxxxxxxx >>> +Description: >>> + This directory appears on Xe device when: >>> + >>> + - device supports SR-IOV, and >>> + - device is a Physical Function (PF), and >>> + - driver is enabled to support SR-IOV PF on given >>> device. >>> + >>> + This directory is used as a root for all attributes >>> required to >>> + manage both Physical Function (PF) and Virtual >>> Functions (VFs). >>> + >>> + >>> +What: /sys/bus/pci/drivers/xe/.../sriov_extensions/strict >>> _scheduling_enabled >>> +Date: 2024 >>> +KernelVersion: TBD >>> +Contact: intel-xe@xxxxxxxxxxxxxxxxxxxxx >>> +Description: >>> + (RW) bool >>> + >>> + This file represents a flag used to determine if >>> scheduling >>> + parameters should be respected even if there is no >>> active >>> + workloads submitted by the PF or VFs. >>> + >>> + This flag is disabled by default, unless strict >>> scheduling is >>> + not applicable on given platform. In such case this >>> file will >>> + be read-only. >>> + >>> + The change to this file may have no effect if VFs >>> are not yet enabled. >>> + If strict scheduling can't be enabled in GuC then >>> write will fail with -EIO. >> >> I think the semantics of this need to be documented ie. how it >> interacts >> with exec_quantum_ms. If it does? I am guessing that it has to >> otherwise >> I don't know what it would mean - presumably unused timeslices are >> not >> given to other entities but time just goes wasted? But it is also a >> question on over what time interval. Or that too is purely defined by >> the number of PF+VFs and their respective allocated quanta. >> >> Also, would there be benefit, assuming it is possible with GuC, to >> allow >> configuring it per PF/VF? >> >>> + >>> + >>> +What: /sys/bus/pci/drivers/xe/.../sriov_extensions/monito >>> ring_period_ms >>> +Date: 2024 >>> +KernelVersion: TBD >>> +Contact: intel-xe@xxxxxxxxxxxxxxxxxxxxx >>> +Description: >>> + (RW) integer >>> + >>> + This file represents the configuration knob used by >>> adverse event >>> + monitoring. A value here is the period in millisecs >>> during which >>> + events are counted and the total is checked against >>> a threshold. >>> + See "sriov_extensions/vfN/tileT/gtX/thresholds" for >>> more details. >>> + >>> + Default is 0 (monitoring is disabled). >>> + >>> + If monitoring capability is not available, then >>> attempt to enable >>> + will fail with -EPERM error. If monitoring can't be >>> enabled in >>> + GuC then write will fail with -EIO. >> >> Could the docs explain if there is a downside to enabling it, which >> is >> probably why it isn't enabled by default? Because it does sound >> natural >> that adverse events should be noticed. >> >>> + >>> + >>> +What: /sys/bus/pci/drivers/xe/.../sriov_extensions/pf/ >>> +Date: 2024 >>> +KernelVersion: TBD >>> +Contact: intel-xe@xxxxxxxxxxxxxxxxxxxxx >>> +Description: >>> + This directory holds all attributes related to the >>> SR-IOV >>> + Physical Function (PF). >>> + >>> + >>> +What: /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/ >>> +Date: 2024 >>> +KernelVersion: TBD >>> +Contact: intel-xe@xxxxxxxxxxxxxxxxxxxxx >>> +Description: >>> + This directory holds all attributes related to the >>> SR-IOV >>> + Virtual Function (VF). >>> + >>> + Note that VF numbers (N) are 1-based as described >>> in PCI SR-IOV specification. >>> + The Xe driver implementaton follows that naming >>> schema. >>> + >>> + There will be "vf1", "vf2" up to "vfN" directories, >>> where N matches >>> + value of the PCI "sriov_totalvfs" attribute. >>> + >>> +What: /sys/bus/pci/drivers/xe/.../sriov_extensions/pf/til >>> eT/ >>> +What: /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/ti >>> leT/ >>> +Date: 2024 >>> +KernelVersion: TBD >>> +Contact: intel-xe@xxxxxxxxxxxxxxxxxxxxx >>> +Description: >>> + This directory holds all SR-IOV attributes related >>> to the device tile. >>> + The tile numbers (T) start from 0. >>> + >>> + There is at least one "tile0/" directory present. >>> + >>> + >>> +What: /sys/bus/pci/drivers/xe/.../sriov_extensions/pf/til >>> eT/gtX >>> +What: /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/ti >>> leT/gtX >>> +Date: 2024 >>> +KernelVersion: TBD >>> +Contact: intel-xe@xxxxxxxxxxxxxxxxxxxxx >>> +Description: >>> + This directory holds all SR-IOV attributes related >>> to the device GT. >>> + The GT numbers (X) start from 0. >>> + >>> + There is at least one "gt0/" directory present. >>> + >>> + >>> +What: /sys/bus/pci/drivers/xe/.../sriov_extensions/pf/dev >>> ice >>> +What: /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/de >>> vice >>> +Date: 2024 >>> +KernelVersion: TBD >>> +Contact: intel-xe@xxxxxxxxxxxxxxxxxxxxx >>> +Description: >>> + (symbolic link) >>> + >>> + Backlink to the PCI device entry representing given >>> function. >>> + For PF this link is always present. >>> + For VF this link is present only for currently >>> enabled VFs. >>> + >>> + >>> +What: /sys/bus/pci/drivers/xe/.../sriov_extensions/pf/pri >>> ority >>> +Date: 2024 >>> +KernelVersion: TBD >>> +Contact: intel-xe@xxxxxxxxxxxxxxxxxxxxx >>> +Description: >>> + (RW) string >>> + >>> + This file represents a GuC Scheduler knob to >>> override the default >>> + round-robin or FIFO scheduler policies implemented >>> by the GuC. >>> + >>> + The default value is "peer". >>> + >>> + This flag can be changed, unless such change is not >>> applicable >>> + for given platform or is not supported by current >>> GuC firmware. >>> + In such case this file could be read-only or will >>> return -EPERM >>> + on write attempt. >>> + >>> + "immediate" >>> + GuC will Schedule PF workloads immediately >>> and PF >>> + workloads only until the PF's work queues >>> in GuC >>> + are empty. >>> + >>> + "lazy" >>> + GuC will Schedule PF workloads at the next >>> opportune >>> + moment and PF workloads only until the PF >>> work queues >>> + in GuC are empty. >>> + >>> + "peer" >>> + GuC Scheduler will treat PF and VFs with >>> equal priority. >> >> Hmmm this is too very GuC specific and I wonder what is the usecase >> for >> lazy? Lazy = "don't care when it runs, but when it runs it will run >> everything queued so far", right? Feels a bit odd on first. >> >> "Immediate" may also not be depending on preemption granularity and >> workloads, right? >> >> Are there any ideas to express the knobs in a more generic fashion? >> >>> + >>> + >>> +What: /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/st >>> op >>> +Date: 2024 >>> +KernelVersion: TBD >>> +Contact: intel-xe@xxxxxxxxxxxxxxxxxxxxx >>> +Description: >>> + (WO) bool (1) >>> + >>> + Write to this file will force GuC to stop handle >>> any requests from >>> + this VF, but without triggering a FLR. >>> + To recover, the full FLR must be issued using >>> generic "device/reset". >>> + >>> + This file allows to implement custom policy >>> mechanism when VF is >>> + misbehaving and triggering adverse events above >>> defined thresholds. >>> + >>> + >>> +What: /sys/bus/pci/drivers/xe/.../sriov_extensions/pf/til >>> eT/gtX/exec_quantum_ms >>> +What: /sys/bus/pci/drivers/xe/.../sriov_extensions/pf/til >>> eT/gtX/preempt_timeout_us >>> +What: /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/ti >>> leT/gtX/exec_quantum_ms >>> +What: /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/ti >>> leT/gtX/preempt_timeout_us >>> +Date: 2024 >>> +KernelVersion: TBD >>> +Contact: intel-xe@xxxxxxxxxxxxxxxxxxxxx >>> +Description: >>> + These files represent scheduling parameters of the >>> functions. >>> + >>> + These scheduling parameters can be changed even if >>> VFs are enabled >>> + and running, unless such change is not applicable >>> on given platform >>> + due to fixed hardware or firmware assignment. >>> + >>> + exec_quantum_ms: (RW) integer 0..U32_MAX >>> + The GT execution quantum in [ms] assigned >>> to the function. >>> + Requested quantum might be aligned per >>> HW/FW requirements. >>> + >>> + Default is 0 (unlimited). >>> + >>> + preempt_timeout_us: (RW) integer 0..U32_MAX >>> + The GT preemption timeout in [us] assigned >>> to the function. >>> + Requested timeout might be aligned per >>> HW/FW requirements. >>> + >>> + Default is 0 (unlimited). >> >> Alignment for the above two will be visible after read-back? >> >>> + >>> + >>> +What: /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/ti >>> leT/ggtt_quota >>> +What: /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/ti >>> leT/lmem_quota >>> +What: /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/ti >>> leT/gtX/contexts_quota >>> +What: /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/ti >>> leT/gtX/doorbells_quota >>> +Date: 2024 >>> +KernelVersion: TBD >>> +Contact: intel-xe@xxxxxxxxxxxxxxxxxxxxx >>> +Description: >>> + These files represent shared resource assigned to >>> the functions. >>> + >>> + These resource parameters can be changed, unless VF >>> is already running, >>> + or such change is not applicable on given platform >>> due to fixed hardware >>> + or firmware assignment. >>> + >>> + Writes to these attributes may fail with: >>> + -EPERM if change is not applicable on give >>> HW/FW. >>> + -E2BIG if value larger that HW/FW limit. >>> + -EDQUOT if value is larger than maximum >>> quota defined by the PF. >>> + -ENOSPC if PF can't allocate required >>> quota. >>> + -EBUSY if the resource is currently in use >>> by the VF. >>> + -EIO if GuC refuses to change provisioning. >> >> Why it would refuse if input is valid? In other words, what is the >> user/admin supposed to do on -EIO? >> >>> + >>> + ggtt_quota: (RW) integer 0..U64_MAX >>> + The size of the GGTT address space (in >>> bytes) assigned to the VF. >>> + The value might be aligned per HW/FW >>> requirements. >>> + >>> + Default is 0 (unprovisioned). >>> + >>> + lmem_quota: (RW) integer 0..U64_MAX >>> + The size of the Local Memory (in bytes) >>> assigned to the VF. >>> + The value might be aligned per HW/FW >>> requirements. >>> + >>> + This attribute is only available on >>> discrete platforms. >>> + >>> + Default is 0 (unprovisioned). >>> + >>> + contexts_quota: (RW) 0..U16_MAX >>> + The number of GuC submission contexts >>> assigned to the VF. >>> + This value might be aligned per HW/FW >>> requirements. >>> + >>> + Default is 0 (unprovisioned). >>> + >>> + doorbells_quota: (RW) 0..U16_MAX >>> + The number of GuC doorbells assigned to the >>> VF. >>> + This value might be aligned per HW/FW >>> requirements. >>> + >>> + Default is 0 (unprovisioned). >>> + >>> + >>> +What: /sys/bus/pci/drivers/xe/.../sriov_extensions/pf/til >>> eT/gtX/thresholds/cat_error_count >>> +What: /sys/bus/pci/drivers/xe/.../sriov_extensions/pf/til >>> eT/gtX/thresholds/doorbell_time_us >>> +What: /sys/bus/pci/drivers/xe/.../sriov_extensions/pf/til >>> eT/gtX/thresholds/engine_reset_count >>> +What: /sys/bus/pci/drivers/xe/.../sriov_extensions/pf/til >>> eT/gtX/thresholds/h2g_time_us >>> +What: /sys/bus/pci/drivers/xe/.../sriov_extensions/pf/til >>> eT/gtX/thresholds/irq_time_us >>> +What: /sys/bus/pci/drivers/xe/.../sriov_extensions/pf/til >>> eT/gtX/thresholds/page_fault_count >>> +What: /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/ti >>> leT/gtX/thresholds/cat_error_count >>> +What: /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/ti >>> leT/gtX/thresholds/doorbell_time_us >>> +What: /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/ti >>> leT/gtX/thresholds/engine_reset_count >>> +What: /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/ti >>> leT/gtX/thresholds/h2g_time_us >>> +What: /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/ti >>> leT/gtX/thresholds/irq_time_us >>> +What: /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/ti >>> leT/gtX/thresholds/page_fault_count >>> +Date: 2024 >>> +KernelVersion: TBD >>> +Contact: intel-xe@xxxxxxxxxxxxxxxxxxxxx >>> +Description: >>> + These files represent threshold values used by the >>> GuC to trigger >>> + security events if adverse event monitoring is >>> enabled. >> >> How are the security events delivered? There is mention of uevents in >> a >> later paragraph - are they already defined or should be together with >> this so the link can be place here? >> >>> + >>> + These thresholds are checked every >>> "monitoring_period_ms". >>> + Refer to GuC ABI for details about each threshold >>> category. >> >> Is it possible to have a link here to GuC ABI? >> >> Regards, >> >> Tvrtko >> >>> + >>> + Default value for all thresholds is 0 (disabled). >>> + >>> + cat_error_count: (RW) integer >>> + doorbell_time_us: (RW) integer >>> + engine_reset_count: (RW) integer >>> + h2g_time_us: (RW) integer >>> + irq_time_us: (RW) integer >>> + page_fault_count: (RW) integer >>> diff --git a/Documentation/gpu/rfc/xe_sriov.rst >>> b/Documentation/gpu/rfc/xe_sriov.rst >>> new file mode 100644 >>> index 000000000000..574f6414eabb >>> --- /dev/null >>> +++ b/Documentation/gpu/rfc/xe_sriov.rst >>> @@ -0,0 +1,192 @@ >>> +.. SPDX-License-Identifier: MIT >>> + >>> +======================== >>> +Xe – SR-IOV Support Plan >>> +======================== >>> + >>> +The Single Root I/O Virtualization (SR-IOV) extension to the PCI >>> Express (PCIe) >>> +specification suite is supported starting from 12th generation of >>> Intel Graphics >>> +processors. >>> + >>> +This document describes planned ABI of the new Xe driver (see >>> xe.rst) that will >>> +provide flexible configuration and management options related to >>> the SR-IOV. >>> +It will also highlight few most important changes to the Xe driver >>> +implementation to deal with Intel GPU SR-IOV specific >>> requirements. >>> + >>> + >>> +SR-IOV Capability >>> +================= >>> + >>> +Due to SR-IOV complexity and required co-operation between >>> hardware, firmware >>> +and kernel drivers, not all Xe architecture platforms might have >>> SR-IOV enabled >>> +or fully functional. >>> + >>> +To control at the driver level which platform will provide support >>> for SR-IOV, >>> +as we can't just rely on the PCI configuration data exposed by the >>> hardware, >>> +we will introduce "has_sriov" flag to the struct xe_device_desc >>> that describes >>> +a device capabilities that driver checks during the probe. >>> + >>> +Initially this flag will be set to disabled even on platforms that >>> we plan to >>> +support. We will enable this flag only once we finish merging all >>> required >>> +changes to the driver and related validated firmwares are also >>> made available. >>> + >>> + >>> +SR-IOV Platforms >>> +================ >>> + >>> +Initially we plan to add SR-IOV functionality to the following SDV >>> platforms >>> +already supported by the Xe driver: >>> + >>> + - TGL (up to 7 VFs) >>> + - ADL (up to 7 VFs) >>> + - MTL (up to 7 VFs) >>> + - ATSM (up to 31 VFs) >>> + - PVC (up to 63 VFs) >>> + >>> +Newer platforms will be supported later, but we hope that enabling >>> will be >>> +much faster, as majority of the driver changes are either platform >>> agnostic >>> +or are similar between earlier platforms (hence we start with >>> SDVs). >>> + >>> + >>> +PF Mode >>> +======= >>> + >>> +Support in the driver for acting in Physical Function (PF) mode, >>> i.e. mode >>> +that allows configuration of VFs, depends on the CONFIG_PCI_IOV >>> and will be >>> +enabled by default. >>> + >>> +However, due to potentially conflicting requirements for SR-IOV >>> and other mega >>> +features, we might want to have an option to disable SR-IOV PF >>> mode support at >>> +the driver load time. >>> + >>> +Thus, we plan to use additional modparam named "sriov_totalvfs" >>> which if set to >>> +0 will force the driver to operate in the native (non-virtualized) >>> mode. >>> +The same modparam could be used to limit number of supported >>> Virtual Functions >>> +(VFs) by the driver compared to the hardware limit exposed in PCI >>> configuration. >>> + >>> +The name of this modparam corresponds to the existing PCI sysfs >>> attribute, that >>> +by default exposes hardware capability. >>> + >>> +The default value of this param will allow to support all possible >>> VFs as >>> +claimed by the hardware. >>> + >>> +This modparam will have no effect if driver is running on the VF >>> device. >>> + >>> + >>> +VFs Enabling >>> +============ >>> + >>> +To enable or disable VFs we plan to rely on existing sysfs >>> attribute exposed by >>> +the PCI subsystem named "sriov_numvfs". We will provide all >>> necessary tweaks to >>> +provision VFs in our custom implementation of the >>> "sriov_configure" hook from >>> +the struct pci_driver. >>> + >>> +If for some reason, including explicit request to disable SR-IOV >>> PF mode using >>> +modparam, we will not be able to correctly support any VFs, driver >>> will change >>> +number of supported VFs, exposed to the userspace by >>> "sriov_totalvfs" attribute, >>> +to 0, thus preventing configuration of the VFs. >>> + >>> + >>> +VF Mode >>> +======= >>> + >>> +When driver is running on the VF device, then due to hardware >>> enforcements, >>> +access to the privileged registers is not possible. To avoid >>> relying on these >>> +registers, we plan to perform early detection if we are running on >>> the VF >>> +device using dedicated VF_CAP(0x1901f8) register and then use >>> global macro >>> +IS_SRIOV_VF(xe) to control the driver logic. >>> + >>> +To speed up merging of the required changes, we might first >>> introduce dummy >>> +macro that is always set to false, to prepare driver to avoid some >>> code paths >>> +before we finalize our VF mode detection and other VFs enabling >>> changes. >>> + >>> + >>> +Resources >>> +========= >>> + >>> +Most of the hardware (or firmware) resources available on the Xe >>> architecture, >>> +like GGTT, LMEM, GuC context IDs, GuC doorbells, will be shared >>> between PF and >>> +VFs and will require some provisioning steps to assign those >>> resources for use >>> +by the VF. >>> + >>> +Until VFs are provisioned with resources, the PF driver will be >>> able to use all >>> +resources, in the same way as it would be running in non- >>> virtualized mode. >>> + >>> +If some resource (of part or region of it) is assigned to specific >>> VF, then PF >>> +is not allowed to use that part or region of the resource, but can >>> continue to >>> +use whatever is left available. >>> + >>> +Those resources are usually fully virtualized, so they will not >>> require any >>> +special handling when used by the VF driver, except that VF driver >>> must know >>> +the assigned quota. >>> + >>> +The most notable exception is the GGTT address space, as on some >>> platforms, >>> +the VF driver must additionally know the real range that it can >>> access. >>> + >>> +Once the resources were assigned to the VF use and the VF driver >>> has started, >>> +then it is not allowed to change such provisioning, as that would >>> break the >>> +VF driver. To make changes the VF driver, which was using these >>> resources, >>> +must be unloaded (or the VM is terminated) and the VF device must >>> be reset >>> +using the FLR. >>> + >>> + >>> +Scheduling >>> +========== >>> + >>> +The workloads from PF driver and VF drivers must be submitted to >>> the hardware >>> +always by using the GuC submission mechanism. Unless VF has >>> exclusive access >>> +to the GT then submissions from different VFs are time-sliced and >>> controlled >>> +with additional "execution_quantum" and "preemption_timeout" >>> parameters. >>> + >>> +In contrast to the resource provisioning, those scheduling >>> parameters can be >>> +changed even if VF drivers are already running and are active. >>> + >>> + >>> +Automatic VFs Provisioning >>> +========================== >>> + >>> +To provide out-of-the box experience when user will be enabling >>> VFs using >>> +generic "sriov_numvfs" attribute without requiring complex >>> provisioning steps, >>> +the SR-IOV PF driver will implement automatic VFs resource >>> provisioning. >>> + >>> +By default, all VFs will be allocated with the fair amount of the >>> mandatory >>> +resources (like GGTT, GuC IDs) and with unrestricted scheduling >>> parameters. >>> +Such provisioning should be sufficient for most of the normal >>> usages, when >>> +no strict SLA is required. >>> + >>> +The PF driver will also expose some additional sysfs files to >>> allow adjusting >>> +this automatic VFs provisioning, like default values for most of >>> the >>> +provisioning parameters that PF will then apply for each enabled >>> VF. >>> + >>> + Details about those extension can be found in >>> + :download:`Preliminary Xe driver ABI <sysfs-driver-xe-sriov>`. >>> + >>> + >>> +Manual VFs Provisioning >>> +======================= >>> + >>> +If automatic VFs provisioning, which applies same configuration to >>> every VF, >>> +is not sufficient or there is a need for advanced customization of >>> some VF, >>> +the PF driver will also provide extended sysfs interface which >>> will allow >>> +control every provisioning attribute to the lowest feasible level. >>> + >>> +It is expected that these low-level attributes will be mostly used >>> by the >>> +advanced users or by the custom tools that will setup >>> configurations that >>> +meet predefined and validated SLA as required by the customers. >>> + >>> + Details about those extension can be found in >>> + :download:`Preliminary Xe driver ABI <sysfs-driver-xe-sriov>`. >>> + >>> + >>> +VFs Monitoring >>> +============== >>> + >>> +In addition to the resource provisioning or changing scheduling >>> parameters, >>> +the PF driver might also allow configure some monitoring >>> parameters, like >>> +thresholds of adverse events or sample period, to track undesired >>> behavior >>> +of the VFs that could impact the whole system. >>> + >>> +Once those thresholds are setup and sampling period is defined, >>> the GuC will >>> +notify the PF driver about which VF is excessing the threshold and >>> then PF is >>> +able to trigger the uevent to notify the administrator (or VMM) >>> that could >>> +take some action against the VF. >