Re: [PATCH 13/16] iommupt: Add the x86 PAE page table format

On Thu, Aug 15, 2024, Jason Gunthorpe wrote:
> This is used by x86 CPUs and can be used in both x86 IOMMUs. When the x86
> IOMMU is running SVA it is using this page table format.
> 
> This implementation follows the AMD v2 io-pgtable version.
> 
> There is nothing remarkable here, the format has a variable top and
> limited support for different page sizes and no contiguous pages support.
> 
> In principle this can support the 32 bit configuration with fewer table
> levels.

What's "the 32 bit configuration"?

> FIXME: Compare the bits against the VT-D version too.
> 
> Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxx>
> ---
>  drivers/iommu/generic_pt/Kconfig            |   6 +
>  drivers/iommu/generic_pt/fmt/Makefile       |   2 +
>  drivers/iommu/generic_pt/fmt/defs_x86pae.h  |  21 ++
>  drivers/iommu/generic_pt/fmt/iommu_x86pae.c |   8 +
>  drivers/iommu/generic_pt/fmt/x86pae.h       | 283 ++++++++++++++++++++
>  include/linux/generic_pt/common.h           |   4 +
>  include/linux/generic_pt/iommu.h            |  12 +
>  7 files changed, 336 insertions(+)
>  create mode 100644 drivers/iommu/generic_pt/fmt/defs_x86pae.h
>  create mode 100644 drivers/iommu/generic_pt/fmt/iommu_x86pae.c
>  create mode 100644 drivers/iommu/generic_pt/fmt/x86pae.h
> 
> diff --git a/drivers/iommu/generic_pt/Kconfig b/drivers/iommu/generic_pt/Kconfig
> index e34be10cf8bac2..a7c006234fc218 100644
> --- a/drivers/iommu/generic_pt/Kconfig
> +++ b/drivers/iommu/generic_pt/Kconfig
> @@ -70,6 +70,11 @@ config IOMMU_PT_ARMV8_64K
>  
>  	  If unsure, say N here.
>  
> +config IOMMU_PT_X86PAE
> +       tristate "IOMMU page table for x86 PAE"
> +#include "iommu_template.h"
> diff --git a/drivers/iommu/generic_pt/fmt/x86pae.h b/drivers/iommu/generic_pt/fmt/x86pae.h
> new file mode 100644
> index 00000000000000..9e0ee74275fcb3
> --- /dev/null
> +++ b/drivers/iommu/generic_pt/fmt/x86pae.h
> @@ -0,0 +1,283 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
> + *
> + * x86 PAE page table
> + *
> + * This is described in
> + *   Section "4.4 PAE Paging" of the Intel Software Developer's Manual Volume 3

I highly doubt what's implemented here is actually PAE paging, as the SDM (that
is referenced above) and most x86 folks describe PAE paging.  PAE paging is
specifically used when the CPU is in 32-bit mode (NOT including compatibility mode!).

  PAE paging translates 32-bit linear addresses to 52-bit physical addresses.

Presumably what's implemented here is what Intel calls 4-level and 5-level paging.
Those are _really_ similar to PAE paging, e.g. have the same encodings for bits
11:0, and even require CR4.PAE=1, but they aren't 100% identical.  E.g. true PAE
paging doesn't have software-available bits in 62:MAXPHYADDR.

Unfortunately, I have no idea what name to use for this flavor.  x86pae is
actually kinda good, but I think it'll be confusing to people that are familiar
with the more canonical version of PAE paging.
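
For reference, a rough standalone sketch of how the table index is derived
from the address in each flavor.  This is not kernel code: lm_index() and
pae_index() are made-up names, and the levels use the patch's 0-based numbering
(0 = PT, 1 = PD, 2 = PDPT, 3 = PML4, 4 = PML5).  The point is only that true
PAE paging stops at a 4-entry PDPT indexed by bits 31:30 of a 32-bit linear
address, whereas 4-/5-level paging uses 9 bits at every level.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT	12
#define IDX_BITS	9	/* 512 eight-byte entries per 4KiB table */
#define IDX_MASK	((1u << IDX_BITS) - 1)

/* 4-/5-level paging: every level consumes 9 bits of the linear address. */
static inline unsigned int lm_index(uint64_t va, unsigned int level)
{
	assert(level <= 4);
	return (unsigned int)(va >> (PAGE_SHIFT + IDX_BITS * level)) & IDX_MASK;
}

/*
 * True PAE paging: 32-bit linear address, three levels, and the top level
 * (PDPT) has only four entries, selected by bits 31:30.
 */
static inline unsigned int pae_index(uint32_t va, unsigned int level)
{
	assert(level <= 2);
	if (level == 2)
		return va >> 30;
	return (va >> (PAGE_SHIFT + IDX_BITS * level)) & IDX_MASK;
}
```

The two agree for the PT and PD levels and diverge from the PDPT upwards.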

> + *   Section "2.2.6 I/O Page Tables for Guest Translations" of the "AMD I/O
> + *   Virtualization Technology (IOMMU) Specification"
> + *
> + * It is used by x86 CPUs and The AMD and VT-D IOMMU HW.
> + *
> + * The named levels in the spec map to the pts->level as:
> + *   Table/PTE - 0
> + *   Directory/PDE - 1
> + *   Directory Ptr/PDPTE - 2
> + *   PML4/PML4E - 3
> + *   PML5/PML5E - 4

Any particular reason not to use x86's (and KVM's) effective 1-based system?
(level '0' is essentially the 4KiB leaf entries in a page table)

Starting at '1' is kinda odd, but it aligns with things like PML4/5, allows using
the pg_level enums from x86, and diverging from both x86 MM and KVM is likely
going to confuse people.
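
To spell out the two numbering schemes, a minimal sketch.  The enum below is a
local copy of the low values of x86's enum pg_level so that it builds outside
the kernel; the three helpers are hypothetical and exist only to show the
off-by-one between pts->level and the x86/KVM convention.

```c
#include <assert.h>

/* Local copy of the low values of x86's enum pg_level (not the kernel header). */
enum pg_level {
	PG_LEVEL_NONE,
	PG_LEVEL_4K,
	PG_LEVEL_2M,
	PG_LEVEL_1G,
	PG_LEVEL_512G,
};

/* Hypothetical: patch's 0-based pts->level -> 1-based x86 level. */
static inline enum pg_level pt_level_to_pg_level(unsigned int pt_level)
{
	assert(pt_level <= 3);
	return (enum pg_level)(pt_level + 1);
}

/* Hypothetical: 1-based x86 level -> patch's 0-based pts->level. */
static inline unsigned int pg_level_to_pt_level(enum pg_level level)
{
	assert(level != PG_LEVEL_NONE);
	return (unsigned int)level - 1;
}

/* log2 of the range mapped by one entry at a 1-based level: 12, 21, 30, 39. */
static inline unsigned int pg_level_shift(enum pg_level level)
{
	assert(level != PG_LEVEL_NONE);
	return 12 + 9 * ((unsigned int)level - 1);
}
```

With the 1-based scheme the level number matches the "4" in PML4, and '0' is
left free to mean "no level".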
	
> + * FIXME: __sme_set
> + */
> +#ifndef __GENERIC_PT_FMT_X86PAE_H
> +#define __GENERIC_PT_FMT_X86PAE_H
> +
> +#include "defs_x86pae.h"
> +#include "../pt_defs.h"
> +
> +#include <linux/bitfield.h>
> +#include <linux/container_of.h>
> +#include <linux/log2.h>
> +
> +enum {
> +	PT_MAX_OUTPUT_ADDRESS_LG2 = 52,
> +	PT_MAX_VA_ADDRESS_LG2 = 57,
> +	PT_ENTRY_WORD_SIZE = sizeof(u64),
> +	PT_MAX_TOP_LEVEL = 4,
> +	PT_GRANUAL_LG2SZ = 12,
> +	PT_TABLEMEM_LG2SZ = 12,
> +};
> +
> +/* Shared descriptor bits */
> +enum {
> +	X86PAE_FMT_P = BIT(0),
> +	X86PAE_FMT_RW = BIT(1),
> +	X86PAE_FMT_U = BIT(2),
> +	X86PAE_FMT_A = BIT(5),
> +	X86PAE_FMT_D = BIT(6),
> +	X86PAE_FMT_OA = GENMASK_ULL(51, 12),
> +	X86PAE_FMT_XD = BIT_ULL(63),

Any reason not to use the #defines in arch/x86/include/asm/pgtable_types.h?
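
As far as I can tell the values are identical.  A quick standalone check, with
the _PAGE_BIT_* positions copied by hand from pgtable_types.h and the patch's
BIT()/BIT_ULL() values expanded manually (page_bit() is a made-up helper, and
none of this is meant as kernel code):

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions, copied locally from arch/x86/include/asm/pgtable_types.h. */
#define _PAGE_BIT_PRESENT	0
#define _PAGE_BIT_RW		1
#define _PAGE_BIT_USER		2
#define _PAGE_BIT_ACCESSED	5
#define _PAGE_BIT_DIRTY		6
#define _PAGE_BIT_NX		63

/* Values from the patch, with BIT()/BIT_ULL() expanded by hand. */
#define X86PAE_FMT_P	(UINT64_C(1) << 0)
#define X86PAE_FMT_RW	(UINT64_C(1) << 1)
#define X86PAE_FMT_U	(UINT64_C(1) << 2)
#define X86PAE_FMT_A	(UINT64_C(1) << 5)
#define X86PAE_FMT_D	(UINT64_C(1) << 6)
#define X86PAE_FMT_XD	(UINT64_C(1) << 63)

/* Hypothetical helper: single-bit mask for a bit position. */
static inline uint64_t page_bit(unsigned int nr)
{
	return UINT64_C(1) << nr;
}

/* Compile-time check that the two sets of definitions agree. */
static_assert(X86PAE_FMT_P  == UINT64_C(1) << _PAGE_BIT_PRESENT,  "P");
static_assert(X86PAE_FMT_RW == UINT64_C(1) << _PAGE_BIT_RW,       "RW");
static_assert(X86PAE_FMT_U  == UINT64_C(1) << _PAGE_BIT_USER,     "U");
static_assert(X86PAE_FMT_A  == UINT64_C(1) << _PAGE_BIT_ACCESSED, "A");
static_assert(X86PAE_FMT_D  == UINT64_C(1) << _PAGE_BIT_DIRTY,    "D");
static_assert(X86PAE_FMT_XD == UINT64_C(1) << _PAGE_BIT_NX,       "XD");
```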

> +static inline bool x86pae_pt_install_table(struct pt_state *pts,
> +					   pt_oaddr_t table_pa,
> +					   const struct pt_write_attrs *attrs)
> +{
> +	u64 *tablep = pt_cur_table(pts, u64);
> +	u64 entry;
> +
> +	/*
> +	 * FIXME according to the SDM D is ignored by HW on table pointers?

Correct, only leaf entries have dirty bits.  
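
If you wanted to follow the SDM here, a table-pointer entry could simply leave
D clear.  A rough standalone sketch of that; make_table_entry() and the FMT_*
names are made up, the bits are the patch's values expanded by hand, and
whether the AMD v2 IOMMU format tolerates D=0 on table pointers is an
assumption this does not verify.

```c
#include <assert.h>
#include <stdint.h>

#define FMT_P	(UINT64_C(1) << 0)
#define FMT_RW	(UINT64_C(1) << 1)
#define FMT_U	(UINT64_C(1) << 2)
#define FMT_A	(UINT64_C(1) << 5)
#define FMT_D	(UINT64_C(1) << 6)
#define FMT_OA	UINT64_C(0x000ffffffffff000)	/* GENMASK_ULL(51, 12) */

/* Hypothetical: build a non-leaf (table pointer) entry with D left clear. */
static inline uint64_t make_table_entry(uint64_t table_pa)
{
	/* The table must be 4KiB aligned and below 2^52. */
	assert((table_pa & ~FMT_OA) == 0);
	return FMT_P | FMT_RW | FMT_U | FMT_A | table_pa;
}
```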

> +	 * io_pgtable_v2 sets it
> +	 */
> +	entry = X86PAE_FMT_P | X86PAE_FMT_RW | X86PAE_FMT_U | X86PAE_FMT_A |

What happens with the USER bit for I/O page tables?  Ignored, I assume?

> +		X86PAE_FMT_D |
> +		FIELD_PREP(X86PAE_FMT_OA, log2_div(table_pa, PT_GRANUAL_LG2SZ));
> +	return pt_table_install64(&tablep[pts->index], entry, pts->entry);
> +}
