Hi Marc, On 05/06/2024 14:39, Marc Zyngier wrote: > The subject line is... odd. I'd expect something like: > > "irqchip/gic-v3-its: Share ITS tables with a non-trusted hypervisor" > > because nothing here should be CCA specific. Good point - that's a much better subject. > On Wed, 05 Jun 2024 10:30:04 +0100, > Steven Price <steven.price@xxxxxxx> wrote: >> >> Within a realm guest the ITS is emulated by the host. This means the >> allocations must have been made available to the host by a call to >> set_memory_decrypted(). Introduce an allocation function which performs >> this extra call. > > This doesn't mention that this patch radically changes the allocation > of some tables. I guess that depends on your definition of radical, see below. >> >> Co-developed-by: Suzuki K Poulose <suzuki.poulose@xxxxxxx> >> Signed-off-by: Suzuki K Poulose <suzuki.poulose@xxxxxxx> >> Signed-off-by: Steven Price <steven.price@xxxxxxx> >> --- >> Changes since v2: >> * Drop 'shared' from the new its_xxx function names as they are used >> for non-realm guests too. >> * Don't handle the NUMA_NO_NODE case specially - alloc_pages_node() >> should do the right thing. >> * Drop a pointless (void *) cast. >> --- >> drivers/irqchip/irq-gic-v3-its.c | 90 ++++++++++++++++++++++++-------- >> 1 file changed, 67 insertions(+), 23 deletions(-) >> >> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c >> index 40ebf1726393..ca72f830f4cc 100644 >> --- a/drivers/irqchip/irq-gic-v3-its.c >> +++ b/drivers/irqchip/irq-gic-v3-its.c >> @@ -18,6 +18,7 @@ >> #include <linux/irqdomain.h> >> #include <linux/list.h> >> #include <linux/log2.h> >> +#include <linux/mem_encrypt.h> >> #include <linux/memblock.h> >> #include <linux/mm.h> >> #include <linux/msi.h> >> @@ -27,6 +28,7 @@ >> #include <linux/of_pci.h> >> #include <linux/of_platform.h> >> #include <linux/percpu.h> >> +#include <linux/set_memory.h> >> #include <linux/slab.h> >> #include <linux/syscore_ops.h> >> >> @@ -163,6 +165,7 @@ struct its_device { >> struct its_node *its; >> struct event_lpi_map event_map; >> void *itt; >> + u32 itt_order; >> u32 nr_ites; >> u32 device_id; >> bool shared; >> @@ -198,6 +201,30 @@ static DEFINE_IDA(its_vpeid_ida); >> #define gic_data_rdist_rd_base() (gic_data_rdist()->rd_base) >> #define gic_data_rdist_vlpi_base() (gic_data_rdist_rd_base() + SZ_128K) >> >> +static struct page *its_alloc_pages_node(int node, gfp_t gfp, >> + unsigned int order) >> +{ >> + struct page *page; >> + >> + page = alloc_pages_node(node, gfp, order); >> + >> + if (page) >> + set_memory_decrypted((unsigned long)page_address(page), >> + 1 << order); > > Please use BIT(order). Sure. >> + return page; >> +} >> + >> +static struct page *its_alloc_pages(gfp_t gfp, unsigned int order) >> +{ >> + return its_alloc_pages_node(NUMA_NO_NODE, gfp, order); >> +} >> + >> +static void its_free_pages(void *addr, unsigned int order) >> +{ >> + set_memory_encrypted((unsigned long)addr, 1 << order); >> + free_pages((unsigned long)addr, order); >> +} >> + >> /* >> * Skip ITSs that have no vLPIs mapped, unless we're on GICv4.1, as we >> * always have vSGIs mapped. >> @@ -2212,7 +2239,8 @@ static struct page *its_allocate_prop_table(gfp_t gfp_flags) >> { >> struct page *prop_page; >> >> - prop_page = alloc_pages(gfp_flags, get_order(LPI_PROPBASE_SZ)); >> + prop_page = its_alloc_pages(gfp_flags, >> + get_order(LPI_PROPBASE_SZ)); >> if (!prop_page) >> return NULL; >> >> @@ -2223,8 +2251,8 @@ static struct page *its_allocate_prop_table(gfp_t gfp_flags) >> >> static void its_free_prop_table(struct page *prop_page) >> { >> - free_pages((unsigned long)page_address(prop_page), >> - get_order(LPI_PROPBASE_SZ)); >> + its_free_pages(page_address(prop_page), >> + get_order(LPI_PROPBASE_SZ)); >> } >> >> static bool gic_check_reserved_range(phys_addr_t addr, unsigned long size) >> @@ -2346,7 +2374,8 @@ static int its_setup_baser(struct its_node *its, struct its_baser *baser, >> order = get_order(GITS_BASER_PAGES_MAX * psz); >> } >> >> - page = alloc_pages_node(its->numa_node, GFP_KERNEL | __GFP_ZERO, order); >> + page = its_alloc_pages_node(its->numa_node, >> + GFP_KERNEL | __GFP_ZERO, order); >> if (!page) >> return -ENOMEM; >> >> @@ -2359,7 +2388,7 @@ static int its_setup_baser(struct its_node *its, struct its_baser *baser, >> /* 52bit PA is supported only when PageSize=64K */ >> if (psz != SZ_64K) { >> pr_err("ITS: no 52bit PA support when psz=%d\n", psz); >> - free_pages((unsigned long)base, order); >> + its_free_pages(base, order); >> return -ENXIO; >> } >> >> @@ -2415,7 +2444,7 @@ static int its_setup_baser(struct its_node *its, struct its_baser *baser, >> pr_err("ITS@%pa: %s doesn't stick: %llx %llx\n", >> &its->phys_base, its_base_type_string[type], >> val, tmp); >> - free_pages((unsigned long)base, order); >> + its_free_pages(base, order); >> return -ENXIO; >> } >> >> @@ -2554,8 +2583,8 @@ static void its_free_tables(struct its_node *its) >> >> for (i = 0; i < GITS_BASER_NR_REGS; i++) { >> if (its->tables[i].base) { >> - free_pages((unsigned long)its->tables[i].base, >> - its->tables[i].order); >> + its_free_pages(its->tables[i].base, >> + its->tables[i].order); >> its->tables[i].base = NULL; >> } >> } >> @@ -2821,7 +2850,8 @@ static bool allocate_vpe_l2_table(int cpu, u32 id) >> >> /* Allocate memory for 2nd level table */ >> if (!table[idx]) { >> - page = alloc_pages(GFP_KERNEL | __GFP_ZERO, get_order(psz)); >> + page = its_alloc_pages(GFP_KERNEL | __GFP_ZERO, >> + get_order(psz)); >> if (!page) >> return false; >> >> @@ -2940,7 +2970,8 @@ static int allocate_vpe_l1_table(void) >> >> pr_debug("np = %d, npg = %lld, psz = %d, epp = %d, esz = %d\n", >> np, npg, psz, epp, esz); >> - page = alloc_pages(GFP_ATOMIC | __GFP_ZERO, get_order(np * PAGE_SIZE)); >> + page = its_alloc_pages(GFP_ATOMIC | __GFP_ZERO, >> + get_order(np * PAGE_SIZE)); >> if (!page) >> return -ENOMEM; >> >> @@ -2986,8 +3017,8 @@ static struct page *its_allocate_pending_table(gfp_t gfp_flags) >> { >> struct page *pend_page; >> >> - pend_page = alloc_pages(gfp_flags | __GFP_ZERO, >> - get_order(LPI_PENDBASE_SZ)); >> + pend_page = its_alloc_pages(gfp_flags | __GFP_ZERO, >> + get_order(LPI_PENDBASE_SZ)); >> if (!pend_page) >> return NULL; >> >> @@ -2999,7 +3030,7 @@ static struct page *its_allocate_pending_table(gfp_t gfp_flags) >> >> static void its_free_pending_table(struct page *pt) >> { >> - free_pages((unsigned long)page_address(pt), get_order(LPI_PENDBASE_SZ)); >> + its_free_pages(page_address(pt), get_order(LPI_PENDBASE_SZ)); >> } >> >> /* >> @@ -3334,8 +3365,9 @@ static bool its_alloc_table_entry(struct its_node *its, >> >> /* Allocate memory for 2nd level table */ >> if (!table[idx]) { >> - page = alloc_pages_node(its->numa_node, GFP_KERNEL | __GFP_ZERO, >> - get_order(baser->psz)); >> + page = its_alloc_pages_node(its->numa_node, >> + GFP_KERNEL | __GFP_ZERO, >> + get_order(baser->psz)); >> if (!page) >> return false; >> >> @@ -3418,7 +3450,9 @@ static struct its_device *its_create_device(struct its_node *its, u32 dev_id, >> unsigned long *lpi_map = NULL; >> unsigned long flags; >> u16 *col_map = NULL; >> + struct page *page; >> void *itt; >> + int itt_order; >> int lpi_base; >> int nr_lpis; >> int nr_ites; >> @@ -3430,7 +3464,6 @@ static struct its_device *its_create_device(struct its_node *its, u32 dev_id, >> if (WARN_ON(!is_power_of_2(nvecs))) >> nvecs = roundup_pow_of_two(nvecs); >> >> - dev = kzalloc(sizeof(*dev), GFP_KERNEL); >> /* >> * Even if the device wants a single LPI, the ITT must be >> * sized as a power of two (and you need at least one bit...). >> @@ -3438,7 +3471,16 @@ static struct its_device *its_create_device(struct its_node *its, u32 dev_id, >> nr_ites = max(2, nvecs); >> sz = nr_ites * (FIELD_GET(GITS_TYPER_ITT_ENTRY_SIZE, its->typer) + 1); >> sz = max(sz, ITS_ITT_ALIGN) + ITS_ITT_ALIGN - 1; >> - itt = kzalloc_node(sz, GFP_KERNEL, its->numa_node); >> + itt_order = get_order(sz); >> + page = its_alloc_pages_node(its->numa_node, >> + GFP_KERNEL | __GFP_ZERO, >> + itt_order); > > So we go from an allocation that was so far measured in *bytes* to > something that is now at least a page. Per device. This seems a bit > excessive to me, specially when it isn't conditioned on anything and > is now imposed on all platforms, including the non-CCA systems (which > are exactly 100% of the machines). Catalin asked about this in v2: https://lore.kernel.org/lkml/c329ae18-2b61-4851-8d6a-9e691a2007c8@xxxxxxx/ To be honest, I don't have a great handle on how much memory is being wasted here. Within the realm guest I was testing this is rounding up an otherwise 511 byte allocation to a 4k page, and there are 3 of them. Which seems reasonable from a realm guest perspective. I can see two options to improve here: 1. Add a !is_realm_world() check and return to the previous behaviour when not running in a realm. It's ugly, and doesn't deal with any other potential future memory encryption. cc_platform_has(CC_ATTR_MEM_ENCRYPT) might be preferable? But this means no impact to non-realm guests. 2. Use a special (global) memory allocator that does the set_memory_decrypted() dance on the pages that it allocates but allows packing the allocations. I'm not aware of an existing kernel API for this, so it's potentially quite a bit of code. The benefit is that it reduces memory consumption in a realm guest, although fragmentation still means we're likely to see a (small) growth. Any thoughts on what you think would be best? > Another thing is that if we go with page alignment, then the 256 byte > alignment can obviously be removed everywhere (hint: MAPD needs to > change). Ah, good point - I'll need to look into that, my GIC-foo isn't quite up to speed on that. Thanks, Steve