On Fri, Sep 11, 2020 at 11:15:04AM +0100, Will Deacon wrote:
> On Thu, Sep 10, 2020 at 03:21:59PM +0100, Andrew Scull wrote:
> > On Thu, Sep 10, 2020 at 01:37:13PM +0100, Will Deacon wrote:
> > > On Wed, Sep 09, 2020 at 04:29:26PM +0100, Alexandru Elisei wrote:
> > > > On 9/7/20 4:23 PM, Will Deacon wrote:
> > > > > [..]
> > > > > +
> > > > > +int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
> > > > > +		     struct kvm_pgtable_walker *walker)
> > > > > +{
> > > > > +	struct kvm_pgtable_walk_data walk_data = {
> > > > > +		.pgt	= pgt,
> > > > > +		.addr	= ALIGN_DOWN(addr, PAGE_SIZE),
> > > > > +		.end	= PAGE_ALIGN(walk_data.addr + size),
> > > > > +		.walker	= walker,
> > > > > +	};
> > > >
> > > > If the caller wants to walk [0x500, 0x1500), for PAGE_SIZE = 0x1000 (4K),
> > > > the function walks the range [0x0, 0x1000). Is that intentional?
> > >
> > > Yes, although the caller specifies a base and a *size*, rather than an end
> > > address. As a concrete example, much of the hypervisor stage-1 mapping
> > > is created using PAGE_SIZE mappings of random ELF symbols, which correspond
> > > to arbitrary addresses. In these cases, we really do want to round-down the
> > > address and perform a PAGE_SIZE mapping.
> >
> > I think Alexandru has a point here. Turning his example into something
> > equivalent that maps a random ELF symbol:
> >
> > 	struct some_hyp_state s = { ... };
> > 	// &s == 0x500
> > 	// sizeof(s) == PAGE_SIZE
> > 	kvm_pgtable_walk(&s, sizeof(s), walker);
> >
> > Given `s` straddles the two pages, the part in the second page won't be
> > mapped.
> >
> > Should the end address instead be calculated as:
> >
> > 	.end = PAGE_ALIGN(addr + size),
>
> Cheers for the example, and I see what you mean about structures that
> straddle a page boundary. However, I think it's important here that the size
> parameter accurately reflects the number of pages mapped: if the caller
> passes PAGE_SIZE, we better not map more than a page, since the mmu cache
> might not have enough pre-allocated pages for that.
>
> So the API is just that the virtual address bits corresponding to the offset
> within a page are ignored. Looking back at the code, that works out for the
> hyp mappings (it's actually the _physical_ address that is unaligned there,
> and we can just round that down), but I think I have a potential bug in the
> ioremap code if you place the GIC (v2) somewhere funky on a machine using
> 64k pages. In this case, the ioctl() handler only enforces 4k alignment,
> and so we could end up truncating the mapping in a similar case to what you
> describe above. For example, trying to place it from 60k - 68k would result
> in only the first page getting mapped.
>
> I've fixed that in the ioremap code (diff below), and I'll update the
> kerneldoc to say that the bottom bits of the VA are ignored.

You've described kvm_pgtable_walk as taking a start page and a page count,
so an alternative to updating the comment could be defining the function to
reflect that, e.g.

	int kvm_pgtable_walk(struct kvm_pgtable *pgt, gfn_t start, size_t count,
			     struct kvm_pgtable_walker *walker);

Up to you.

> Cheers,
>
> Will
>
> --->8
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 1041be1fafe4..21b70abf65a7 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -505,6 +505,9 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
>  				     KVM_PGTABLE_PROT_R |
>  				     (writable ? KVM_PGTABLE_PROT_W : 0);
>  
> +	size += offset_in_page(guest_ipa);
> +	guest_ipa &= PAGE_MASK;
> +
>  	for (addr = guest_ipa; addr < guest_ipa + size; addr += PAGE_SIZE) {
>  		ret = kvm_mmu_topup_memory_cache(&cache,
>  						 kvm_mmu_cache_min_pages(kvm));
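
To make the trade-off above easier to check by hand, here is a small
standalone userspace sketch (not kernel code; PAGE_SIZE, ALIGN_DOWN and
PAGE_ALIGN are local stand-ins for the kernel macros, assuming 4K pages).
It plugs Alexandru's [0x500, 0x1500) example into both the end calculation
in the patch and the suggested PAGE_ALIGN(addr + size):

	#include <stdio.h>
	#include <stdint.h>

	/* Local stand-ins for the kernel macros, assuming 4K pages. */
	#define PAGE_SIZE		0x1000ULL
	#define ALIGN_DOWN(x, a)	((x) & ~((a) - 1))
	#define PAGE_ALIGN(x)		(((x) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))

	int main(void)
	{
		uint64_t addr = 0x500, size = PAGE_SIZE;	/* Alexandru's example */

		/* Range as computed by the patch under discussion. */
		uint64_t start   = ALIGN_DOWN(addr, PAGE_SIZE);
		uint64_t end     = PAGE_ALIGN(start + size);

		/* Range with the suggested .end = PAGE_ALIGN(addr + size). */
		uint64_t alt_end = PAGE_ALIGN(addr + size);

		printf("patch:      walks [%#llx, %#llx) -> %llu page(s)\n",
		       (unsigned long long)start, (unsigned long long)end,
		       (unsigned long long)((end - start) / PAGE_SIZE));
		printf("suggestion: walks [%#llx, %#llx) -> %llu page(s)\n",
		       (unsigned long long)start, (unsigned long long)alt_end,
		       (unsigned long long)((alt_end - start) / PAGE_SIZE));
		return 0;
	}

The patch walks [0x0, 0x1000), i.e. exactly one page for a PAGE_SIZE-sized
request, while rounding the end up from the unaligned address would walk
two, which is the cache-sizing concern raised above.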
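
Similarly, the GICv2-on-64K-pages case can be sanity-checked with another
standalone sketch (again not kernel code; PAGE_SIZE, PAGE_MASK and
offset_in_page() are redefined locally for a 64K page, and the
pages_mapped() helper is purely illustrative, counting iterations of the
ioremap loop quoted in the diff). With the device placed at 60k - 68k, the
unfixed loop covers one page, while the two added lines extend it to both
64K pages:

	#include <stdio.h>
	#include <stdint.h>

	/* Local stand-ins, assuming a 64K-page host (the GICv2 case above). */
	#define PAGE_SIZE		0x10000ULL
	#define PAGE_MASK		(~(PAGE_SIZE - 1))
	#define offset_in_page(x)	((x) & (PAGE_SIZE - 1))

	/* Count iterations of the loop in kvm_phys_addr_ioremap() above. */
	static unsigned long pages_mapped(uint64_t guest_ipa, uint64_t size,
					  int fixed)
	{
		unsigned long n = 0;
		uint64_t addr;

		if (fixed) {
			/* The two lines added by the diff. */
			size += offset_in_page(guest_ipa);
			guest_ipa &= PAGE_MASK;
		}

		for (addr = guest_ipa; addr < guest_ipa + size; addr += PAGE_SIZE)
			n++;	/* one page-sized stage-2 mapping per iteration */

		return n;
	}

	int main(void)
	{
		uint64_t gic_base = 60 * 1024, gic_size = 8 * 1024;	/* 60k - 68k */

		printf("without fix: %lu page(s)\n",
		       pages_mapped(gic_base, gic_size, 0));
		printf("with fix:    %lu page(s)\n",
		       pages_mapped(gic_base, gic_size, 1));
		return 0;
	}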