Hi Andre, On 2/17/21 5:43 PM, Andre Przywara wrote: > On Thu, 11 Feb 2021 16:10:16 +0000 > Alexandru Elisei <alexandru.elisei@xxxxxxx> wrote: > > Hi, > >> On 12/10/20 2:28 PM, Andre Przywara wrote: >>> In their core functionality MMIO and I/O port traps are not really >>> different, yet we still have two totally separate code paths for >>> handling them. Devices need to decide on one conduit or need to provide >>> different handler functions for each of them. >>> >>> Extend the existing MMIO emulation to also cover ioport handlers. >>> This just adds another RB tree root for holding the I/O port handlers, >>> but otherwise uses the same tree population and lookup code. >> Maybe I'm missing something, but why two trees? Is it valid to have an overlap >> between IO port and MMIO emulation? Or was it done to make the removal of ioport >> emulation easier? > So I thought about it as well, but figured it's easier this way: > - x86 allows overlap, PIO is a totally separate address space from > memory/MMIO. Early x86 CPUs had pins to indicate a PIO bus cycle, but > using the same address and data pins otherwise. In practise there > might be no overlap when it comes to *MMIO* traps vs PIO on x86 > (there is DRAM only at the lowest 64K of the IBM PC memory map), > but not sure we should rely on this. > - For non-x86 this would indeed be non-overlapping, but this would need > to be translated at init time then? And then we can't move those > anymore, I guess? So I found it cleaner to keep this separate, and do > the translation at trap time. > - As a consequence we would need to have a bit indicating the address > space. I haven't actually tried this, but my understanding is that > this would spoil the whole rb_tree functions, since they rely on a > linear addressing scheme, and adding another bit there would be at > least cumbersome? > > At the end I decided to go for separate trees, as also this was less > change. > > I agree that it would be nice to have one tree, from a design point of > view, but I am afraid that would require more changes. > If need be, I think we can always unify them later on, on top of this > series? Definitely later, I forgot that x86 uses special instructions to access IO ports, which means that port addresses can overlap with memory addresses. Let's keep 2 trees for now and we can decide later if we should unify them for the other architectures. > >> If it's not valid to have that overlap, then I think having one tree for both >> would better. Struct mmio_mapping would have to be augmented with a flags field >> that holds the same flags given to kvm__register_iotrap to differentiate between >> the two slightly different emulations. Saving the IOTRAP_COALESCE flag would also >> make it trivial to call KVM_UNREGISTER_COALESCED_MMIO in kvm__deregister_iotrap, >> which we currently don't do. >> >>> "ioport" or "mmio" just become a flag in the registration function. >>> Provide wrappers to not break existing users, and allow an easy >>> transition for the existing ioport handlers. >>> >>> This also means that ioport handlers now can use the same emulation >>> callback prototype as MMIO handlers, which means we have to migrate them >>> over. To allow a smooth transition, we hook up the new I/O emulate >>> function to the end of the existing ioport emulation code. >> I'm sorry, but I don't understand that last sentence. Do you mean that the ioport >> emulation code has been modified to use kvm__emulate_pio() as a fallback for when >> the port is not found in the ioport_tree? > I meant that for the transition period we have all of traditional MMIO, > traditional PIO, *and* just transformed PIO. > > That means there are still PIO devices registered through ioport.c's > ioport__register(), *and* PIO devices registered through mmio.c's > kvm__register_pio(). Which means they end up in two separate PIO trees. > And only the traditional kvm__emulate_io() from ioport.c is called upon > a trap, so it needs to check both trees, which it does by calling into > kvm__emulate_pio(), shall a search in the local tree fail. Thank you for the explanation, now it makes sense. > > Or did you mean something else? > >>> Signed-off-by: Andre Przywara <andre.przywara@xxxxxxx> >>> --- >>> include/kvm/kvm.h | 42 +++++++++++++++++++++++++++++---- >>> ioport.c | 4 ++-- >>> mmio.c | 59 +++++++++++++++++++++++++++++++++++++++-------- >>> 3 files changed, 89 insertions(+), 16 deletions(-) >>> >>> diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h >>> index ee99c28e..14f9d58b 100644 >>> --- a/include/kvm/kvm.h >>> +++ b/include/kvm/kvm.h >>> @@ -27,10 +27,16 @@ >>> #define PAGE_SIZE (sysconf(_SC_PAGE_SIZE)) >>> #endif >>> >>> +#define IOTRAP_BUS_MASK 0xf >> It's not immediately obvious what this mask does. It turns out it's used to mask >> the enum flags defined in the header devices.h, header which is not included in >> this file. >> >> The flag names we pass to kvm__register_iotrap() are slightly inconsistent >> (DEVICE_BUS_PCI, DEVICE_BUS_MMIO and IOTRAP_COALESCE), where DEVICE_BUS_{PCI, >> MMIO} come from devices.h as an enum. I was wondering if I'm missing something and >> there is a particular reason why we don't define our own flags for that here >> (something like IOTRAP_PIO and IOTRAP_MMIO). > I am not sure why this would be needed? > We already define and use DEVICE_BUS_x elsewhere, so why not re-use it? To check if we should coalesce the MMIO regions we check the IOTRAP_COALESCE bit, but to check if we should use the mmio or io tree we use a mask over the first 4 bits and compare that to DEVICE_BUS_xxx. I find that confusing: the IOTRAP_BUS_MASK and IOTRAP_COALESCE are defined here, but there is no evidence where the other flags are coming from, and the header file devices.h isn't even included. > >> If we do decide to keep the flags from devices.h, I think it would be worth it to >> have a compile time check (with BUILD_BUG_ON) that IOTRAP_BUS_MASK is >= >> DEVICES_BUS_MAX, which would also be a good indication of where those flags are >> coming from. > Well, if that makes you happy, I am not sure we gain another 13 bus > types anytime soon, though ;-) I was actually suggesting that more for documenting the code. If we compare IOTRAP_BUS_MASK with DEVICE_BUS_MAX, then that will give us a hint where we're expecting the flags to be defined. But I had another look at the code and it seems that DEVICE_BUS_MMIO = 1 and DEVICE_BUS_IOPORT = 2, which means that we don't need the mask and we can check the bits instead (and IOTRAP_COALESCE can be redefined to be 1U << 2). I think dropping the mask and replacing it with testing individual bits would make the code easier to follow, what do you think? > >>> +#define IOTRAP_COALESCE (1U << 4) >>> + >>> #define DEFINE_KVM_EXT(ext) \ >>> .name = #ext, \ >>> .code = ext >>> >>> +struct kvm_cpu; >>> +typedef void (*mmio_handler_fn)(struct kvm_cpu *vcpu, u64 addr, u8 *data, >>> + u32 len, u8 is_write, void *ptr); >>> typedef void (*fdt_irq_fn)(void *fdt, u8 irq, enum irq_type); >>> >>> enum { >>> @@ -113,6 +119,8 @@ void kvm__irq_line(struct kvm *kvm, int irq, int level); >>> void kvm__irq_trigger(struct kvm *kvm, int irq); >>> bool kvm__emulate_io(struct kvm_cpu *vcpu, u16 port, void *data, int direction, int size, u32 count); >>> bool kvm__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr, u8 *data, u32 len, u8 is_write); >>> +bool kvm__emulate_pio(struct kvm_cpu *vcpu, u16 port, void *data, >>> + int direction, int size, u32 count); >>> int kvm__destroy_mem(struct kvm *kvm, u64 guest_phys, u64 size, void *userspace_addr); >>> int kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size, void *userspace_addr, >>> enum kvm_mem_type type); >>> @@ -136,10 +144,36 @@ static inline int kvm__reserve_mem(struct kvm *kvm, u64 guest_phys, u64 size) >>> KVM_MEM_TYPE_RESERVED); >>> } >>> >>> -int __must_check kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, bool coalesce, >>> - void (*mmio_fn)(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len, u8 is_write, void *ptr), >>> - void *ptr); >>> -bool kvm__deregister_mmio(struct kvm *kvm, u64 phys_addr); >>> +int __must_check kvm__register_iotrap(struct kvm *kvm, u64 phys_addr, u64 len, >>> + mmio_handler_fn mmio_fn, void *ptr, >>> + unsigned int flags); >>> + >>> +static inline >>> +int __must_check kvm__register_mmio(struct kvm *kvm, u64 phys_addr, >>> + u64 phys_addr_len, bool coalesce, >>> + mmio_handler_fn mmio_fn, void *ptr) >>> +{ >>> + return kvm__register_iotrap(kvm, phys_addr, phys_addr_len, mmio_fn, ptr, >>> + DEVICE_BUS_MMIO | (coalesce ? IOTRAP_COALESCE : 0)); >>> +} >>> +static inline >>> +int __must_check kvm__register_pio(struct kvm *kvm, u16 port, u16 len, >>> + mmio_handler_fn mmio_fn, void *ptr) >>> +{ >>> + return kvm__register_iotrap(kvm, port, len, mmio_fn, ptr, >>> + DEVICE_BUS_IOPORT); >>> +} >>> + >>> +bool kvm__deregister_iotrap(struct kvm *kvm, u64 phys_addr, unsigned int flags); >>> +static inline bool kvm__deregister_mmio(struct kvm *kvm, u64 phys_addr) >>> +{ >>> + return kvm__deregister_iotrap(kvm, phys_addr, DEVICE_BUS_MMIO); >>> +} >>> +static inline bool kvm__deregister_pio(struct kvm *kvm, u16 port) >>> +{ >>> + return kvm__deregister_iotrap(kvm, port, DEVICE_BUS_IOPORT); >>> +} >>> + >>> void kvm__reboot(struct kvm *kvm); >>> void kvm__pause(struct kvm *kvm); >>> void kvm__continue(struct kvm *kvm); >>> diff --git a/ioport.c b/ioport.c >>> index b98836d3..204d8103 100644 >>> --- a/ioport.c >>> +++ b/ioport.c >>> @@ -147,7 +147,8 @@ bool kvm__emulate_io(struct kvm_cpu *vcpu, u16 port, void *data, int direction, >>> >>> entry = ioport_get(&ioport_tree, port); >>> if (!entry) >>> - goto out; >>> + return kvm__emulate_pio(vcpu, port, data, direction, >>> + size, count); >> I have to admit this gave me pause because this patch doesn't add any users for >> kvm__register_pio() (although with this change the behaviour of kvm__emulate_io() >> remains exactly the same). Do you think this change would fit better in patch #7, >> where the first user for kvm__register_pio() is added, or do you prefer it here? > I think it logically belongs here, as we introduce the > kvm__emulate_pio() function here as well. Otherwise this function would > have no caller. As it is now, it's just a "coincidence" that no one > actually called kvm__register_pio() so far. This also makes the other > patches movable and replaceable: this patch prepares the stage, the > follow-up patches just fill it. Sure, makes sense. > >>> >>> ops = entry->ops; >>> >>> @@ -162,7 +163,6 @@ bool kvm__emulate_io(struct kvm_cpu *vcpu, u16 >>> port, void *data, int direction, >>> ioport_put(&ioport_tree, entry); >>> >>> -out: >>> if (ret) >>> return true; >>> >>> diff --git a/mmio.c b/mmio.c >>> index cd141cd3..4cce1901 100644 >>> --- a/mmio.c >>> +++ b/mmio.c >>> @@ -19,13 +19,14 @@ static DEFINE_MUTEX(mmio_lock); >>> >>> struct mmio_mapping { >>> struct rb_int_node node; >>> - void (*mmio_fn)(struct kvm_cpu >>> *vcpu, u64 addr, u8 *data, u32 len, u8 is_write, void *ptr); >>> + mmio_handler_fn mmio_fn; >>> void *ptr; >>> u32 refcount; >>> bool remove; >>> }; >>> >>> static struct rb_root mmio_tree = RB_ROOT; >>> +static struct rb_root pio_tree = RB_ROOT; >>> >>> static struct mmio_mapping *mmio_search(struct rb_root *root, u64 >>> addr, u64 len) { >>> @@ -103,9 +104,9 @@ static void mmio_put(struct kvm *kvm, struct >>> rb_root *root, struct mmio_mapping mutex_unlock(&mmio_lock); >>> } >>> >>> -int kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 >>> phys_addr_len, bool coalesce, >>> - void (*mmio_fn)(struct kvm_cpu *vcpu, u64 >>> addr, u8 *data, u32 len, u8 is_write, void *ptr), >>> - void *ptr) >>> +int kvm__register_iotrap(struct kvm *kvm, u64 phys_addr, u64 >>> phys_addr_len, >>> + mmio_handler_fn mmio_fn, void *ptr, >>> + unsigned int flags) >>> { >>> struct mmio_mapping *mmio; >>> struct kvm_coalesced_mmio_zone zone; >>> @@ -127,7 +128,7 @@ int kvm__register_mmio(struct kvm *kvm, u64 >>> phys_addr, u64 phys_addr_len, bool c .remove = false, >>> }; >>> >>> - if (coalesce) { >>> + if (flags & IOTRAP_COALESCE) { >> There is no such flag being used in ioport.c, is it valid to have the >> flags DEVICE_BUS_IOPORT and IOTRAP_COALESCE set at the same time? > Well, yes and no: Yes, as this maps to MMIO on non-x86, so > theoretically could use the flag. No, as no one registering a trap > handler through kvm__register_pio() would ever have the chance to set > this flag. > I can check for the registration being for the MMIO bus before entering > the "if" branch, if that is what you mean? Yes, that's what I mean, please return an error if a device tries to register an I/O port and sets the COALESCE flag, since that's forbidden for architectures with true I/O ports (like x86). I realize it might look peculiar since we don't have any devices that do that, but I think it can be useful for catching errors when writing new devices. Thanks, Alex > >>> zone = (struct kvm_coalesced_mmio_zone) { >>> .addr = phys_addr, >>> .size = phys_addr_len, >>> @@ -139,18 +140,27 @@ int kvm__register_mmio(struct kvm *kvm, u64 >>> phys_addr, u64 phys_addr_len, bool c } >>> } >>> mutex_lock(&mmio_lock); >>> - ret = mmio_insert(&mmio_tree, mmio); >>> + if ((flags & IOTRAP_BUS_MASK) == DEVICE_BUS_IOPORT) >>> + ret = mmio_insert(&pio_tree, mmio); >>> + else >>> + ret = mmio_insert(&mmio_tree, mmio); >>> mutex_unlock(&mmio_lock); >>> >>> return ret; >>> } >>> >>> -bool kvm__deregister_mmio(struct kvm *kvm, u64 phys_addr) >>> +bool kvm__deregister_iotrap(struct kvm *kvm, u64 phys_addr, >>> unsigned int flags) { >>> struct mmio_mapping *mmio; >>> + struct rb_root *tree; >>> + >>> + if ((flags & IOTRAP_BUS_MASK) == DEVICE_BUS_IOPORT) >>> + tree = &pio_tree; >>> + else >>> + tree = &mmio_tree; >>> >>> mutex_lock(&mmio_lock); >>> - mmio = mmio_search_single(&mmio_tree, phys_addr); >>> + mmio = mmio_search_single(tree, phys_addr); >>> if (mmio == NULL) { >>> mutex_unlock(&mmio_lock); >>> return false; >>> @@ -167,7 +177,7 @@ bool kvm__deregister_mmio(struct kvm *kvm, u64 >>> phys_addr) >>> * called mmio_put(). This will trigger use-after-free >>> errors on VCPU0. */ >>> if (mmio->refcount == 0) >>> - mmio_deregister(kvm, &mmio_tree, mmio); >>> + mmio_deregister(kvm, tree, mmio); >>> else >>> mmio->remove = true; >>> mutex_unlock(&mmio_lock); >>> @@ -175,7 +185,8 @@ bool kvm__deregister_mmio(struct kvm *kvm, u64 >>> phys_addr) return true; >>> } >>> >>> -bool kvm__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr, u8 >>> *data, u32 len, u8 is_write) +bool kvm__emulate_mmio(struct kvm_cpu >>> *vcpu, u64 phys_addr, u8 *data, >>> + u32 len, u8 is_write) >> I don't think style changes should be part of this patch, the patch >> is large enough as it is. > I see, I just figured it's not worth a separate patch either. > > Cheers, > Andre > >>> { >>> struct mmio_mapping *mmio; >>> >>> @@ -194,3 +205,31 @@ bool kvm__emulate_mmio(struct kvm_cpu *vcpu, >>> u64 phys_addr, u8 *data, u32 len, u out: >>> return true; >>> } >>> + >>> +bool kvm__emulate_pio(struct kvm_cpu *vcpu, u16 port, void *data, >>> + int direction, int size, u32 count) >>> +{ >>> + struct mmio_mapping *mmio; >>> + bool is_write = direction == KVM_EXIT_IO_OUT; >>> + >>> + mmio = mmio_get(&pio_tree, port, size); >>> + if (!mmio) { >>> + if (vcpu->kvm->cfg.ioport_debug) { >>> + fprintf(stderr, "IO error: %s port=%x, >>> size=%d, count=%u\n", >>> + to_direction(direction), port, >>> size, count); + >>> + return false; >>> + } >>> + return true; >>> + } >>> + >>> + while (count--) { >>> + mmio->mmio_fn(vcpu, port, data, size, is_write, >>> mmio->ptr); + >>> + data += size; >>> + } >>> + >>> + mmio_put(vcpu->kvm, &pio_tree, mmio); >>> + >>> + return true; >>> +}