Re: [Qemu-devel] [PATCH 0/2] Fix wide ioport access cracking

Avi Kivity <avi@xxxxxxxxxx> · Thu, 11 Aug 2011 12:44:34 +0300

On 08/11/2011 12:01 PM, Gerhard Wiesinger wrote:
> Hello Avi,
>
> #0  0x0000003a060328f5 in raise () from /lib64/libc.so.6
> #1  0x0000003a060340d5 in abort () from /lib64/libc.so.6
> #2  0x0000003a0602b8b5 in __assert_fail () from /lib64/libc.so.6
> #3  0x0000000000435339 in memory_region_del_subregion (mr=<value optimized out>, subregion=<value optimized out>)
>     at /root/download/qemu/git/qemu-kvm-test/memory.c:1168
> #4  0x000000000041eb9b in pci_update_mappings (d=0x1a90bc0)
>     at /root/download/qemu/git/qemu-kvm-test/hw/pci.c:1134
> #5  0x0000000000420a9c in pci_default_write_config (d=0x1a90bc0, addr=4, val=<value optimized out>, l=<value optimized out>)
>     at /root/download/qemu/git/qemu-kvm-test/hw/pci.c:1213
> #6  0x00000000004329a6 in kvm_handle_io (env=0x1931af0)
>     at /root/download/qemu/git/qemu-kvm-test/kvm-all.c:858
> #7  kvm_cpu_exec (env=0x1931af0)
>     at /root/download/qemu/git/qemu-kvm-test/kvm-all.c:997
> #8  0x000000000040bd4a in qemu_kvm_cpu_thread_fn (arg=0x1931af0)
>     at /root/download/qemu/git/qemu-kvm-test/cpus.c:806
> #9  0x0000003a06807761 in start_thread () from /lib64/libpthread.so.0
> #10 0x0000003a060e098d in clone () from /lib64/libc.so.6

In frame 4, can you print out i, *r, and d->io_regions[0] through
d->io_regions[6]? Unfortunately, some of them may be optimized out.

> BTW: Is the new memory API faster? If not, any ideas on how to optimize it?

Currently it has no effect on run time performance.

> I don't know if you remember, but I was looking for fast access for the
> following use cases for legacy DOS under KVM:
> 1.) Page switching in the 0xA0000-0xAFFFF window (linear frame buffer
> mapping) through an INT 0x10 function
> 2.) Access to the selected memory page

> As far as I can see, there are two different virtualization approaches
> (VMware VGA and Cirrus VGA handle this differently):
> 1.) Just remember the page in the INT 0x10 setter and virtualize each
> access to the page.
> Advantages: Fast page switching
> Disadvantages: Each access is virtualized, which is slow (you pointed
> out that each switch from non-virtualized to virtualized execution is
> very slow and costs thousands of CPU cycles, see the archive)
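Approach 1 can be modelled in a few lines of C. This is a minimal sketch, not QEMU's or any real VGA model's code: `struct banked_vga`, `vga_set_bank()`, `vga_window_read()`/`vga_window_write()` and the 8-bank VRAM size are hypothetical, and the `traps` counter stands in for the guest exits that make each access expensive.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of approach 1 (names are illustrative, not QEMU's):
 * the INT 0x10 bank setter only records the selected page, and every
 * guest access to the 0xA0000-0xAFFFF window traps into the device model,
 * which translates the address itself. */
#define WINDOW_BASE 0xA0000u
#define WINDOW_SIZE 0x10000u            /* 64 KiB window */
#define NUM_BANKS   8u                  /* assumed 512 KiB of VRAM */

struct banked_vga {
    uint8_t vram[NUM_BANKS * WINDOW_SIZE];
    uint32_t bank;          /* page selected by the INT 0x10 setter */
    unsigned long traps;    /* accesses that had to be emulated */
};

/* Bank switch: cheap, just remember the page. */
static void vga_set_bank(struct banked_vga *s, uint32_t bank)
{
    s->bank = bank % NUM_BANKS;
}

/* Translate a guest address inside the window to a VRAM offset. */
static uint32_t vga_window_offset(const struct banked_vga *s, uint32_t addr)
{
    assert(addr >= WINDOW_BASE && addr < WINDOW_BASE + WINDOW_SIZE);
    return s->bank * WINDOW_SIZE + (addr - WINDOW_BASE);
}

/* Each call stands for one exit from the guest (expensive under KVM). */
static uint8_t vga_window_read(struct banked_vga *s, uint32_t addr)
{
    s->traps++;
    return s->vram[vga_window_offset(s, addr)];
}

static void vga_window_write(struct banked_vga *s, uint32_t addr, uint8_t val)
{
    s->traps++;
    s->vram[vga_window_offset(s, addr)] = val;
}
```

Switching banks touches a single field, but a loop that fills the 64 KiB window pays one emulated access per byte, which is the trade-off described above.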

> 2.) Map the page in the INT 0x10 function through the memory mapping
> functions and access the mapped memory area directly, without
> virtualization.
> Advantages: Fast direct access
> Disadvantages with the old API: switching was very slow (about 1000
> switches per second or even fewer, as far as I remember)
> As far as I could find out, the cost came from (maybe a linear list issue?):
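> static int cpu_notify_sync_dirty_bitmap(target_phys_addr_t start,
>                                         target_phys_addr_t end)
> {
>     CPUPhysMemoryClient *client;
>     /* Ask every registered memory client to sync its dirty bitmap for
>      * the range start..end; stop at the first client that fails. */
>     QLIST_FOREACH(client, &memory_client_list, list) {
>         int r = client->sync_dirty_bitmap(client, start, end);
>         if (r < 0) {
>             return r;
>         }
>     }
>     return 0;
> }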

> and, on the KVM side, kvm_physical_sync_dirty_bitmap.
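The loop quoted above can be reproduced outside QEMU to see its shape. In this sketch a plain singly linked list replaces `QLIST`, `uint64_t` replaces `target_phys_addr_t`, and `struct mem_client`, `notify_sync_dirty_bitmap()` and the two example callbacks are hypothetical names. The walk is linear in the number of registered clients; since there are usually only a few, most of the cost per sync probably lies inside each callback (for KVM, fetching the dirty log from the kernel), though that is an assumption rather than a measurement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-alone analog of cpu_notify_sync_dirty_bitmap():
 * a singly linked list replaces QLIST, uint64_t replaces
 * target_phys_addr_t. */
typedef uint64_t phys_addr;

struct mem_client {
    int (*sync_dirty_bitmap)(struct mem_client *c, phys_addr start,
                             phys_addr end);
    struct mem_client *next;
    unsigned long calls;    /* how often this client was asked to sync */
};

/* Walk every registered client; one sync costs one callback per client. */
static int notify_sync_dirty_bitmap(struct mem_client *head,
                                    phys_addr start, phys_addr end)
{
    for (struct mem_client *c = head; c != NULL; c = c->next) {
        int r = c->sync_dirty_bitmap(c, start, end);
        if (r < 0) {
            return r;       /* stop at the first failing client */
        }
    }
    return 0;
}

/* Example callbacks: one succeeds, one fails. */
static int sync_ok(struct mem_client *c, phys_addr start, phys_addr end)
{
    (void)start;
    (void)end;
    c->calls++;
    return 0;
}

static int sync_fail(struct mem_client *c, phys_addr start, phys_addr end)
{
    (void)start;
    (void)end;
    c->calls++;
    return -1;
}
```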

> I think variant 2 is the preferable one, provided that switching the
> mapping is optimized.
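Approach 2, by contrast, moves the cost from every access to every bank switch. Another hypothetical sketch: `struct mapped_vga`, `vga_map_bank()` and `guest_window_ptr()` are illustrative names, and a pointer store stands in for what a real VMM would do on a switch (updating a guest memory mapping, with whatever dirty-log bookkeeping that entails).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of approach 2: the bank setter remaps the window so
 * that guest accesses hit VRAM directly. In a real VMM the remap is a
 * guest memory mapping update, not a pointer store. */
#define WINDOW_BASE 0xA0000u
#define WINDOW_SIZE 0x10000u            /* 64 KiB window */
#define NUM_BANKS   8u                  /* assumed 512 KiB of VRAM */

struct mapped_vga {
    uint8_t vram[NUM_BANKS * WINDOW_SIZE];
    uint8_t *window;        /* host memory currently backing 0xA0000 */
    unsigned long remaps;   /* the expensive operation: one per switch */
};

/* Bank switch: remap the window onto the selected page. */
static void vga_map_bank(struct mapped_vga *s, uint32_t bank)
{
    s->window = &s->vram[(bank % NUM_BANKS) * WINDOW_SIZE];
    s->remaps++;
}

/* Guest access: plain memory through the current mapping, no emulation. */
static uint8_t *guest_window_ptr(struct mapped_vga *s, uint32_t addr)
{
    assert(addr >= WINDOW_BASE && addr < WINDOW_BASE + WINDOW_SIZE);
    return &s->window[addr - WINDOW_BASE];
}
```

Here only `remaps` grows with bank switches, while accesses through the returned pointer involve no emulation, so this approach pays off exactly when remapping is cheap.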

This should be faster today with very recent kernels (the problem is not
in qemu), but I'm not sure whether it's fast enough.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html