On Mon, May 07, 2007 at 10:13:57AM +0900, Atsushi SAKAI wrote:
> Hi, Jan
>
> I think you should use 0.2.1 at this moment.
> libvirt cannot handle Xen-hypervisor-domctl correctly on 0.2.2.
> But Xen-hypervisor-sysctl works fine.
> This problem was recognized two weeks ago,
> but I have had no time to investigate this issue.

I've been trying to reproduce / diagnose the problems you reported too, but
have not had much luck so far. Every way I look at it, the code appears to be
using the correct hypercall numbers, operation numbers & structs. Until I
just noticed this:

    xenHypervisorDoV2Dom(int handle, xen_op_v2_dom* op)
    {
    ....
        if (mlock(op, sizeof(dom0_op_t)) < 0) {

Notice that it is doing sizeof(dom0_op_t) instead of sizeof(xen_op_v2_dom).
There is the same typo in xenHypervisorDoV2Sys.

Now dom0_op_t is defined as

    struct dom0_op {
        uint32_t cmd;
        uint32_t interface_version; /* DOM0_INTERFACE_VERSION */
        union {
            struct dom0_msr               msr;
            struct dom0_settime           settime;
            struct dom0_add_memtype       add_memtype;
            struct dom0_del_memtype       del_memtype;
            struct dom0_read_memtype      read_memtype;
            struct dom0_microcode         microcode;
            struct dom0_platform_quirk    platform_quirk;
            struct dom0_memory_map_entry  physical_memory_map;
            uint8_t                       pad[128];
        } u;
    };

Which is 4 + 4 + 128 bytes == 136.

Next, xen_sysctl is defined as

    struct xen_sysctl {
        uint32_t cmd;
        uint32_t interface_version; /* XEN_SYSCTL_INTERFACE_VERSION */
        union {
            struct xen_sysctl_readconsole       readconsole;
            struct xen_sysctl_tbuf_op           tbuf_op;
            struct xen_sysctl_physinfo          physinfo;
            struct xen_sysctl_sched_id          sched_id;
            struct xen_sysctl_perfc_op          perfc_op;
            struct xen_sysctl_getdomaininfolist getdomaininfolist;
            uint8_t                             pad[128];
        } u;
    };

Which is also 4 + 4 + 128 bytes == 136.

Finally, xen_domctl is defined as

    struct xen_domctl {
        uint32_t cmd;
        uint32_t interface_version; /* XEN_DOMCTL_INTERFACE_VERSION */
        domid_t  domain;
        union {
            struct xen_domctl_createdomain      createdomain;
            struct xen_domctl_getdomaininfo     getdomaininfo;
            struct xen_domctl_getmemlist        getmemlist;
            struct xen_domctl_getpageframeinfo  getpageframeinfo;
            struct xen_domctl_getpageframeinfo2 getpageframeinfo2;
            struct xen_domctl_vcpuaffinity      vcpuaffinity;
            struct xen_domctl_shadow_op         shadow_op;
            struct xen_domctl_max_mem           max_mem;
            struct xen_domctl_vcpucontext       vcpucontext;
            struct xen_domctl_getvcpuinfo       getvcpuinfo;
            struct xen_domctl_max_vcpus         max_vcpus;
            struct xen_domctl_scheduler_op      scheduler_op;
            struct xen_domctl_setdomainhandle   setdomainhandle;
            struct xen_domctl_setdebugging      setdebugging;
            struct xen_domctl_irq_permission    irq_permission;
            struct xen_domctl_iomem_permission  iomem_permission;
            struct xen_domctl_ioport_permission ioport_permission;
            struct xen_domctl_hypercall_init    hypercall_init;
            struct xen_domctl_arch_setup        arch_setup;
            struct xen_domctl_settimeoffset     settimeoffset;
            uint8_t                             pad[128];
        } u;
    };

Which is crucially different: 4 + 4 + 2 + 128 bytes == 138, before counting
any alignment padding the compiler inserts ahead of the union (which only
widens the gap).

So the buffer we're mlock()ing is at least 2 bytes too small for domctl
hypercalls. This may or may not explain the bugs, but it's a worthwhile bug
fix to try if you have a system where you can reliably reproduce the vcpu
problems.

The second thing is that we've just discovered a bug in the Fedora 2.6.20
Xen kernels with respect to SMP, which could cause random bad things to
happen. So if you're using a Fedora 2.6.20 kernel, it is also worth checking
whether the problem persists with an older Fedora 2.6.19/2.6.18 kernel, or
with the vanilla upstream Xen kernel.

Dan.
-- 
|=- Red Hat, Engineering, Emerging Technologies, Boston.  +1 978 392 2496 -=|
|=- Perl modules: http://search.cpan.org/~danberr/ -=|
|=- Projects: http://freshmeat.net/~danielpb/ -=|
|=- GnuPG: 7D3B9505  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|
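For anyone wanting to check the size arithmetic from the message above on
their own build, here is a minimal C sketch of the mismatch and of the
corrected mlock() call. Everything in it is hypothetical: mock_dom0_op and
mock_domctl only copy the outer layout quoted above (two uint32_t fields, a
16-bit domain id, a 128-byte padded union), not the real Xen headers, and
the uint64_t union member is an assumption about alignment. The helpers
pages_touched(), lock_domctl() and unlock_domctl() are illustrative names,
not libvirt functions. Since mlock() works on whole pages, a too-short
length only leaves bytes unlocked when the struct's tail crosses onto a page
the shorter range never reached, which would fit a bug that shows up only
intermittently.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical stand-ins for illustration only; NOT the real Xen public
 * headers. Only the outer layout is copied; the uint64_t member is an
 * assumption that the real unions contain 64-bit fields.
 */
typedef uint16_t mock_domid_t;

struct mock_dom0_op {
    uint32_t cmd;
    uint32_t interface_version;
    union {
        uint64_t some_u64;
        uint8_t  pad[128];
    } u;
};

struct mock_domctl {
    uint32_t     cmd;
    uint32_t     interface_version;
    mock_domid_t domain;          /* the extra field dom0_op lacks */
    union {
        uint64_t some_u64;
        uint8_t  pad[128];
    } u;
};

/* Compile-time check of the premise: the domctl layout is larger. */
static_assert(sizeof(struct mock_domctl) > sizeof(struct mock_dom0_op),
              "mock domctl should be larger than mock dom0_op");

/* Number of pages touched by [addr, addr + len) for a given page size. */
size_t pages_touched(uintptr_t addr, size_t len, size_t page)
{
    if (len == 0)
        return 0;
    return (size_t)((addr + len - 1) / page - addr / page + 1);
}

/* Corrected pattern: size the lock from the pointer's own type rather
 * than from an unrelated struct. Returns 0 on success, -1 on failure. */
int lock_domctl(struct mock_domctl *op)
{
    if (mlock(op, sizeof(*op)) < 0) {
        fprintf(stderr, "mlock failed: %s\n", strerror(errno));
        return -1;
    }
    return 0;
}

/* Matching unlock, again sized from the pointer's own type. */
void unlock_domctl(struct mock_domctl *op)
{
    munlock(op, sizeof(*op));
}
```

For example, with 4 KiB pages, pages_touched(4096 - 136,
sizeof(struct mock_domctl), 4096) returns 2, while the same call with
sizeof(struct mock_dom0_op) returns 1: the larger struct's tail spills onto a
second page that a dom0_op_t-sized lock never covers. Using sizeof(*op)
keeps the locked length tied to whatever type the pointer actually has, so
the same typo cannot recur if the struct changes again.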