Problem with X on 32-bit guest on 64-bit host

Chris Lalancette <clalance@xxxxxxxxxx> · Thu, 05 Feb 2009 15:30:48 +0100

All,
     I've been trying to track down this problem with starting X on a 32-bit
guest on a 64-bit host, and I've hit a bit of a wall.  Let me describe the setup:

Host: AMD Barcelona machine, 16GB memory, 8 cores, running 2.6.29-rc2,
kvm-userspace 3f7cba35281a5b2dba008179a4979d737105574d

Guest: RHEL-5 32-bit guest, single VCPU.

The problem is that inside the 32-bit guest, X refuses to start.  Now,
on an Intel platform I have hanging around here, this works just fine; I copy
the guest over, start it up, and X starts right up.  Also, on the Barcelona,
with a 64-bit RHEL-5 guest, X starts fine.

I've done quite a bit of tracing inside the guest, and from the guest's
perspective, something just isn't right.  When X is trying to start, one thing
it does is copy a BIOS region from /dev/mem into a shared memory region mapped
at 0 inside the X process.  The page fault for the access to the memory region
at 0 is handled just fine, but the very next page fault that is injected is
completely bogus; its address is either > TASK_SIZE (which is 0xc0000000), or
it has bogus VMA flags set, etc.

Going further, what actually happens is that X uses glibc's optimized memcpy
routine, which, in assembly, looks like this:

(gdb) disass memcpy
Dump of assembler code for function memcpy:
0x00387090 <memcpy+0>:	mov    0xc(%esp),%ecx
0x00387094 <memcpy+4>:	mov    %edi,%eax
0x00387096 <memcpy+6>:	mov    0x4(%esp),%edi
0x0038709a <memcpy+10>:	mov    %esi,%edx
0x0038709c <memcpy+12>:	mov    0x8(%esp),%esi
0x003870a0 <memcpy+16>:	cld
0x003870a1 <memcpy+17>:	shr    %ecx
0x003870a3 <memcpy+19>:	jae    0x3870a6 <memcpy+22>
0x003870a5 <memcpy+21>:	movsb  %ds:(%esi),%es:(%edi)
0x003870a6 <memcpy+22>:	shr    %ecx
0x003870a8 <memcpy+24>:	jae    0x3870ac <memcpy+28>
0x003870aa <memcpy+26>:	movsw  %ds:(%esi),%es:(%edi)
0x003870ac <memcpy+28>:	rep movsl %ds:(%esi),%es:(%edi)
0x003870ae <memcpy+30>:	mov    %eax,%edi
0x003870b0 <memcpy+32>:	mov    %edx,%esi
0x003870b2 <memcpy+34>:	mov    0x4(%esp),%eax
0x003870b6 <memcpy+38>:	ret
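
In C terms, the disassembly above does roughly the following: copy one byte if
bit 0 of the length is set, copy two bytes if bit 1 is set, then copy the
remaining length/4 dwords with the rep movsl.  This is only a sketch
reconstructed from the listing, not glibc's source; the names memcpy_sketch and
check_tail_lengths are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sketch of the disassembled routine's logic; memcpy_sketch is a made-up name. */
void *memcpy_sketch(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t n = len;

    /* shr %ecx; jae; movsb: copy one byte if bit 0 of len is set */
    if (n & 1)
        *d++ = *s++;
    n >>= 1;

    /* shr %ecx; jae; movsw: copy two bytes if bit 1 of len is set */
    if (n & 1) {
        d[0] = s[0];
        d[1] = s[1];
        d += 2;
        s += 2;
    }
    n >>= 1;

    /* rep movsl: copy the remaining n = len / 4 dwords */
    while (n--) {
        d[0] = s[0];
        d[1] = s[1];
        d[2] = s[2];
        d[3] = s[3];
        d += 4;
        s += 4;
    }

    /* mov 0x4(%esp),%eax: return the original destination pointer */
    return dst;
}

/* Exercise every combination of the 1-byte, 2-byte and dword paths. */
void check_tail_lengths(void)
{
    unsigned char src[64], dst[64];

    for (size_t i = 0; i < sizeof src; i++)
        src[i] = (unsigned char)(i + 1);   /* all values non-zero */

    for (size_t len = 0; len <= 16; len++) {
        memset(dst, 0, sizeof dst);
        assert(memcpy_sketch(dst, src, len) == dst);
        assert(memcmp(dst, src, len) == 0);
        assert(dst[len] == 0);             /* nothing written past len */
    }
}
```

For any copy of four bytes or more, nearly all of the work is done by the
single rep movsl instruction, so that is the only string instruction with a
repeat count in this routine.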

If I replace that optimized memcpy routine with my own, stupid memcpy (basically
just dst[i] = src[i] in a loop), everything works fine and I don't get the
bogus page fault.  That leads me to suspect that the rep-prefixed instruction
(the rep movsl above) is not being emulated properly on the host side, but I'm
not quite sure of that, nor am I sure where to go from here.  Does anybody have
any ideas on how I can track this down further?
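
A minimal sketch of the byte-at-a-time replacement described above, plus a
check that it agrees with libc's memcpy on the same input.  The names
bytewise_memcpy and copies_agree and the 4096-byte buffers are made up for
illustration.  The volatile qualifiers are an assumption about how to keep an
optimizing compiler from turning the loop back into a memcpy call or a string
instruction, which would defeat the point of the comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the "stupid" memcpy: one byte per iteration.
 * volatile forces a separate load and store for every byte. */
void *bytewise_memcpy(void *dst, const void *src, size_t len)
{
    volatile unsigned char *d = dst;
    const volatile unsigned char *s = src;

    for (size_t i = 0; i < len; i++)
        d[i] = s[i];
    return dst;
}

/* Copy len bytes from src with both routines and compare the results.
 * Returns 1 if they match, 0 otherwise.  len must be at most 4096. */
int copies_agree(const void *src, size_t len)
{
    static unsigned char a[4096], b[4096];

    assert(len <= sizeof a);
    memset(a, 0xAA, sizeof a);
    memset(b, 0xAA, sizeof b);

    bytewise_memcpy(a, src, len);   /* byte loop, no rep prefix expected */
    memcpy(b, src, len);            /* whatever libc provides */

    return memcmp(a, b, sizeof a) == 0;
}
```

In the failing case here the symptom is a bogus page fault rather than wrong
data, so a matching result would not rule anything out; a mismatch, or a fault
only on the libc path, would point further at the string-instruction path.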

-- 
Chris Lalancette