----- "Hu Tao" <hutao@xxxxxxxxxxxxxx> wrote:
> > > On Tue, Oct 19, 2010 at 09:06:33AM -0400, Dave Anderson wrote:
> > > >
> > > > ----- "Hu Tao" <hutao cn fujitsu com> wrote:
> > > >
> > > > > Hi Dave,
> > > > >
> > > > > These are updated patches tested with SMP system and panic task.
> > > > >
> > > > > When testing a x86 guest, I found another bug about reading cpu
> > > > > registers from dumpfile. Qemu simulated system is x86_64
> > > > > (qemu-system-x86_64), guest OS is x86. When crash reads cpu registers
> > > > > from dumpfile, it uses cpu_load_32(), this will read gp registers by
> > > > > get_be_long(fp, 32), that is, treat them as 32 bits. But in fact,
> > > > > qemu-system-x86_64 saves 64 bits for each of them (although guest OS
> > > > > uses only lower 32 bits). As a result, crash gets wrong cpu gp
> > > > > register values.
> > > >
> > > > As I understand it, you're running a 32-bit guest on a 64-bit host.
> > >
> > > Yes.
> > >
> > > > If you were to read 64-bit register values instead of 32-bit register
> > > > values, wouldn't that cause the file offsets of the subsequent get_xxx()
> > > > calls in cpu_load() to read from the wrong file offsets?  And then
> > > > that would leave the ending file offset incorrect, such that the
> > > > qemu_load() loop would fail to find the next device?
> > > >
> > > > In other words, the cpu_load() function, which is used for both
> > > > 32-bit and 64-bit guests, must be reading the correct amount of
> > > > data from the "cpu" device, or else qemu_load() would fail to
> > > > find the next device in the next location in the dumpfile.
> > >
> > > True. In fact, in my case if read 32-bit registers, following devices
> > > are found:
> > >   block, ram, kvm-tpr-opt, kvmclock, timer, cpu_common, cpu.
> > > If read 64-bit registers, following devices are found:
> > >   block, ram, kvm-tpr-opt, kvmclock, timer, cpu_common, cpu, apic, fw_cfg
> >
> > Right -- so it got "lost" after incorrectly gathering the data for the
> > first "cpu" device instance.
> >
> > > > > Is there any way we can know from dumpfile that these gp
> > > > > registers(and those similar registers) are 32bits or 64bits?
> > > >
> > > > I don't know.  If what you say is true, when would those registers
> > > > ever be 32-bit values?
> > >
> > > I did tests on a 64-bit machine. Result is:
> > >
> > > machine  OS   guest machine          guest OS  saved gp regs
> > > ------------------------------------------------------------
> > > 64-bit   x86  qemu-kvm(kvm enabled)  x86       64 bits
> > > 64-bit   x86  qemu(kvm disabled)     x86       32 bits
> >
> > I don't understand what you mean when you say that the guest machine
> > is "kvm enabled" or "kvm disabled"?
>
> Sorry for being vague. "kvm enabled" means using qemu-kvm to bring up
> guest machine and this enables KVM hardware virtualization on host.
> "kvm disabled" means using qemu to bring up guest machine and this
> disables KVM hardware virtualization on host.
>
> >
> > And if your host machine is running a 32-bit x86 OS (on 64-bit hardware),
> > that's something I've never seen given that Red Hat only allows 64-bit
> > kernels as KVM hosts.
>
> I did the test on Fedora 13 i686. Just tried rhel6 i386, as you said,
> there is no kvm support.

Hello Hu,

Your supposition that the "cpu" device layout is dependent upon the host
kernel type is correct, but unfortunately there's no readily-evident
way to determine what type of kernel the host was running.  This is
Paolo's response to the question:

  > So the question is:
  >
  > Can it be determined from something in the dumpfile header that
  > the *host* machine was running a 32-bit kernel?

  It's not an exact science, but you can do some trial-and-error.
  I suggest measuring the distance between the cpu and apic blocks
  (which you can do using code from your "workaround" explained below,
  I guess) and deciding based on the size of the CPU block.  A 64-bit
  image I have lying around takes 987 bytes; I'd guess that anything
  above 850 is 64-bit.  Maybe you can start searching after the first
  250 bytes, since the registers are at the beginning and if you're
  going to get a false match you're going to get it there.

The "workaround" he's referring to is this, which will be in the
next release:

  Re: [patch] crash on a KVM-generated dump
  https://www.redhat.com/archives/crash-utility/2010-October/msg00034.html

But it's not a particularly graceful solution in this case, because
it would require walking through all of the "block" and "ram" devices
to find the first "cpu" device -- but at that point the 32-vs-64 bit
device has already been selected.

I suppose another alternative would be to always start reading the
"cpu" data in cpu_load() as if it were created by a 64-bit host, and
then, on determining somewhere along the way that the data being read
is bogus, to seek back and call the 32-bit function instead?

I don't know -- either option would be really ugly...

Anyway, given that the use of 32-bit KVM hosts should be fairly rare,
what would you think of handling it this way:

 (1) use the 64-bit functions by default
 (2) add a crash command line option like "--kvmhost 32" to force
     the use of the 32-bit functions

And of course, even if the new option were *not* used on a 32-bit
dumpfile, it would still behave as it does now -- crash still comes
up OK -- but it just wouldn't be able to use the registers from the
header.

What do you think?

Dave

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility
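
For illustration only, the selection policy discussed in this message
(64-bit by default, a "--kvmhost 32" style override, and optionally
Paolo's size rule of thumb) could be sketched roughly as below.  Every
name here (kvm_host_bits, guess_host_bits, select_gp_reg_bits) is
hypothetical and not part of crash; the 850-byte threshold is only
Paolo's guess from a single 987-byte sample, and how the "cpu" section
length would actually be measured is left out.

```c
#include <stddef.h>

/* Hypothetical: value of a "--kvmhost" style option; AUTO means "not given". */
enum kvm_host_bits { KVMHOST_AUTO = 0, KVMHOST_32 = 32, KVMHOST_64 = 64 };

/* Paolo's rule of thumb: a "cpu" section above roughly 850 bytes was
 * probably written by a 64-bit host (his one sample measured 987). */
#define CPU_SECTION_64BIT_THRESHOLD 850

/* Guess the host width from the measured "cpu" section length. */
int guess_host_bits(size_t cpu_section_len)
{
    return cpu_section_len > CPU_SECTION_64BIT_THRESHOLD ? 64 : 32;
}

/*
 * Decide how wide the saved gp registers should be read.
 *   forced          - explicit user override, or KVMHOST_AUTO
 *   cpu_section_len - distance between the "cpu" and "apic" sections,
 *                     or 0 if it could not be measured
 */
int select_gp_reg_bits(enum kvm_host_bits forced, size_t cpu_section_len)
{
    if (forced == KVMHOST_32 || forced == KVMHOST_64)
        return (int)forced;          /* the user's choice always wins */
    if (cpu_section_len == 0)
        return 64;                   /* nothing measured: 64-bit default */
    return guess_host_bits(cpu_section_len);
}
```

With an explicit override the heuristic is never consulted, so a wrong
guess can always be corrected from the command line.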