Re: crash and libvirt, and more

Dave Anderson <anderson@xxxxxxxxxx> · Mon, 18 Aug 2008 16:01:41 -0400

Richard W.M. Jones wrote:
Hi:

I don't know if you're aware of this, but libvirt[1] recently added a
call which allows you to snoop on the live memory of guests,
virDomainMemoryPeek[2]:

int virDomainMemoryPeek (virDomainPtr dom,
                         unsigned long long start,  /* start address */
                         size_t size,               /* size (bytes) */
                         void *buffer,              /* return buffer */
                         unsigned int flags);

This would allow, in theory, for crash to debug running guests.  I had
a look at the crash code and it doesn't seem like it would be too hard
to add this.

We [the libvirt team] only support this for QEMU & KVM guests at the
moment, but we plan to support this call for Xen in the near future.
Also, the call only works on virtual memory addresses (in other words,
the address is translated through the guest's page tables), but in
practice that isn't too bad because the common configuration for Linux
is to map all of physical memory at some address, e.g. 0xc0000000 on
i386.  The peek operation is also read-only.

So if you are interested, let me know, and I will attempt a patch.

That is very interesting, as an alternative to just logging into the guest
and running crash live there.  However, there are a number of places where
the crash readmem() function is passed a physical address, and I wonder
whether it's going to be hampered by that, say for translating and reading
vmalloc/module addresses, user task virtual addresses, etc.  Can the "start"
address above be a vmalloc address?  In those cases, things might get a bit
more involved, as opposed to the simple "readmem(KVADDR, ...)" case, where a
unity-mapped address could be passed straight to the libvirt function above
without having to turn it into a physical address.  In other words, not all
physical-address readmem() requests are the result of a kernel virtual
address reference that has been pre-translated.  And if the pseudo-physical
address is beyond the 32-bit unity-mapping limit, you couldn't turn it back
into a unity-mapped kernel virtual address, so I don't know how you could
access it.

I may be missing something, so by all means don't let me stop you from trying
though...  ;-)

	-	-	-

Now, the bigger picture ...

For some months now we've been attempting to write system
administrator tools to mimic common sysadmin commands, except that
they work on guests.  For example 'virt-ps <guest>' lists out the
process table in <guest>.  It runs from the host and works by snooping
guest memory using virDomainMemoryPeek.

We have had some success, although it's been quite a lot harder than
we imagined it would be.  At the moment we have 'virt-dmesg',
'virt-uname', 'virt-ifconfig' and 'virt-ps', plus a handful of custom
commands, working to a greater or lesser extent.

However I wasn't aware before that crash could already do this
(particularly 'log', 'ps', 'mount' and 'net' commands), and in fact
crash has a lot more complete support for these commands than we do.
So it makes sense to use crash to do this, instead of continuing with
our separate implementation, if we can make it work.

I think there are two things that we'd need to add to crash in order
to get this working:

(i) Scripting.  I'm aware that there are two scripting projects for
crash out there already, but they looked fairly immature and/or
unsupported.  However, it shouldn't be too hard to pull these projects
up to standard and/or add some scripting support, or use expect.

(ii) Getting the debug symbols.

Item (ii) is the big deal for us.  Our current virt-* tools can work
with a wide range of kernels.

What we do is to download the kernel-debuginfo packages beforehand,
extract only the tiny amount of debug info we actually need from
vmlinux, and build a 'kernel database'.  (We're using dwarves to get
the layout of the dozen or so structures that we care about).  It
turns out that it's quite easy to heuristically determine the version
of a running kernel, and from that we can look up the structures in
the kernel database at runtime.
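
The lookup step described above might be sketched in C roughly as follows.
This is only an illustration of the idea, not the actual virt-* code: the
struct kernel_layout type, the kernel_db_lookup() function, the version
strings and every offset value are hypothetical placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical database record: the handful of structure offsets a
 * 'virt-ps'-like tool would need for one kernel version.
 */
struct kernel_layout {
    const char *release;     /* kernel version string used as the key */
    size_t task_tasks_off;   /* offset of task_struct.tasks */
    size_t task_pid_off;     /* offset of task_struct.pid */
    size_t task_comm_off;    /* offset of task_struct.comm */
};

/* Made-up entries; a real database would be generated from debuginfo. */
static const struct kernel_layout kernel_db[] = {
    { "2.6.18-example-a", 0x7c, 0xa8, 0x1b4 },
    { "2.6.25-example-b", 0x88, 0xb4, 0x1d0 },
};

/* Return the layout for a detected kernel version, or NULL if unknown. */
const struct kernel_layout *kernel_db_lookup(const char *release)
{
    assert(release != NULL);
    for (size_t i = 0; i < sizeof kernel_db / sizeof kernel_db[0]; i++) {
        if (strcmp(kernel_db[i].release, release) == 0)
            return &kernel_db[i];
    }
    return NULL;
}
```

A NULL result corresponds to a kernel version that has not been seen before,
for which the debug info would first have to be fetched and processed.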

The upshot is that we currently support ~350 kernels with a database
which is a modest 1 MB in size, and which could probably be made smaller
with very little effort.

The problem I haven't yet resolved with using crash is that we need a
matching, identical vmlinux image (i.e. 50-100 MB) per guest kernel
version.  In the case where we see a kernel version we've not seen
before, we may have to download this and store it somewhere.

The alternative seems to involve some really deep hacking inside gdb,
perhaps so it can be persuaded to use only partial debug info?

I don't know if you have any thoughts about (ii).

Other than "good luck", I don't have any thoughts about that one...

Dave
