Re: [PATCH 0/2] Introduce panic hypercall

Daniel Gollub <gollub@xxxxxxxxxxxxx> · Mon, 20 Jun 2011 18:26:15 +0200

On Monday, June 20, 2011 05:45:36 pm Avi Kivity wrote:
> > >  However, I'm not sure I see the gain.  Most enterprisey guests
> > >  already contain in-guest crash dumpers which provide more
> > >  information than a qemu memory dump could, since they know exact
> > >  load addresses etc. and are integrated with crash analysis tools.
> > >  What do you have in mind?

Right, kexec/kdump already works perfectly inside the guest. But:

 - in the "field" a lot of people still manage to set up VM guests without
   kexec/kdump properly configured (even though most enterprisey distributions
   try hard to set this up out of the box, people still manage not to have
   kexec/kdump loaded when they run into a crash).

 - you don't have to reserve disk space for a crashdump in each guest,
   e.g. if you run 4 guests with 60 GB of memory each you would lose
   roughly 4*60 GB = 240 GB of space ... just for the (rare) case that
   every one of those guests writes a crashdump, uncompressed ...

 - legacy distributions: no kexec, or a buggy one

 - maybe writing a crashdump+reboot with QEMU/libvirt is faster than
   with in-guest kexec/kdump? (haven't tested yet)

 - single place on the VM-host to collect coredumps
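
To put numbers on the disk-space point in the list above, here is a small
sketch. The first function is the 4*60 GB case from the list. The second
one is only an assumption of mine about how a shared host-side pool might
be sized (room for the N largest guests crashing at the same time); the
function names are made up for this illustration.

```python
def per_guest_reservation_gb(guest_memory_gb):
    """Space reserved when every guest keeps room for its own uncompressed dump."""
    return sum(guest_memory_gb)


def host_pool_gb(guest_memory_gb, simultaneous_crashes=1):
    """Shared host-side pool sized for the N largest guests (assumed policy)."""
    largest_first = sorted(guest_memory_gb, reverse=True)
    return sum(largest_first[:simultaneous_crashes])


guests = [60, 60, 60, 60]  # 4 guests with 60 GB of memory each
print(per_guest_reservation_gb(guests))  # 240
print(host_pool_gb(guests))              # 60
```

With one place on the VM host to collect dumps, the reservation scales with
how many guests you expect to crash at once, not with the number of guests.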

> > 
> > Well libvirt can capture a "core" file by doing 'virsh dump $GUESTNAME'.
> > This actually uses the QEMU monitor migration command to capture the
> > entire of QEMU memory. The 'crash' command line tool actually knows
> > how to analyse this data format as it would a normal kernel crashdump.
> 
> Interesting.

Right. I use the kvmdump support of the crash utility now and then ... it
could be more often. Unfortunately, the people who run KVM in a production
environment with a strict service-level agreement often just reboot due to
time pressure, or run out of disk space in the guest, or simply forget that
they were told to always do a "virsh dump" on a freeze or crash.
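
As a rough sketch of what automating that manual step on the host could
look like: the snippet below only builds and runs "virsh dump <guest>
<file>". The helper names, the dump directory and the file naming scheme
are assumptions for illustration, not something libvirt provides.

```python
import subprocess
import time


def build_dump_command(guest, dump_dir="/var/lib/libvirt/dumps"):
    """Return the argv for 'virsh dump GUEST FILE' with a timestamped file name.

    dump_dir and the naming scheme are placeholders chosen for this sketch.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    return ["virsh", "dump", guest, f"{dump_dir}/{guest}-{stamp}.core"]


def dump_guest(guest, dump_dir="/var/lib/libvirt/dumps"):
    """Run virsh dump for the guest; raises CalledProcessError if virsh fails."""
    subprocess.run(build_dump_command(guest, dump_dir), check=True)
```

Calling dump_guest("guest1") on the VM host would write the core file,
which can then be opened with the crash utility together with the guest
kernel's vmlinux.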

> 
> > I think having a way for a guest OS to notify the host that it has
> > crashed would be useful. libvirt could automatically do a crash
> > dump of the QEMU memory, or at least pause the guest CPUs and notify
> > the management app of the crash, which can then decide what to do.
> > You can also use tools like 'virt-dmesg' which uses libvirt to peek
> > into guest memory to extract the most recent kernel dmesg logs (even
> > if the guest OS itself is crashed & didn't manage to send them out
> > via netconsole or something else).
> 
> I agree.  But let's do this via a device, this way kvm need not be changed.

Is a device reliable enough if the guest kernel crashes?
Do you mean something like a hardware watchdog?

> 
> Do ILO cards / IPMI support something like this?  We could follow their 
> lead in that case.

The only two things that come to mind are:

 * NMI (aka. ipmitool diag) - already available in qemu/kvm - but requires
   in-guest kexec/kdump
 * Hardware-Watchdog (also available in qemu/libvirt)

lguest and Xen have something similar. They also have a hypercall which gets
called by a function registered in the panic_notifier_list. Not quite sure if
you want to follow their lead.

Something I forgot to mention: This panic hypercall could also sit within an
external kernel module ... to support (legacy) distributions.

> 
> > This series does need to introduce a QMP event notification upon
> > crash, so that the crash notification can be propagated to mgmt
> > layers above QEMU.
> 
> Yes.

Already done. I posted the QEMU-relevant changes as a separate series to the
KVM list ... since the initial implementation is KVM specific (KVM hypercall).
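
For the management layer, reacting to such a QMP event might look roughly
like the sketch below. QMP events arrive as JSON objects with an "event"
key; the event name used here is a placeholder and not the name from the
posted series, and the policy values (dump, pause, notify) just mirror the
options discussed above.

```python
import json

# Placeholder event name for this sketch; not taken from the posted series.
PANIC_EVENT = "GUEST_PANIC_EXAMPLE"


def action_for_event(line, policy="dump"):
    """Return the configured action if the QMP line is the panic event, else None.

    policy is one of "dump", "pause" or "notify" (hypothetical values).
    """
    msg = json.loads(line)
    if msg.get("event") != PANIC_EVENT:
        return None  # command replies and unrelated events are ignored
    return policy


panic = '{"event": "GUEST_PANIC_EXAMPLE", "timestamp": {"seconds": 1308587175, "microseconds": 0}}'
print(action_for_event(panic))            # dump
print(action_for_event(panic, "pause"))   # pause
```

A real management application would map the returned action to something
like a "virsh dump" or a CPU pause, and leave the decision to the admin's
configured policy.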

Best Regards,
Daniel

-- 
Daniel Gollub
Linux Consultant & Developer
Tel.: +49-160 47 73 970 
Mail: gollub@xxxxxxxxxxxxx

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537