[Xen-devel] [PATCHv10 0/9] Xen: extend kexec hypercall for use with pv-ops kernels

andrew.cooper3@xxxxxxxxxx (Andrew Cooper) · Fri, 8 Nov 2013 14:01:28 +0000

On 08/11/13 13:19, Jan Beulich wrote:
>>>> On 08.11.13 at 14:13, David Vrabel <david.vrabel at citrix.com> wrote:
>> Keir,
>>
>> Sorry, forgot to CC you on this series.
>>
>> Can we have your opinion on whether this kexec series can be merged?
>> And if not, what further work and/or testing is required?
> Just to clarify - unless I missed something, there was still no
> review of this from Daniel or someone else known to be
> familiar with the subject. If Keir gave his ack, formally this
> could go in, but I wouldn't feel too well with that (the more
> that apart from not having reviewed it, Daniel seems to also
> continue to have problems with it).
>
> Jan

Can I have myself deemed to be familiar with the subject as far as this
is concerned?

A noticeable quantity of my contributions to Xen have been in the kexec
/ crash areas, and I am the author of the xen-crashdump-analyser.

I do realise that I am certainly not impartial as far as this series is
concerned, being a co-developer.

David's statement of "the current implementation is so broken[1] and
useless[2] that..." is completely accurate.  It is frankly a miracle
that the current code ever worked at all (and from XenServer's point of
view, it failed far more often than it worked).

For reference, XenServer 6.2 shipped with approximately v7 of this
series, and an appropriate kexec-tools and xen-crashdump-analyser. 
Since we put the code in, we have not had a single failure-to-kexec in
automated testing (both specific crash tests, and from unexpected host
crashes), whereas we were previously seeing reliable failures to crash on
most of our test infrastructure.

In stark contrast to previous versions of XenServer, we have not had a
single customer-reported host crash where the kexec path has failed.
There was one systematic failure where the HPSA driver was unhappy with
the state of the hardware, resulting in no root filesystem to write logs
to, and a repeated panic and Xen deadlock in the queued invalidation
codepath.

~Andrew