[PATCH v8 1/2] x86/kexec: add a new atomic notifier list for kdump

zhangyanfei@xxxxxxxxxxxxxx (Zhang Yanfei) · Tue, 27 Nov 2012 09:32:57 +0800

? 2012?11?27? 02:18, Eric W. Biederman ??:
> Gleb Natapov <gleb at redhat.com> writes:
> 
>> On Mon, Nov 26, 2012 at 11:43:10AM -0600, Eric W. Biederman wrote:
>>> Gleb Natapov <gleb at redhat.com> writes:
>>>
>>>> On Mon, Nov 26, 2012 at 09:08:54AM -0600, Eric W. Biederman wrote:
>>>>> Zhang Yanfei <zhangyanfei at cn.fujitsu.com> writes:
>>>>>
>>>>>> This patch adds an atomic notifier list named crash_notifier_list.
>>>>>> Currently, when loading kvm-intel module, a notifier will be registered
>>>>>> in the list to enable vmcss loaded on all cpus to be VMCLEAR'd if
>>>>>> needed.
>>>>>
>>>>> crash_notifier_list ick gag please no.  Effectively this makes the kexec
>>>>> on panic code path undebuggable.
>>>>>
>>>>> Instead we need to use direct function calls to whatever you are doing.
>>>>>
>>>> The code walks linked list in kvm-intel module and calls vmclear on
>>>> whatever it finds there. Since the function have to resides in kvm-intel
>>>> module it cannot be called directly. Is callback pointer that is set
>>>> by kvm-intel more acceptable?
>>>
>>> Yes a specific callback function is more acceptable.  Looking a little
>>> deeper vmclear_local_loaded_vmcss is not particularly acceptable. It is
>>> doing a lot of work that is unnecessary to save the virtual registers
>>> on the kexec on panic path.
>>>
>> What work are you referring to in particular that may not be
>> acceptable?
> 
> The unnecessary work that I was see is all of the software state
> changing.  Unlinking things from linked lists flipping variables.
> None of that appears related to the fundamental issue saving cpu
> state.
> 
> Simply reusing a function that does more than what is strictly required
> makes me nervous.  What is the chance that the function will grow
> with maintenance and add constructs that are not safe in a kexec on
> panic situtation.

So in summary,

1. a specific callback function instead of a notifier?

2. Instead of calling vmclear_local_loaded_vmcss, the vmclear operation
   will just call the vmclear on every vmcss loaded on the cpu?

   like below:

   static void crash_vmclear_local_loaded_vmcss(void)
   {
        int cpu = raw_smp_processor_id();
        struct loaded_vmcs *v, *n;

        if (!crash_local_vmclear_enabled(cpu))
                return;

        list_for_each_entry_safe(v, n, &per_cpu(loaded_vmcss_on_cpu, cpu),
                                 loaded_vmcss_on_cpu_link)
                vmcs_clear(v->vmcs);
   }

   right?

Thanks
Zhang

> 
>>> In fact I wonder if it might not just be easier to call vmcs_clear to a
>>> fixed per cpu buffer.
>>>
>> There may be more than one vmcs loaded on a cpu, hence the list.
>>
>>> Performing list walking in interrupt context without locking in
>>> vmclear_local_loaded vmcss looks a bit scary.  Not that locking would
>>> make it any better, as locking would simply add one more way to deadlock
>>> the system.  Only an rcu list walk is at all safe.  A list walk that
>>> modifies the list as vmclear_local_loaded_vmcss does is definitely not safe.
>>>
>> The list vmclear_local_loaded walks is per cpu. Zhang's kvm patch
>> disables kexec callback while list is modified.
> 
> If the list is only modified on it's cpu and we are running on that cpu
> that does look like it will give the necessary protections.  It isn't
> particularly clear at first glance that is the case unfortunately.
> 
> Eric
>