On 04/01/2019 02:38 AM, Juergen Gross wrote:
> On 25/03/2019 19:03, Waiman Long wrote:
>> On 03/25/2019 12:40 PM, Juergen Gross wrote:
>>> On 25/03/2019 16:57, Waiman Long wrote:
>>>> It was found that passing an invalid cpu number to pv_vcpu_is_preempted()
>>>> might panic the kernel in a VM guest. For example,
>>>>
>>>> [ 2.531077] Oops: 0000 [#1] SMP PTI
>>>>   :
>>>> [ 2.532545] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
>>>> [ 2.533321] RIP: 0010:__raw_callee_save___kvm_vcpu_is_preempted+0x0/0x20
>>>>
>>>> To guard against this kind of kernel panic, check is added to
>>>> pv_vcpu_is_preempted() to make sure that no invalid cpu number will
>>>> be used.
>>>>
>>>> Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
>>>> ---
>>>>  arch/x86/include/asm/paravirt.h | 6 ++++++
>>>>  1 file changed, 6 insertions(+)
>>>>
>>>> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
>>>> index c25c38a05c1c..4cfb465dcde4 100644
>>>> --- a/arch/x86/include/asm/paravirt.h
>>>> +++ b/arch/x86/include/asm/paravirt.h
>>>> @@ -671,6 +671,12 @@ static __always_inline void pv_kick(int cpu)
>>>>
>>>>  static __always_inline bool pv_vcpu_is_preempted(long cpu)
>>>>  {
>>>> +	/*
>>>> +	 * Guard against invalid cpu number or the kernel might panic.
>>>> +	 */
>>>> +	if (WARN_ON_ONCE((unsigned long)cpu >= nr_cpu_ids))
>>>> +		return false;
>>>> +
>>>>  	return PVOP_CALLEE1(bool, lock.vcpu_is_preempted, cpu);
>>>>  }
>>> Can this really happen without being a programming error?
>> This shouldn't happen without a programming error, I think. In my case,
>> it was caused by a race condition leading to use-after-free of the cpu
>> number. However, my point is that error like that shouldn't cause the
>> kernel to panic.
>>
>>> Basically you'd need to guard all percpu area accesses to foreign cpus
>>> this way. Why is this one special?
>> It depends. If out-of-bound access can only happen with obvious
>> programming error, I don't think we need to guard against them. In this
>> case, I am not totally sure if the race condition that I found may
>> happen with existing code or not. To be prudent, I decide to send this
>> patch out.
>>
>> The race condition that I am looking at is as follows:
>>
>>   CPU 0                         CPU 1
>>   -----                         -----
>>   up_write:
>>     owner = NULL;
>>     <release-barrier>
>>     count = 0;
>>
>>   <rcu-free task structure>
>>
>>                                 rwsem_can_spin_on_owner:
>>                                   rcu_read_lock();
>>                                   read owner;
>>                                     :
>>                                   vcpu_is_preempted(owner->cpu);
>>                                     :
>>                                   rcu_read_unlock()
>>
>> When I tried to merge the owner into the count (clear the owner after
>> the barrier), I can reproduce the crash 100% when booting up the kernel
>> in a VM guest. However, I am not sure if the configuration above is safe
>> and is just very hard to reproduce.
>>
>> Alternatively, I can also do the cpu check before calling
>> vcpu_is_preempted().
> I think I'd prefer that.
>
>
> Juergen
>
It turns out that it may be caused by a software bug after all. You can
ignore this patch for now.

Thanks,
Longman
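
For reference, the alternative discussed above (doing the cpu check in the
caller before calling vcpu_is_preempted()) could look roughly like the
user-space sketch below. This is only an illustration, not kernel code:
MODEL_NR_CPU_IDS, model_preempted[], model_vcpu_is_preempted() and
owner_vcpu_is_preempted() are hypothetical stand-ins for nr_cpu_ids, the
per-cpu preemption state, the paravirt hook and the caller-side wrapper.
The unsigned cast is the same trick used in the patch, so negative values
are rejected as well.

```c
#include <stdbool.h>

/* Hypothetical stand-in for nr_cpu_ids; the real value is set at boot. */
#define MODEL_NR_CPU_IDS 4UL

/* Hypothetical per-cpu "is preempted" state; only cpu 1 is preempted. */
static bool model_preempted[MODEL_NR_CPU_IDS] = { false, true, false, false };

/*
 * Models the unchecked lookup behind vcpu_is_preempted(): an
 * out-of-range cpu would index past the end of the array.
 */
static bool model_vcpu_is_preempted(long cpu)
{
	return model_preempted[cpu];
}

/*
 * Caller-side check: validate the cpu number before the lookup.
 * The cast to unsigned long makes negative values compare as very
 * large numbers, so a single comparison covers both directions.
 */
static bool owner_vcpu_is_preempted(long cpu)
{
	if ((unsigned long)cpu >= MODEL_NR_CPU_IDS)
		return false;

	return model_vcpu_is_preempted(cpu);
}
```

Note that a check like this only keeps a stale cpu number from being used
as an index. It does not address the use-after-free of the owner itself.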