From: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx> Sent: Thursday, January 6, 2022 1:46 AM > > KASAN detected the following issue: > > BUG: KASAN: slab-out-of-bounds in hyperv_flush_tlb_multi+0xf88/0x1060 > Read of size 4 at addr ffff8880011ccbc0 by task kcompactd0/33 > > CPU: 1 PID: 33 Comm: kcompactd0 Not tainted 5.14.0-39.el9.x86_64+debug #1 > Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, > BIOS Hyper-V UEFI Release v4.0 12/17/2019 > Call Trace: > dump_stack_lvl+0x57/0x7d > print_address_description.constprop.0+0x1f/0x140 > ? hyperv_flush_tlb_multi+0xf88/0x1060 > __kasan_report.cold+0x7f/0x11e > ? hyperv_flush_tlb_multi+0xf88/0x1060 > kasan_report+0x38/0x50 > hyperv_flush_tlb_multi+0xf88/0x1060 > flush_tlb_mm_range+0x1b1/0x200 > ptep_clear_flush+0x10e/0x150 > ... > Allocated by task 0: > kasan_save_stack+0x1b/0x40 > __kasan_kmalloc+0x7c/0x90 > hv_common_init+0xae/0x115 > hyperv_init+0x97/0x501 > apic_intr_mode_init+0xb3/0x1e0 > x86_late_time_init+0x92/0xa2 > start_kernel+0x338/0x3eb > secondary_startup_64_no_verify+0xc2/0xcb > > The buggy address belongs to the object at ffff8880011cc800 > which belongs to the cache kmalloc-1k of size 1024 > The buggy address is located 960 bytes inside of > 1024-byte region [ffff8880011cc800, ffff8880011ccc00) > > 'hyperv_flush_tlb_multi+0xf88/0x1060' points to > hv_cpu_number_to_vp_number() and '960 bytes' means we're trying to get > VP_INDEX for CPU#240. 'nr_cpus' here is exactly 240 so we're trying to > access past hv_vp_index's last element. This can (and will) happen > when 'cpus' mask is empty and cpumask_last() will return '>=nr_cpus'. > > Commit ad0a6bad4475 ("x86/hyperv: check cpu mask after interrupt has > been disabled") tried to deal with empty cpumask situation but > apparently didn't fully fix the issue. > > 'cpus' cpumask which is passed to hyperv_flush_tlb_multi() is > 'mm_cpumask(mm)' (which is '&mm->cpu_bitmap'). This mask changes every > time the particular mm is scheduled/unscheduled on some CPU (see > switch_mm_irqs_off()), disabling IRQs on the CPU which is performing remote > TLB flush has zero influence on whether the particular process can get > scheduled/unscheduled on _other_ CPUs so e.g. in the case where the mm was > scheduled on one other CPU and got unscheduled during > hyperv_flush_tlb_multi()'s execution will lead to cpumask becoming empty. > > It doesn't seem that there's a good way to protect 'mm_cpumask(mm)' > from changing during hyperv_flush_tlb_multi()'s execution. It would be > possible to copy it in the very beginning of the function but this is a > waste. It seems we can deal with changing cpumask just fine. > > When 'cpus' cpumask changes during hyperv_flush_tlb_multi()'s > execution, there are two possible issues: > - 'Under-flushing': we will not flush TLB on a CPU which got added to > the mask while hyperv_flush_tlb_multi() was already running. This is > not a problem as this is equal to mm getting scheduled on that CPU > right after TLB flush. > - 'Over-flushing': we may flush TLB on a CPU which is already cleared > from the mask. First, extra TLB flush preserves correctness. Second, > Hyper-V's TLB flush hypercall takes 'mm->pgd' argument so Hyper-V may > avoid the flush if CR3 doesn't match. > > Fix the immediate issue with > cpumask_last()/hv_cpu_number_to_vp_number() > and remove the pointless cpumask_empty() check from the beginning of the > function as it really doesn't protect anything. Also, avoid the hypercall > altogether when 'flush->processor_mask' ends up being empty. > > Fixes: ad0a6bad4475 ("x86/hyperv: check cpu mask after interrupt has been disabled") > Signed-off-by: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx> > --- > arch/x86/hyperv/mmu.c | 19 +++++++++---------- > 1 file changed, 9 insertions(+), 10 deletions(-) > > diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c > index bd13736d0c05..0ad2378fe6ad 100644 > --- a/arch/x86/hyperv/mmu.c > +++ b/arch/x86/hyperv/mmu.c > @@ -68,15 +68,6 @@ static void hyperv_flush_tlb_multi(const struct cpumask *cpus, > > local_irq_save(flags); > > - /* > - * Only check the mask _after_ interrupt has been disabled to avoid the > - * mask changing under our feet. > - */ > - if (cpumask_empty(cpus)) { > - local_irq_restore(flags); > - return; > - } > - > flush_pcpu = (struct hv_tlb_flush **) > this_cpu_ptr(hyperv_pcpu_input_arg); > > @@ -115,7 +106,9 @@ static void hyperv_flush_tlb_multi(const struct cpumask *cpus, > * must. We will also check all VP numbers when walking the > * supplied CPU set to remain correct in all cases. > */ > - if (hv_cpu_number_to_vp_number(cpumask_last(cpus)) >= 64) > + cpu = cpumask_last(cpus); > + > + if (cpu < nr_cpumask_bits && hv_cpu_number_to_vp_number(cpu) >= 64) > goto do_ex_hypercall; > > for_each_cpu(cpu, cpus) { > @@ -131,6 +124,12 @@ static void hyperv_flush_tlb_multi(const struct cpumask *cpus, > __set_bit(vcpu, (unsigned long *) > &flush->processor_mask); > } > + > + /* nothing to flush if 'processor_mask' ends up being empty */ > + if (!flush->processor_mask) { > + local_irq_restore(flags); > + return; > + } > } > > /* > -- > 2.33.1 Thanks for figuring out the core issue with the cpumask changing and doing what should be the definitive solution! Reviewed-by: Michael Kelley <mikelley@xxxxxxxxxxxxx>