On Tue, Jul 16, 2019 at 11:14:37PM +0800, Zenghui Yu wrote: > > On 2019/7/16 23:05, Zenghui Yu wrote: > > Hi folks, > > > > Running the latest kernel with KASAN enabled, we will hit the following > > KASAN BUG during guest's boot process. > > > > I'm in commit 9637d517347e80ee2fe1c5d8ce45ba1b88d8b5cd. > > > > Any problems in the chained PMU code? Or just a false positive? > > > > ---8<--- > > > > [ 654.706268] > > ================================================================== > > [ 654.706280] BUG: KASAN: slab-out-of-bounds in > > kvm_pmu_get_canonical_pmc+0x48/0x78 > > [ 654.706286] Read of size 8 at addr ffff801d6c8fea38 by task > > qemu-kvm/23268 > > > > [ 654.706296] CPU: 2 PID: 23268 Comm: qemu-kvm Not tainted 5.2.0+ #178 > > [ 654.706301] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.58 > > 10/24/2018 > > [ 654.706305] Call trace: > > [ 654.706311] dump_backtrace+0x0/0x238 > > [ 654.706317] show_stack+0x24/0x30 > > [ 654.706325] dump_stack+0xe0/0x134 > > [ 654.706332] print_address_description+0x80/0x408 > > [ 654.706338] __kasan_report+0x164/0x1a0 > > [ 654.706343] kasan_report+0xc/0x18 > > [ 654.706348] __asan_load8+0x88/0xb0 > > [ 654.706353] kvm_pmu_get_canonical_pmc+0x48/0x78 > > I noticed that we will use "pmc->idx" and the "chained" bitmap to > determine if the pmc is chained, in kvm_pmu_pmc_is_chained(). > > Should we initialize the idx and the bitmap appropriately before > doing kvm_pmu_stop_counter()? Like: Hi Zenghui, Thanks for spotting this and investigating - I'll make sure to use KASAN in the future when testing... > > > diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c > index 3dd8238..cf3119a 100644 > --- a/virt/kvm/arm/pmu.c > +++ b/virt/kvm/arm/pmu.c > @@ -224,12 +224,12 @@ void kvm_pmu_vcpu_reset(struct kvm_vcpu *vcpu) > int i; > struct kvm_pmu *pmu = &vcpu->arch.pmu; > > + bitmap_zero(vcpu->arch.pmu.chained, ARMV8_PMU_MAX_COUNTER_PAIRS); > + > for (i = 0; i < ARMV8_PMU_MAX_COUNTERS; i++) { > - kvm_pmu_stop_counter(vcpu, &pmu->pmc[i]); > pmu->pmc[i].idx = i; > + kvm_pmu_stop_counter(vcpu, &pmu->pmc[i]); > } > - > - bitmap_zero(vcpu->arch.pmu.chained, ARMV8_PMU_MAX_COUNTER_PAIRS); > } We have to be a little careful here, as the vcpu may be reset after use. Upon resetting we must ensure that any existing perf_events are released - this is why kvm_pmu_stop_counter is called before bitmap_zero (as kvm_pmu_stop_counter relies on kvm_pmu_pmc_is_chained). (For example, by clearing the bitmap before stopping the counters, we will attempt to release the perf event for both pmc's in a chained pair. Whereas we should only release the canonical pmc. It's actually OK right now as we set the non-canonical pmc perf_event will be NULL - but who knows that this will hold true in the future. The code makes the assumption that the non-canonical perf event isn't touched on a chained pair). The KASAN bug gets fixed by moving the assignment of idx before kvm_pmu_stop_counter. Therefore I'd suggest you drop the bitmap_zero hunks. Can you send a patch with just the idx assignment hunk please? Thanks, Andrew Murray > > /** > > > Thanks, > zenghui > > > [ 654.706358] kvm_pmu_stop_counter+0x28/0x118 > > [ 654.706363] kvm_pmu_vcpu_reset+0x60/0xa8 > > [ 654.706369] kvm_reset_vcpu+0x30/0x4d8 > > [ 654.706376] kvm_arch_vcpu_ioctl+0xa04/0xc18 > > [ 654.706381] kvm_vcpu_ioctl+0x17c/0xde8 > > [ 654.706387] do_vfs_ioctl+0x150/0xaf8 > > [ 654.706392] ksys_ioctl+0x84/0xb8 > > [ 654.706397] __arm64_sys_ioctl+0x4c/0x60 > > [ 654.706403] el0_svc_common.constprop.0+0xb4/0x208 > > [ 654.706409] el0_svc_handler+0x3c/0xa8 > > [ 654.706414] el0_svc+0x8/0xc > > > > [ 654.706422] Allocated by task 23268: > > [ 654.706429] __kasan_kmalloc.isra.0+0xd0/0x180 > > [ 654.706435] kasan_slab_alloc+0x14/0x20 > > [ 654.706440] kmem_cache_alloc+0x17c/0x4a8 > > [ 654.706445] kvm_arch_vcpu_create+0xa0/0x130 > > [ 654.706451] kvm_vm_ioctl+0x844/0x1218 > > [ 654.706456] do_vfs_ioctl+0x150/0xaf8 > > [ 654.706461] ksys_ioctl+0x84/0xb8 > > [ 654.706466] __arm64_sys_ioctl+0x4c/0x60 > > [ 654.706472] el0_svc_common.constprop.0+0xb4/0x208 > > [ 654.706478] el0_svc_handler+0x3c/0xa8 > > [ 654.706482] el0_svc+0x8/0xc > > > > [ 654.706490] Freed by task 0: > > [ 654.706493] (stack is not available) > > > > [ 654.706501] The buggy address belongs to the object at ffff801d6c8fc010 > > which belongs to the cache kvm_vcpu of size 10784 > > [ 654.706507] The buggy address is located 8 bytes to the right of > > 10784-byte region [ffff801d6c8fc010, ffff801d6c8fea30) > > [ 654.706510] The buggy address belongs to the page: > > [ 654.706516] page:ffff7e0075b23f00 refcount:1 mapcount:0 > > mapping:ffff801db257e480 index:0x0 compound_mapcount: 0 > > [ 654.706524] flags: 0xffffe0000010200(slab|head) > > [ 654.706532] raw: 0ffffe0000010200 ffff801db2586ee0 ffff801db2586ee0 > > ffff801db257e480 > > [ 654.706538] raw: 0000000000000000 0000000000010001 00000001ffffffff > > 0000000000000000 > > [ 654.706542] page dumped because: kasan: bad access detected > > > > [ 654.706549] Memory state around the buggy address: > > [ 654.706554] ffff801d6c8fe900: 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 > > [ 654.706560] ffff801d6c8fe980: 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 > > [ 654.706565] >ffff801d6c8fea00: 00 00 00 00 00 00 fc fc fc fc fc fc fc > > fc fc fc > > [ 654.706568] ^ > > [ 654.706573] ffff801d6c8fea80: fc fc fc fc fc fc fc fc fc fc fc fc fc > > fc fc fc > > [ 654.706578] ffff801d6c8feb00: fc fc fc fc fc fc fc fc fc fc fc fc fc > > fc fc fc > > [ 654.706582] > > ================================================================== >