On Fri, 2024-09-20 at 00:22 +0800, Zach Wade wrote: > > > Hi Wade, ... ... > This issue was discovered when the hard drive was passed directly to > the > virtual machine and the driver was automatically loaded. > > The virtual machine executes lsblk internally and can see the > physical > hard disk I am directly using: > lsblk -S > NAME HCTL TYPE VENDOR MODEL REV > SERIAL > TRAN > sda 32:0:0:0 disk ATA SAMSUNG MZ7LH240HAHQ-00005 HXT7904Q > S45RNC0T166451 > sdb 32:0:1:0 disk ATA INTEL SSDSC2KG480G8 XCV10100 > BTYG84910BR5480BGN > So seems these devices are using same PCI Intel vendor device ID as SST device after emulation. These are not SST devices. What is sudo lspci -vvv Also cat /proc/cpuinfo? Thanks, Srinivas > > > I don't think lspci in VM will show this device. > > Can you send lspci -k? > > lspci -k > 00:00.0 Host bridge: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX > Host > bridge (rev 01) > Subsystem: VMware Virtual Machine Chipset > 00:01.0 PCI bridge: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX AGP > bridge (rev 01) > 00:07.0 ISA bridge: Intel Corporation 82371AB/EB/MB PIIX4 ISA (rev > 08) > Subsystem: VMware Virtual Machine Chipset > 00:07.1 IDE interface: Intel Corporation 82371AB/EB/MB PIIX4 IDE (rev > 01) > Subsystem: VMware Virtual Machine Chipset > Kernel driver in use: ata_piix > Kernel modules: pata_acpi, ata_generic > 00:07.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08) > Subsystem: VMware Virtual Machine Chipset > Kernel modules: i2c_piix4 > 00:07.7 System peripheral: VMware Virtual Machine Communication > Interface (rev 10) > Subsystem: VMware Virtual Machine Communication Interface > Kernel driver in use: vmw_vmci > Kernel modules: vmw_vmci > 00:0f.0 VGA compatible controller: VMware SVGA II Adapter > Subsystem: VMware SVGA II Adapter > Kernel driver in use: vmwgfx > Kernel modules: vmwgfx > 02:00.0 Serial Attached SCSI controller: VMware PVSCSI SCSI > Controller > (rev 02) > DeviceName: SCSI0 > Subsystem: VMware PVSCSI SCSI Controller > Kernel driver in use: vmw_pvscsi > Kernel modules: vmw_pvscsi > 02:01.0 USB controller: VMware USB1.1 UHCI Controller > DeviceName: usb > Subsystem: VMware Device 1976 > Kernel driver in use: uhci_hcd > 02:02.0 Ethernet controller: VMware VMXNET3 Ethernet Controller (rev > 01) > DeviceName: Ethernet0 > Subsystem: VMware VMXNET3 Ethernet Controller > Kernel driver in use: vmxnet3 > Kernel modules: vmxnet3 > 02:03.0 USB controller: VMware USB2 EHCI Controller > DeviceName: ehci > Subsystem: VMware USB2 EHCI Controller > Kernel driver in use: ehci-pci > 02:04.0 SATA controller: VMware SATA AHCI controller > DeviceName: sata0 > Subsystem: VMware SATA AHCI controller > Kernel driver in use: ahci > > > This is not complete list. > > I want to make sure somehow your other VM PCI device is using same > > ID > > as this device. > > > > Are you referring to transferring this hard drive to another virtual > machine? The action is too big, unfortunately I cannot do it this > way. > This physical hard drive is only directly connected to this virtual > machine for use. > > > > > > > > Here on the virtualized environment, seems the > > > > > topology_physical_package_id() (from CPU APIC ID in non > > > > > virtualized > > > > > case) is assigned some big value, which is more than max > > > > > packages > > > > > in > > > > > the system. > > > > > > > > > > But your fix is good as topology_logical_package_id() should > > > > > be > > > > > less > > > > > than value returned by topology_max_packages() and hence > > > > > avoid > > > > > this > > > > > issue. > > > > > > > > > > Can you confirm the value returned by > > > > > topology_logical_package_id() and > > > > > topology_physical_package_id()? > > > > > > cat /proc/cpuinfo | grep "physical id" > > > physical id : 0 > > > physical id : 2 > > > physical id : 4 > > > ...... > > > physical id : 58 > > > physical id : 60 > > > physical id : 62 > > > > > > > > I calculated topology_max_packages() * sizeof (* isst_pkg.info) > > > in > > > isst_if_cpu_info_init, and focused on pkg_id and bus_no in > > > _isst_if_get_pci_dev. > > > The printk printed result is as follows: > > > [ 51.879700] Allocated size: 512 > > > > Here topology_max_packages() returned 32. > > yes. > > > > > > [ 51.880148] pkg_id: 0, bus_no: 0 > > > [ 51.881242] pkg_id: 0, bus_no: 1 > > > [ 51.884209] pkg_id: 2, bus_no: 0 > > > [ 51.884571] pkg_id: 2, bus_no: 1 > > > [ 51.884931] pkg_id: 4, bus_no: 0 > > > [ 51.885313] pkg_id: 4, bus_no: 1 > > > ...... > > > [ 51.899134] pkg_id: 28, bus_no: 0 > > > [ 51.899511] pkg_id: 28, bus_no: 1 > > > [ 51.899909] pkg_id: 30, bus_no: 0 > > > [ 51.901012] pkg_id: 30, bus_no: 1 > > > [ 51.902160] > > > ================================================================= > > > = > > > [ 51.902936] BUG: KASAN: slab-out-of-bounds in > > > _isst_if_get_pci_dev.cold+0xde/0xe4 [isst_if_common] > > > [ 51.982707] > > > ================================================================= > > > = > > Package ID is 32, so it will overflow. There seems to be only 32 > > packages. > > If you print topology_logical_package_id(), you will have no gaps, > > and > > will be 0-31. > > > > Can you also print topology_logical_package_id() to confirm. > > After adding printk to _isst_if_get_pci_dev, the changes in pkd_id > can > be seen as follows: > [ 18.078652] pkg_id:0 > [ 18.078669] pkg_id:0 > [ 18.079215] pkg_id:1 > [ 18.080920] pkg_id:1 > [ 18.081847] pkg_id:2 > [ 18.082756] pkg_id:2 > [ 18.088928] pkg_id:3 > [ 18.089839] pkg_id:3 > ...... > [ 18.185462] pkg_id:30 > [ 18.185471] pkg_id:30 > [ 18.185561] pkg_id:31 > [ 18.185569] pkg_id:31 > > > > > topology_max_packages() returns max __max_logical_packages, so > > topology_logical_package_id() will be better here. > > > > > [ 51.985453] pkg_id: 32, bus_no: 0 > > > [ 51.986569] pkg_id: 32, bus_no: 1 > > > [ 51.988501] pkg_id: 34, bus_no: 0 > > > [ 51.989616] pkg_id: 34, bus_no: 1 > > > ...... > > > [ 52.059749] pkg_id: 58, bus_no: 0 > > > [ 52.062331] pkg_id: 58, bus_no: 1 > > > [ 52.066039] pkg_id: 60, bus_no: 0 > > > [ 52.068503] pkg_id: 60, bus_no: 1 > > > [ 52.072018] pkg_id: 62, bus_no: 0 > > > [ 52.074375] pkg_id: 62, bus_no: 1 > > > > > > > > > > > > > We can change commit description based on that. > > > > > > > > > > Thanks, > > > > > Srinivas > > > > > > > > > > > I think the changes are minor, so no more content was added to > > > the > > > patch. > > > If you think it needs to be added, I am happy for you to help > > > supplement > > > it. > > I just want to be clear how to reproduce this issue. > > > > " > > Attaching SST PCI device to VM causes "BUG: KASAN: slab-out-of- > > bounds". > > > > Then you can add the kasan bug report. > > > > The reason for this error is physical_package_ids assigned by VMM > > have > > holes. This will cause value returned by > > topology_physical_package_id() > > to be more than topology_max_packages(). The allocation uses > > topology_max_packages() to allocate memory. topology_max_packages() > > returns maximum logical package IDs. Hence use > > topology_logical_package_id() instead of > > topology_physical_package_id(). > > " > > Thanks, I will make the necessary modifications in v2. > > > > > Also we should add a check > > > > pkg_id = topology_logical_package_id(cpu); > > if (pkg_id >= topology_max_packages()) > > return NULL; > > > > May be VMM has holes in logical IDs also, then atleast it will not > > cause BUG. > > Great suggestion, I will add it in the new v2. > > Thanks, > Srinivas > > > > > Thanks, > > Srinivas > > > > > > > Thanks, > > > Zach > > > > > > > > after loading the > > > > > isst_if_common and isst_if_mbox_msr modules on the 64 core, > > > > > the > > > > > kasan > > > > > report was triggered. > > > > > After consulting the kernel manual > > > > > (Documentation/arch/x86/topology.rst), > > > > > I think in _isst_if_get_pci_dev, topology_physical_package_id > > > > > should > > > > > be > > > > > replaced with topology_logical_package_id. > > > > > > > > > > kasan bug report: > > > > > [ 19.411889] > > > > > ============================================================= > > > > > ==== > > > > > = > > > > > [ 19.413702] BUG: KASAN: slab-out-of-bounds in > > > > > _isst_if_get_pci_dev+0x3d5/0x400 [isst_if_common] > > > > > [ 19.415634] Read of size 8 at addr ffff888829e65200 by > > > > > task > > > > > cpuhp/16/113 > > > > > [ 19.417368] > > > > > [ 19.418627] CPU: 16 PID: 113 Comm: cpuhp/16 Tainted: G > > > > > E 6.9.0 #10 > > > > > [ 19.420435] Hardware name: VMware, Inc. VMware20,1/440BX > > > > > Desktop > > > > > Reference Platform, BIOS VMW201.00V.20192059.B64.2207280713 > > > > > 07/28/2022 > > > > > [ 19.422687] Call Trace: > > > > > [ 19.424091] <TASK> > > > > > [ 19.425448] dump_stack_lvl+0x5d/0x80 > > > > > [ 19.426963] ? _isst_if_get_pci_dev+0x3d5/0x400 > > > > > [isst_if_common] > > > > > [ 19.428694] print_report+0x19d/0x52e > > > > > [ 19.430206] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 > > > > > [ 19.431837] ? _isst_if_get_pci_dev+0x3d5/0x400 > > > > > [isst_if_common] > > > > > [ 19.433539] kasan_report+0xf0/0x170 > > > > > [ 19.435019] ? _isst_if_get_pci_dev+0x3d5/0x400 > > > > > [isst_if_common] > > > > > [ 19.436709] _isst_if_get_pci_dev+0x3d5/0x400 > > > > > [isst_if_common] > > > > > [ 19.438379] ? __pfx_sched_clock_cpu+0x10/0x10 > > > > > [ 19.439910] isst_if_cpu_online+0x406/0x58f > > > > > [isst_if_common] > > > > > [ 19.441573] ? __pfx_isst_if_cpu_online+0x10/0x10 > > > > > [isst_if_common] > > > > > [ 19.443263] ? ttwu_queue_wakelist+0x2c1/0x360 > > > > > [ 19.444797] cpuhp_invoke_callback+0x221/0xec0 > > > > > [ 19.446337] cpuhp_thread_fun+0x21b/0x610 > > > > > [ 19.447814] ? __pfx_cpuhp_thread_fun+0x10/0x10 > > > > > [ 19.449354] smpboot_thread_fn+0x2e7/0x6e0 > > > > > [ 19.450859] ? __pfx_smpboot_thread_fn+0x10/0x10 > > > > > [ 19.452405] kthread+0x29c/0x350 > > > > > [ 19.453817] ? __pfx_kthread+0x10/0x10 > > > > > [ 19.455253] ret_from_fork+0x31/0x70 > > > > > [ 19.456685] ? __pfx_kthread+0x10/0x10 > > > > > [ 19.458114] ret_from_fork_asm+0x1a/0x30 > > > > > [ 19.459573] </TASK> > > > > > [ 19.460853] > > > > > [ 19.462055] Allocated by task 1198: > > > > > [ 19.463410] kasan_save_stack+0x30/0x50 > > > > > [ 19.464788] kasan_save_track+0x14/0x30 > > > > > [ 19.466139] __kasan_kmalloc+0xaa/0xb0 > > > > > [ 19.467465] __kmalloc+0x1cd/0x470 > > > > > [ 19.468748] isst_if_cdev_register+0x1da/0x350 > > > > > [isst_if_common] > > > > > [ 19.470233] isst_if_mbox_init+0x108/0xff0 > > > > > [isst_if_mbox_msr] > > > > > [ 19.471670] do_one_initcall+0xa4/0x380 > > > > > [ 19.472903] do_init_module+0x238/0x760 > > > > > [ 19.474105] load_module+0x5239/0x6f00 > > > > > [ 19.475285] init_module_from_file+0xd1/0x130 > > > > > [ 19.476506] idempotent_init_module+0x23b/0x650 > > > > > [ 19.477725] __x64_sys_finit_module+0xbe/0x130 > > > > > [ 19.476506] idempotent_init_module+0x23b/0x650 > > > > > [ 19.477725] __x64_sys_finit_module+0xbe/0x130 > > > > > [ 19.478920] do_syscall_64+0x82/0x160 > > > > > [ 19.480036] entry_SYSCALL_64_after_hwframe+0x76/0x7e > > > > > [ 19.481292] > > > > > [ 19.482205] The buggy address belongs to the object at > > > > > ffff888829e65000 > > > > > which belongs to the cache kmalloc-512 of size 512 > > > > > [ 19.484818] The buggy address is located 0 bytes to the > > > > > right > > > > > of > > > > > allocated 512-byte region [ffff888829e65000, > > > > > ffff888829e65200) > > > > > [ 19.487447] > > > > > [ 19.488328] The buggy address belongs to the physical > > > > > page: > > > > > [ 19.489569] page: refcount:1 mapcount:0 > > > > > mapping:0000000000000000 > > > > > index:0xffff888829e60c00 pfn:0x829e60 > > > > > [ 19.491140] head: order:3 entire_mapcount:0 > > > > > nr_pages_mapped:0 > > > > > pincount:0 > > > > > [ 19.492466] anon flags: > > > > > 0x57ffffc0000840(slab|head|node=1|zone=2|lastcpupid=0x1fffff) > > > > > [ 19.493914] page_type: 0xffffffff() > > > > > [ 19.494988] raw: 0057ffffc0000840 ffff88810004cc80 > > > > > 0000000000000000 0000000000000001 > > > > > [ 19.496451] raw: ffff888829e60c00 0000000080200018 > > > > > 00000001ffffffff 0000000000000000 > > > > > [ 19.497906] head: 0057ffffc0000840 ffff88810004cc80 > > > > > 0000000000000000 0000000000000001 > > > > > [ 19.499379] head: ffff888829e60c00 0000000080200018 > > > > > 00000001ffffffff 0000000000000000 > > > > > [ 19.500844] head: 0057ffffc0000003 ffffea0020a79801 > > > > > ffffea0020a79848 00000000ffffffff > > > > > [ 19.502316] head: 0000000800000000 0000000000000000 > > > > > 00000000ffffffff 0000000000000000 > > > > > [ 19.503784] page dumped because: kasan: bad access > > > > > detected > > > > > [ 19.505058] > > > > > [ 19.505970] Memory state around the buggy address: > > > > > [ 19.507172] ffff888829e65100: 00 00 00 00 00 00 00 00 00 > > > > > 00 > > > > > 00 00 > > > > > 00 00 00 00 > > > > > [ 19.508599] ffff888829e65180: 00 00 00 00 00 00 00 00 00 > > > > > 00 > > > > > 00 00 > > > > > 00 00 00 00 > > > > > [ 19.510013] >ffff888829e65200: fc fc fc fc fc fc fc fc fc > > > > > fc > > > > > fc fc > > > > > fc fc fc fc > > > > > [ 19.510014] ^ > > > > > [ 19.510016] ffff888829e65280: fc fc fc fc fc fc fc fc fc > > > > > fc > > > > > fc fc > > > > > fc fc fc fc > > > > > [ 19.510018] ffff888829e65300: fc fc fc fc fc fc fc fc fc > > > > > fc > > > > > fc fc > > > > > fc fc fc fc > > > > > [ 19.515367] > > > > > ============================================================= > > > > > ==== > > > > > = > > > > > > > > > > Fixes: 9a1aac8a96dc ("platform/x86: ISST: PUNIT device > > > > > mapping > > > > > with > > > > > Sub-NUMA clustering") > > > > > Signed-off-by: Zach Wade <zachwade.k@xxxxxxxxx> > > > > > --- > > > > > drivers/platform/x86/intel/speed_select_if/isst_if_common. > > > > > c | 2 > > > > > +- > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > diff --git > > > > > a/drivers/platform/x86/intel/speed_select_if/isst_if_common.c > > > > > b/drivers/platform/x86/intel/speed_select_if/isst_if_common.c > > > > > index 10e21563fa46..80654aacd5bd 100644 > > > > > --- > > > > > a/drivers/platform/x86/intel/speed_select_if/isst_if_common.c > > > > > +++ > > > > > b/drivers/platform/x86/intel/speed_select_if/isst_if_common.c > > > > > @@ -316,7 +316,7 @@ static struct pci_dev > > > > > *_isst_if_get_pci_dev(int > > > > > cpu, int bus_no, int dev, int fn > > > > > cpu >= nr_cpu_ids || cpu >= > > > > > num_possible_cpus()) > > > > > return NULL; > > > > > > > > > > - pkg_id = topology_physical_package_id(cpu); > > > > > + pkg_id = topology_logical_package_id(cpu); > > > > > > > > > > bus_number = isst_cpu_info[cpu].bus_info[bus_no]; > > > > > if (bus_number < 0) > > > > > > > > > >