On Wed, Sep 06, 2023 at 01:49:16PM +0100, Robin Murphy wrote: > On 2023-09-06 07:10, Takashi Iwai wrote: > > On Wed, 06 Sep 2023 00:01:01 +0200, > > Antonio Terceiro wrote: > > > > > > Hi, > > > > > > I'm using an arm64 workstation, and wanted to add a sound card to it. I bought > > > one who was pretty popular around where I live, and it is supported by the > > > snd-cmipci driver. > > > > > > It's this one: > > > > > > 0005:02:00.0 Multimedia audio controller: C-Media Electronics Inc CMI8738/CMI8768 PCI Audio (rev 10) > > > > > > After building a mailine kernel (post-v6.5, pre-rc1) on Debian testing arm64 > > > with localmodconfig + CONFIG_SND_CMIPCI=m, it crashes with "Unable to handle > > > kernel paging request at virtual address fffffbfffe80000c", and the system > > > never finishes to boot. The login manager never shows up and the serial console > > > never gets to a login prompt. I observed the same issue on a 6.3 Debian kernel, > > > after rebuilding with CONFIG_SND_CMIPCI=m. > > > > > > If I stop the module from being automatically loaded by adding > > > `blacklist snd-cmipci` to /etc/modprobe.d/snd-cmipci.conf (or if I > > > remove the card from the PCIe slot), I get the system to boot. But tring > > > to load the module manually causes the same crash (I only tested this > > > with the card on): > > > > > > [ +4,501093] snd_cmipci 0005:02:00.0: stream 512 already in tree > > > [ +0,000155] Unable to handle kernel paging request at virtual address fffffbfffe80000c > > > [ +0,007927] Mem abort info: > > > [ +0,002793] ESR = 0x0000000096000006 > > > [ +0,003743] EC = 0x25: DABT (current EL), IL = 32 bits > > > [ +0,005307] SET = 0, FnV = 0 > > > [ +0,003049] EA = 0, S1PTW = 0 > > > [ +0,003134] FSC = 0x06: level 2 translation fault > > > [ +0,004872] Data abort info: > > > [ +0,002873] ISV = 0, ISS = 0x00000006, ISS2 = 0x00000000 > > > [ +0,005479] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 > > > [ +0,005047] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 > > > [ +0,000003] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000080519fe9000 > > > [ +0,000004] [fffffbfffe80000c] pgd=000008051a979003, p4d=000008051a979003, pud=000008051a97a003, pmd=0000000000000000 > > > [ +0,000009] Internal error: Oops: 0000000096000006 [#1] SMP > > > [ +0,028142] Modules linked in: snd_cmipci(+) snd_mpu401_uart snd_opl3_lib xt_conntrack xt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype nft_compat br_netfilter nft_masq nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bridge stp llc nf_tables nfnetlink uvcvideo videobuf2_vmalloc videobuf2_memops uvc videobuf2_v4l2 videodev videobuf2_common snd_seq_dummy snd_hrtimer snd_seq qrtr rfkill overlay ftdi_sio usbserial snd_usb_audio snd_usbmidi_lib snd_pcm aes_ce_blk aes_ce_cipher snd_hwdep polyval_ce snd_rawmidi polyval_generic snd_seq_device joydev snd_timer ghash_ce hid_generic gf128mul snd usbhid sha2_ce ipmi_ssif soundcore hid mc sha256_arm64 ipmi_devintf arm_spe_pmu ipmi_msghandler sha1_ce sbsa_gwdt binfmt_misc nls_ascii nls_cp437 vfat fat xgene_hwmon cppc_cpufreq arm_cmn arm_dsu_pmu evdev nfsd auth_rpcgss nfs_acl lockd grace dm_mod fuse loop efi_pstore dax sunrpc configfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 btrfs efivarfs raid10 raid > > 456 async_raid6_recov async_memcpy > > > [ +0,000142] async_pq async_xor async_tx libcrc32c crc32c_generic xor xor_neon raid6_pq raid1 raid0 multipath linear md_mod nvme nvme_core ast t10_pi drm_shmem_helper xhci_pci drm_kms_helper xhci_hcd crc64_rocksoft crc64 drm crc_t10dif usbcore crct10dif_generic igb crct10dif_ce crct10dif_common usb_common i2c_algo_bit i2c_designware_platform i2c_designware_core > > > [ +0,121670] CPU: 0 PID: 442 Comm: kworker/0:4 Not tainted 6.5.0+ #2 > > > [ +0,006259] Hardware name: ADLINK AVA Developer Platform/AVA Developer Platform, BIOS TianoCore 2.04.100.07 (SYS: 2.06.20220308) 09/08/2022 > > > [ +0,012506] Workqueue: events work_for_cpu_fn > > > [ +0,004353] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) > > > [ +0,006953] pc : logic_inl+0xa0/0xd8 > > > [ +0,003570] lr : snd_cmipci_probe+0x7a4/0x1140 [snd_cmipci] > > > [ +0,005578] sp : ffff80008287bc70 > > > [ +0,003303] x29: ffff80008287bc70 x28: ffff08008af9d6a0 x27: 0000000000000000 > > > [ +0,007128] x26: ffffc4818263c228 x25: 0000000000000000 x24: 0000000000000001 > > > [ +0,007127] x23: ffff07ff81a9e000 x22: ffff07ff81a9e0c0 x21: ffff08008af9d080 > > > [ +0,007127] x20: ffffc4818263c000 x19: 0000000000000000 x18: ffffffffffffffff > > > [ +0,007127] x17: 0000000000000000 x16: ffffc4819ac3cd38 x15: ffff80008287ba80 > > > [ +0,007127] x14: 0000000000000001 x13: ffff80008287bbc4 x12: 0000000000000000 > > > [ +0,007126] x11: ffff07ff834616d0 x10: ffffffffffffffc0 x9 : ffffc4819a61dd18 > > > [ +0,007127] x8 : 0000000000000228 x7 : 0000000000000001 x6 : 00000000000000ff > > > [ +0,007127] x5 : ffffc4819adb7998 x4 : 0000000000000000 x3 : 00000000000000ff > > > [ +0,007127] x2 : 0000000000ffbffe x1 : 000000000000000c x0 : fffffbfffe80000c > > > [ +0,007126] Call trace: > > > [ +0,002436] logic_inl+0xa0/0xd8 > > > [ +0,003221] local_pci_probe+0x48/0xb8 > > > [ +0,003744] work_for_cpu_fn+0x24/0x40 > > > [ +0,003741] process_one_work+0x170/0x3a8 > > > [ +0,004002] worker_thread+0x23c/0x460 > > > [ +0,003742] kthread+0xe8/0xf8 > > > [ +0,003047] ret_from_fork+0x10/0x20 > > > [ +0,003569] Code: d2bfd000 f2df7fe0 f2ffffe0 8b000020 (b9400000) > > > [ +0,006083] ---[ end trace 0000000000000000 ]--- > > > > > > Because this sound card chipset seems to be popular (pretty much all PCI cards > > > I can find to buy locally use that), I'm thinking this might be specific to > > > arm64, otherwise someone would have seen this before. > > > > There is only one change in this driver code itself since 6.5 (commit > > b6ba0aa46138), and judging from the stack trace, it's unrelated with > > your problem. It's more likely a regression in the lower level code, > > e.g. PCI layer or arch/arm64 stuff. > > > > Could you try git bisect? > > Hmm, but has this combination of card and machine *ever* actually worked? That could be it. In trying to find a starting point for the bisection, I tried 6.1.0, 5.15.130, and 5.10.19, and they all fail in exactly the same way. I didn't go further back. > It's blowing up trying to access PCI I/O space, which has apparently ended > up in the indirect access mechanism without that being configured correctly. > That is definitely an issue down somewhere between the PCI layer and the > system firmware. Does the system even have an I/O space window? Some arm64 > machines don't. I guess we might not have got as far as probing a driver if > the I/O BAR couldn't be assigned at all, but either way something's not gone > right. I'm pretty sure I saw reports of people using PCI GPUs on this machine, but I would need to confirm. What info would I need to gather from the machine in order to figure this out?
Attachment:
signature.asc
Description: PGP signature