Re: snd-cmipci oops during probe on arm64 (current mainline, pre-6.6-rc1)

On Wed, Sep 06, 2023 at 01:49:16PM +0100, Robin Murphy wrote:
> On 2023-09-06 07:10, Takashi Iwai wrote:
> > On Wed, 06 Sep 2023 00:01:01 +0200,
> > Antonio Terceiro wrote:
> > > 
> > > Hi,
> > > 
> > > I'm using an arm64 workstation, and wanted to add a sound card to it. I bought
> > > one that is pretty popular around where I live, and it is supported by the
> > > snd-cmipci driver.
> > > 
> > > It's this one:
> > > 
> > > 0005:02:00.0 Multimedia audio controller: C-Media Electronics Inc CMI8738/CMI8768 PCI Audio (rev 10)
> > > 
> > > After building a mainline kernel (post-v6.5, pre-rc1) on Debian testing arm64
> > > with localmodconfig + CONFIG_SND_CMIPCI=m, it crashes with "Unable to handle
> > > kernel paging request at virtual address fffffbfffe80000c", and the system
> > > never finishes booting. The login manager never shows up and the serial console
> > > never gets to a login prompt. I observed the same issue on a 6.3 Debian kernel,
> > > after rebuilding with CONFIG_SND_CMIPCI=m.
> > > 
> > > If I stop the module from being automatically loaded by adding
> > > `blacklist snd-cmipci` to /etc/modprobe.d/snd-cmipci.conf (or if I
> > > remove the card from the PCIe slot), I get the system to boot. But trying
> > > to load the module manually causes the same crash (I only tested this
> > > with the card on):
> > > 
> > > [  +4,501093] snd_cmipci 0005:02:00.0: stream 512 already in tree
> > > [  +0,000155] Unable to handle kernel paging request at virtual address fffffbfffe80000c
> > > [  +0,007927] Mem abort info:
> > > [  +0,002793]   ESR = 0x0000000096000006
> > > [  +0,003743]   EC = 0x25: DABT (current EL), IL = 32 bits
> > > [  +0,005307]   SET = 0, FnV = 0
> > > [  +0,003049]   EA = 0, S1PTW = 0
> > > [  +0,003134]   FSC = 0x06: level 2 translation fault
> > > [  +0,004872] Data abort info:
> > > [  +0,002873]   ISV = 0, ISS = 0x00000006, ISS2 = 0x00000000
> > > [  +0,005479]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> > > [  +0,005047]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> > > [  +0,000003] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000080519fe9000
> > > [  +0,000004] [fffffbfffe80000c] pgd=000008051a979003, p4d=000008051a979003, pud=000008051a97a003, pmd=0000000000000000
> > > [  +0,000009] Internal error: Oops: 0000000096000006 [#1] SMP
> > > [  +0,028142] Modules linked in: snd_cmipci(+) snd_mpu401_uart snd_opl3_lib xt_conntrack xt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype nft_compat br_netfilter nft_masq nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bridge stp llc nf_tables nfnetlink uvcvideo videobuf2_vmalloc videobuf2_memops uvc videobuf2_v4l2 videodev videobuf2_common snd_seq_dummy snd_hrtimer snd_seq qrtr rfkill overlay ftdi_sio usbserial snd_usb_audio snd_usbmidi_lib snd_pcm aes_ce_blk aes_ce_cipher snd_hwdep polyval_ce snd_rawmidi polyval_generic snd_seq_device joydev snd_timer ghash_ce hid_generic gf128mul snd usbhid sha2_ce ipmi_ssif soundcore hid mc sha256_arm64 ipmi_devintf arm_spe_pmu ipmi_msghandler sha1_ce sbsa_gwdt binfmt_misc nls_ascii nls_cp437 vfat fat xgene_hwmon cppc_cpufreq arm_cmn arm_dsu_pmu evdev nfsd auth_rpcgss nfs_acl lockd grace dm_mod fuse loop efi_pstore dax sunrpc configfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 btrfs efivarfs raid10 raid456 async_raid6_recov async_memcpy
> > > [  +0,000142]  async_pq async_xor async_tx libcrc32c crc32c_generic xor xor_neon raid6_pq raid1 raid0 multipath linear md_mod nvme nvme_core ast t10_pi drm_shmem_helper xhci_pci drm_kms_helper xhci_hcd crc64_rocksoft crc64 drm crc_t10dif usbcore crct10dif_generic igb crct10dif_ce crct10dif_common usb_common i2c_algo_bit i2c_designware_platform i2c_designware_core
> > > [  +0,121670] CPU: 0 PID: 442 Comm: kworker/0:4 Not tainted 6.5.0+ #2
> > > [  +0,006259] Hardware name: ADLINK AVA Developer Platform/AVA Developer Platform, BIOS TianoCore 2.04.100.07 (SYS: 2.06.20220308) 09/08/2022
> > > [  +0,012506] Workqueue: events work_for_cpu_fn
> > > [  +0,004353] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > > [  +0,006953] pc : logic_inl+0xa0/0xd8
> > > [  +0,003570] lr : snd_cmipci_probe+0x7a4/0x1140 [snd_cmipci]
> > > [  +0,005578] sp : ffff80008287bc70
> > > [  +0,003303] x29: ffff80008287bc70 x28: ffff08008af9d6a0 x27: 0000000000000000
> > > [  +0,007128] x26: ffffc4818263c228 x25: 0000000000000000 x24: 0000000000000001
> > > [  +0,007127] x23: ffff07ff81a9e000 x22: ffff07ff81a9e0c0 x21: ffff08008af9d080
> > > [  +0,007127] x20: ffffc4818263c000 x19: 0000000000000000 x18: ffffffffffffffff
> > > [  +0,007127] x17: 0000000000000000 x16: ffffc4819ac3cd38 x15: ffff80008287ba80
> > > [  +0,007127] x14: 0000000000000001 x13: ffff80008287bbc4 x12: 0000000000000000
> > > [  +0,007126] x11: ffff07ff834616d0 x10: ffffffffffffffc0 x9 : ffffc4819a61dd18
> > > [  +0,007127] x8 : 0000000000000228 x7 : 0000000000000001 x6 : 00000000000000ff
> > > [  +0,007127] x5 : ffffc4819adb7998 x4 : 0000000000000000 x3 : 00000000000000ff
> > > [  +0,007127] x2 : 0000000000ffbffe x1 : 000000000000000c x0 : fffffbfffe80000c
> > > [  +0,007126] Call trace:
> > > [  +0,002436]  logic_inl+0xa0/0xd8
> > > [  +0,003221]  local_pci_probe+0x48/0xb8
> > > [  +0,003744]  work_for_cpu_fn+0x24/0x40
> > > [  +0,003741]  process_one_work+0x170/0x3a8
> > > [  +0,004002]  worker_thread+0x23c/0x460
> > > [  +0,003742]  kthread+0xe8/0xf8
> > > [  +0,003047]  ret_from_fork+0x10/0x20
> > > [  +0,003569] Code: d2bfd000 f2df7fe0 f2ffffe0 8b000020 (b9400000)
> > > [  +0,006083] ---[ end trace 0000000000000000 ]---
> > > 
> > > Because this sound card chipset seems to be popular (pretty much all PCI cards
> > > I can find to buy locally use that), I'm thinking this might be specific to
> > > arm64, otherwise someone would have seen this before.
> > 
> > There is only one change in this driver code itself since 6.5 (commit
> > b6ba0aa46138), and judging from the stack trace, it's unrelated to
> > your problem. It's more likely a regression in the lower level code,
> > e.g. PCI layer or arch/arm64 stuff.
> > 
> > Could you try git bisect?
> 
> Hmm, but has this combination of card and machine *ever* actually worked?

That could be it. In trying to find a starting point for the bisection,
I tried 6.1.0, 5.15.130, and 5.10.19, and they all fail in exactly the
same way. I didn't go further back.
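
For reference, here is how I read the numbers in the oops quoted above. This is only a rough sketch: the ESR field layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0, with WnR and FSC inside the ISS for data aborts) is my understanding of the architecture, and treating x1 as the port offset and x0 as the computed virtual address is an assumption based on the register dump. `decode_esr` is a helper name I made up for illustration.

```python
def decode_esr(esr: int) -> dict:
    """Split an arm64 ESR_ELx value into its top-level fields."""
    iss = esr & 0x1FFFFFF              # bits [24:0], syndrome
    return {
        "ec": (esr >> 26) & 0x3F,      # bits [31:26], exception class
        "il": (esr >> 25) & 0x1,       # bit 25, 1 = 32-bit instruction
        "iss": iss,
        "wnr": (iss >> 6) & 0x1,       # data aborts: 0 = read, 1 = write
        "fsc": iss & 0x3F,             # data aborts: fault status code
    }


# Values copied from the oops above.
fields = decode_esr(0x96000006)

# Assumption: x0 is the address that faulted and x1 the I/O port offset,
# so their difference is the virtual base the port access was relative to.
fault_addr = 0xFFFFFBFFFE80000C
port_offset = 0xC
window_base = fault_addr - port_offset

print({k: hex(v) for k, v in fields.items()})
print(hex(window_base))
```

The decoded fields agree with what the kernel printed (EC = 0x25, FSC = 0x06, WnR = 0), so this was a read through an unmapped address at offset 0xc from that base.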

> It's blowing up trying to access PCI I/O space, which has apparently ended
> up in the indirect access mechanism without that being configured correctly.
> That is definitely an issue down somewhere between the PCI layer and the
> system firmware. Does the system even have an I/O space window? Some arm64
> machines don't. I guess we might not have got as far as probing a driver if
> the I/O BAR couldn't be assigned at all, but either way something's not gone
> right.

I'm pretty sure I saw reports of people using PCI GPUs on this machine,
but I would need to confirm.

What info would I need to gather from the machine in order to figure
this out?
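
In case it helps, this is a small sketch of how I could check for an I/O space window by looking at /proc/ioports. The `SAMPLE` text is hypothetical and only shows the format I expect to parse, not output from this machine, and matching on the "PCI Bus" prefix is an assumption about how the host bridge windows are named here. `parse_resources` and `pci_windows` are names I made up.

```python
import re

# One resource line looks like "  00001000-00001fff : some name".
LINE_RE = re.compile(r"^\s*([0-9a-fA-F]+)-([0-9a-fA-F]+)\s*:\s*(.+?)\s*$")

# Hypothetical sample in /proc/ioports format, NOT taken from the machine.
SAMPLE = """\
00000000-0000ffff : PCI Bus 0005:00
  00001000-00001fff : PCI Bus 0005:01
    00001000-000010ff : 0005:02:00.0
"""


def parse_resources(text):
    """Return (start, end, name) tuples for every parsable resource line."""
    resources = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            start = int(match.group(1), 16)
            end = int(match.group(2), 16)
            resources.append((start, end, match.group(3)))
    return resources


def pci_windows(text):
    """Keep only entries whose name starts with 'PCI Bus' (an assumption)."""
    return [r for r in parse_resources(text) if r[2].startswith("PCI Bus")]


for start, end, name in pci_windows(SAMPLE):
    print(f"{start:#010x}-{end:#010x} {name}")
```

On the real machine I would pass the contents of /proc/ioports instead of `SAMPLE`, probably as root since the addresses may be hidden otherwise. An empty result would suggest there is no I/O window at all.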
