Hi Pierre, >Hi Libin, > >>>>>> Please let me describe the issue here. >>>>>> >>>>>> The test case is: >>>>>> 1) Unload module with script "sudo ./sof_remove.sh" , >>>>>> 2) reload module with script "sudo ./sof_insert.sh" >>>>>> >>>>>> After several rounds of removing and inserting kernel modules, >>>>>> system will complain like below: >>>>>> "BUG: unable to handle kernel paging request at 000000292a282031" >>>>> >>>>> Did you try some kernel debug options? It might show what went wrong. >>>> >>>> No, I haven't. I'm not sure which options I can use for this case. >>>> Could you please give me some suggestions? >>> >>> You can enable CONFIG_DEBUG_DEVRES and adjust the devres.log option >>> for showing each devres allocation and removal. And I'd try >>> CONFIG_DEBUG_SLAB and CONFIG_DEBUG_KMEMLEAK or whatever >interesting >>> in CONFIG_DEBUG section, too. >> >> Thanks for your suggestion. After more than 1 week debug, I think >> maybe I have root caused this issue from the devres.log message. >> >> Below is my finding. >> 1. When initialing the codecs, snd_hdac_ext_bus_device_init() will be >> called, and it will set hdev->dev.release = default_release. >> However, for analog codec (not hdac_hdmi codec), >> hdac_hda_codec_probe() will be called later. And it will call >> snd_hda_codec_device_new(), which will reset codec->code.dev.release = >> snd_hda_codec_dev_release; This means hdac_hdmi's hdev dev release is >> default_release() defined in hdac_ext_bus.c, and other hda codec's >> hdev dev release is snd_hda_codec_dev_release(). >> >> Both default_release() and snd_hda_codec_dev_release() will call >> kfree() to free the hdac_device (or its container) in the current code. >> >> 2. When we run rmmod sof_pci_dev, it will free the sdev. If we use >> Struct hdac_device *hdev = devm_kzalloc(sdev->dev...). This means hdev >> will also be freed automatically. >> >> In the removal, snd_hdac_ext_bus_device_remove() will be called to >> remove the hdev (in this function it is struct hdac_device *codec. >> The name is not aligned in different places). >> However for hdac_hdmi, the hdev->dev is used by other code. > >what other code? Can you elaborate on why the release is delayed? >From the dmesg, it is the device_link used for hdac_hdmi that causes the release delay. Device_link will increase the refcount. > >> So calling device.release() (the function default_release()) will be >> postponed. After after sdev is freed, the device.release() will be >> called. But for devm_xxx, hdev will also be freed when sdev is freed. >> This means hdev.dev after sdev is freed is invalid now as hdev has >> already freed. It will access invalid memory. This will cause the bug. > >This is very hard to follow. 4 lines above you wrote the release is postponed >but the way you describe is looks completely sequential. For hdac_hdmi, as it is used by other code (device_link), the device release will not be called immediately. After sdev and hdev is freed, the hdev device is put by device_link. Then hdev device can be released. But hdev has already be freed. This is wrong. > >> >> So I think we should not use devm_xxx, and let's free the hdev manually. >> >> At the end of this topic, I still found 2 suspicious code in the current code. >> 1. in sound/soc/intel/skylate/skl.c >> it calls hdev = devm_kazlloc() or hda_codec = devm_kzalloc(). >> As we will call kfree() in the current code, should we replace it with >> kzalloc()? Maybe we need cavs drivers owner's help on it. > >maybe you should send a diff suggestion to help everyone understand the >changes you are referring to? Please see my inline patch: diff --git a/sound/soc/intel/skylake/skl.c b/sound/soc/intel/skylake/skl.c index f864f7b..12af99a 100644 --- a/sound/soc/intel/skylake/skl.c +++ b/sound/soc/intel/skylake/skl.c @@ -697,8 +697,7 @@ static int probe_codec(struct hdac_bus *bus, int addr) dev_dbg(bus->dev, "codec #%d probed OK: %x\n", addr, res); #if IS_ENABLED(CONFIG_SND_SOC_INTEL_SKYLAKE_HDAUDIO_CODEC) - hda_codec = devm_kzalloc(&skl->pci->dev, sizeof(*hda_codec), - GFP_KERNEL); + hda_codec = kzalloc(sizeof(*hda_codec), GFP_KERNEL); if (!hda_codec) return -ENOMEM; @@ -716,7 +715,7 @@ static int probe_codec(struct hdac_bus *bus, int addr) } return 0; #else - hdev = devm_kzalloc(&skl->pci->dev, sizeof(*hdev), GFP_KERNEL); + hdev = kzalloc(sizeof(*hdev), GFP_KERNEL); if (!hdev) return -ENOMEM; > >> >> 2. in snd_hdac_ext_bus_device_remove() It will call >> snd_hdac_device_unregister() to unregister the hdac_devices and >> put_device(&codec->dev) to release the dev. >> For analog codec, snd_hdac_device_unregister() will free the >> codec->dev's kobject. And snd_hda_codec_dev_release() will be called >> to free the hdac_device. >> So it is invalid to call put_device(&codec->dev). If you print >> refcound_read(&(codec->dev.kobj.kref.refcount)) for analog codec >> before put_device(), you will find the refcount has already been 0. > >Isn't it a different problem though? Does this cause a freeze or is this just a >bad refcount? Yes, it is a totally different problem. And today morning, I found Ranjani has a patch " ASoC: hda: increment codec device refcount when it is added to the card ", which should fix this bug. Regards, Libin _______________________________________________ Alsa-devel mailing list Alsa-devel@xxxxxxxxxxxxxxxx https://mailman.alsa-project.org/mailman/listinfo/alsa-devel