>-----Original Message----- >From: Yang, Libin >Sent: Monday, June 3, 2019 5:10 PM >To: Takashi Iwai <tiwai@xxxxxxx> >Cc: Pierre-Louis Bossart <pierre-louis.bossart@xxxxxxxxxxxxxxx>; alsa- >devel@xxxxxxxxxxxxxxxx; broonie@xxxxxxxxxx >Subject: RE: [PATCH v2 12/12] ASoC: SOF: Intel: hda-codec: fix >memory allocation > >Hi Takashi, > >>-----Original Message----- >>From: Takashi Iwai [mailto:tiwai@xxxxxxx] >>Sent: Thursday, May 23, 2019 4:27 PM >>To: Yang, Libin <libin.yang@xxxxxxxxx> >>Cc: Pierre-Louis Bossart <pierre-louis.bossart@xxxxxxxxxxxxxxx>; alsa- >>devel@xxxxxxxxxxxxxxxx; broonie@xxxxxxxxxx >>Subject: Re: [PATCH v2 12/12] ASoC: SOF: Intel: hda-codec: >>fix memory allocation >> >>On Thu, 23 May 2019 10:21:20 +0200, >>Yang, Libin wrote: >>> >>> Hi Takashi, >>> >>> >-----Original Message----- >>> >From: Takashi Iwai [mailto:tiwai@xxxxxxx] >>> >Sent: Thursday, May 23, 2019 4:16 PM >>> >To: Yang, Libin <libin.yang@xxxxxxxxx> >>> >Cc: Pierre-Louis Bossart <pierre-louis.bossart@xxxxxxxxxxxxxxx>; >>> >alsa- devel@xxxxxxxxxxxxxxxx; broonie@xxxxxxxxxx >>> >Subject: Re: [PATCH v2 12/12] ASoC: SOF: Intel: >>> >hda-codec: fix memory allocation >>> > >>> >On Thu, 23 May 2019 10:03:03 +0200, >>> >Yang, Libin wrote: >>> >> >>> >> Please let me describe the issue here. >>> >> >>> >> The test case is: >>> >> 1) Unload module with script "sudo ./sof_remove.sh" , >>> >> 2) reload module with script "sudo ./sof_insert.sh" >>> >> >>> >> After several rounds of removing and inserting kernel modules, >>> >> system will complain like below: >>> >> "BUG: unable to handle kernel paging request at 000000292a282031" >>> > >>> >Did you try some kernel debug options? It might show what went wrong. >>> >>> No, I haven't. I'm not sure which options I can use for this case. >>> Could you please give me some suggestions? >> >>You can enable CONFIG_DEBUG_DEVRES and adjust the devres.log option >for >>showing each devres allocation and removal. And I'd try >>CONFIG_DEBUG_SLAB and CONFIG_DEBUG_KMEMLEAK or whatever >interesting in >>CONFIG_DEBUG section, too. > >Thanks for your suggestion. After more than 1 week debug, I think maybe I >have root caused this issue from the devres.log message. > >Below is my finding. >1. When initialing the codecs, snd_hdac_ext_bus_device_init() will be called, >and it will set hdev->dev.release = default_release. >However, for analog codec (not hdac_hdmi codec), hdac_hda_codec_probe() >will be called later. And it will call snd_hda_codec_device_new(), which will >reset codec->code.dev.release = snd_hda_codec_dev_release; This means >hdac_hdmi's hdev dev release is default_release() defined in hdac_ext_bus.c, >and other hda codec's hdev dev release is snd_hda_codec_dev_release(). > >Both default_release() and snd_hda_codec_dev_release() will call kfree() to >free the hdac_device (or its container) in the current code. > >2. When we run rmmod sof_pci_dev, it will free the sdev. If we use Struct >hdac_device *hdev = devm_kzalloc(sdev->dev...). This means hdev will also be >freed automatically. > >In the removal, snd_hdac_ext_bus_device_remove() will be called to remove >the hdev (in this function it is struct hdac_device *codec. >The name is not aligned in different places). >However for hdac_hdmi, the hdev->dev is used by other code. >So calling device.release() (the function default_release()) will be postponed. >After after sdev is freed, the device.release() will be called. But for devm_xxx, >hdev will also be freed when sdev is freed. This means hdev.dev after sdev is >freed is invalid now as hdev has already freed. It will access invalid memory. >This will cause the bug. > >So I think we should not use devm_xxx, and let's free the hdev manually. > >At the end of this topic, I still found 2 suspicious code in the current code. >1. in sound/soc/intel/skylate/skl.c >it calls hdev = devm_kazlloc() or hda_codec = devm_kzalloc(). >As we will call kfree() in the current code, should we replace it with kzalloc()? >Maybe we need cavs drivers owner's help on it. > >2. in snd_hdac_ext_bus_device_remove() >It will call snd_hdac_device_unregister() to unregister the hdac_devices and >put_device(&codec->dev) to release the dev. >For analog codec, snd_hdac_device_unregister() will free the codec->dev's >kobject. And snd_hda_codec_dev_release() will be called to free the >hdac_device. >So it is invalid to call put_device(&codec->dev). If you print >refcound_read(&(codec->dev.kobj.kref.refcount)) for analog codec before >put_device(), you will find the refcount has already been 0. I tested on the latest code. The behavior on kernel 5.2.0-rc1 is different from it on kernel 5.0.0 of https://github.com/thesofproject/linux/. (thesofproject kernel has already been upgraded to 5.2.0-rc1 now) On kernel 5.0.0 (https://github.com/thesofproject/linux/), when removing the sof kernel modules, the sequence is: 1. devm_kazlloc_release() to release hdac_device 2. snd_hdac_ext_bus_device_exit(): this function will use hdac_device However, on kernel 5.2.0-rc1, the sequence is: 1. snd_hdac_ext_bus_device_exit() 2. devm_kazlloc_release() to release hdac_device So I think it is safe now to use devm_xxx to allocate the hdac_device. Also, Ranjani found a duplicated release of hdac_device. She has a patch to fix it. Regards, Libin > >Regards, >Libin > >> >> >>Takashi _______________________________________________ Alsa-devel mailing list Alsa-devel@xxxxxxxxxxxxxxxx https://mailman.alsa-project.org/mailman/listinfo/alsa-devel