Re: [PATCH v2 12/12] ASoC: SOF: Intel: hda-codec: fix memory allocation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Takashi,

>-----Original Message-----
>From: Takashi Iwai [mailto:tiwai@xxxxxxx]
>Sent: Thursday, May 23, 2019 4:27 PM
>To: Yang, Libin <libin.yang@xxxxxxxxx>
>Cc: Pierre-Louis Bossart <pierre-louis.bossart@xxxxxxxxxxxxxxx>; alsa-
>devel@xxxxxxxxxxxxxxxx; broonie@xxxxxxxxxx
>Subject: Re:  [PATCH v2 12/12] ASoC: SOF: Intel: hda-codec: fix
>memory allocation
>
>On Thu, 23 May 2019 10:21:20 +0200,
>Yang, Libin wrote:
>>
>> Hi Takashi,
>>
>> >-----Original Message-----
>> >From: Takashi Iwai [mailto:tiwai@xxxxxxx]
>> >Sent: Thursday, May 23, 2019 4:16 PM
>> >To: Yang, Libin <libin.yang@xxxxxxxxx>
>> >Cc: Pierre-Louis Bossart <pierre-louis.bossart@xxxxxxxxxxxxxxx>;
>> >alsa- devel@xxxxxxxxxxxxxxxx; broonie@xxxxxxxxxx
>> >Subject: Re:  [PATCH v2 12/12] ASoC: SOF: Intel:
>> >hda-codec: fix memory allocation
>> >
>> >On Thu, 23 May 2019 10:03:03 +0200,
>> >Yang, Libin wrote:
>> >>
>> >> Please let me describe the issue here.
>> >>
>> >> The test case is:
>> >> 1) Unload module with script "sudo ./sof_remove.sh" ,
>> >> 2) reload module with script "sudo ./sof_insert.sh"
>> >>
>> >> After several rounds of removing and inserting kernel modules,
>> >> system will complain like below:
>> >> "BUG: unable to handle kernel paging request at 000000292a282031"
>> >
>> >Did you try some kernel debug options?  It might show what went wrong.
>>
>> No, I haven't. I'm not sure which options I can use for this case.
>> Could you please give me some suggestions?
>
>You can enable CONFIG_DEBUG_DEVRES and adjust the devres.log option for
>showing each devres allocation and removal.  And I'd try
>CONFIG_DEBUG_SLAB and CONFIG_DEBUG_KMEMLEAK or whatever
>interesting in CONFIG_DEBUG section, too.

Thanks for your suggestion. After more than 1 week debug, I think maybe 
I have root caused this issue from the devres.log message.

Below is my finding.
1. When initialing the codecs, snd_hdac_ext_bus_device_init() will be called,
and it will set hdev->dev.release = default_release.
However, for analog codec (not hdac_hdmi codec), hdac_hda_codec_probe()
will be called later. And it will call snd_hda_codec_device_new(), which will
reset codec->code.dev.release = snd_hda_codec_dev_release;
This means hdac_hdmi's hdev dev release is default_release() defined in 
hdac_ext_bus.c, and other hda codec's hdev dev release is
snd_hda_codec_dev_release().

Both default_release() and snd_hda_codec_dev_release() will call kfree()
to free the hdac_device (or its container) in the current code.

2. When we run rmmod sof_pci_dev, it will free the sdev. If we use
Struct hdac_device *hdev = devm_kzalloc(sdev->dev...). This means 
hdev will also be freed automatically.

In the removal, snd_hdac_ext_bus_device_remove() will be called
to remove the hdev (in this function it is struct hdac_device *codec.
The name is not aligned in different places). 
However for hdac_hdmi, the hdev->dev is used by other code. 
So calling device.release() (the function default_release()) will
be postponed. After after sdev is freed, the device.release() will
be called. But for devm_xxx, hdev will also be freed when sdev is
freed. This means hdev.dev after sdev is freed is invalid now as
hdev has already freed. It will access invalid memory. This will cause the bug.

So I think we should not use devm_xxx, and let's free the hdev manually.

At the end of this topic, I still found 2 suspicious code in the current code.
1. in sound/soc/intel/skylate/skl.c
it calls hdev = devm_kazlloc() or hda_codec = devm_kzalloc().
As we will call kfree() in the current code, should we replace it with
kzalloc()? Maybe we need cavs drivers owner's help on it.

2. in snd_hdac_ext_bus_device_remove()
It will call snd_hdac_device_unregister() to unregister the hdac_devices
and put_device(&codec->dev) to release the dev.
For analog codec, snd_hdac_device_unregister()  will free the codec->dev's
kobject. And snd_hda_codec_dev_release() will be called to free the 
hdac_device.
So it is invalid to call put_device(&codec->dev). If you print 
refcound_read(&(codec->dev.kobj.kref.refcount)) for analog codec before
put_device(), you will find the refcount has already been 0.

Regards,
Libin 

>
>
>Takashi
_______________________________________________
Alsa-devel mailing list
Alsa-devel@xxxxxxxxxxxxxxxx
https://mailman.alsa-project.org/mailman/listinfo/alsa-devel



[Index of Archives]     [ALSA User]     [Linux Audio Users]     [Pulse Audio]     [Kernel Archive]     [Asterisk PBX]     [Photo Sharing]     [Linux Sound]     [Video 4 Linux]     [Gimp]     [Yosemite News]

  Powered by Linux