Hi, There is an ACPICA bug tracking this issue: http://bugs.acpica.org/show_bug.cgi?id=1115 We'll be back working on this after finishing other tasks. Thanks and best regards -Lv > From: Hanjun Guo [mailto:guohanjun@xxxxxxxxxx] > Sent: Wednesday, June 03, 2015 2:53 PM > > Hi Lv, > > Any updates for this issue? > > Thanks > Hanjun > > On 2014/6/18 12:42, Zheng, Lv wrote: > > Hi, > > > >> From: joeyli [mailto:jlee@xxxxxxxx] > >> Sent: Tuesday, June 17, 2014 5:19 PM > >> > >> On Tue, Jun 17, 2014 at 01:03:28AM +0000, Zheng, Lv wrote: > >>> Hi, > >>> > >>>> From: linux-acpi-owner@xxxxxxxxxxxxxxx [mailto:linux-acpi-owner@xxxxxxxxxxxxxxx] On Behalf Of joeyli > >>>> Sent: Monday, June 16, 2014 5:14 PM > >>>> > >>>> Hi Lv Zheng, > >>>> > >>>> Thanks for your response. > >>>> > >>>> On Mon, Jun 16, 2014 at 05:10:29AM +0000, Zheng, Lv wrote: > >>>>> Hi, > >>>>> > >>>>>> From: Lee, Chun-Yi [mailto:joeyli.kernel@xxxxxxxxx] > >>>>>> Sent: Monday, June 16, 2014 10:12 AM > >>>>>> > >>>>>> This issue found on v3.0 kernel, unfortunately there was no chance > >>>>>> to test latest kernel on issue mchine. This patch tested on v3.0 kernel > >>>>>> then sent to linux-acpi for review and note, maybe latest kernel also need. > >>>>>> > >>>>>> The problem happened when acpi thermal driver evaluate _PSL, but acpi > >>>>>> processor driver install notify handler at the same time. In the > >>>>>> code path of evaluate _PSL, it updates reference count of processor > >>>>>> and its notify objects. When the notify handler installation done > >>>>>> after the reference count of processor updated, it causes the > >>>>>> ref_count of processor doesn't sync with its notify object's > >>>>>> ref_count value. > >>>>>> > >>>>>> Here is an debugging log when issue reproduced: > >>>>>> > >>>>>> [ 3.481773] ACPI_TYPE_PROCESSOR set ACPI_DEVICE_NOTIFY, object_desc->common.reference_count: 3, notify_obj- > >>>>>>> common.reference_count: 1 > >>>>>> [ 3.481958] PROCESSOR device_hid: LNXCPU > >>>>>> ... > >>>>>> [ 3.487427] ACPI_TYPE_PROCESSOR, action = 1 > >>>>>> [ 3.487428] Update device_notify ref_count > >>>>>> [ 3.487429] REF_DECREMENT ACPI_TYPE_LOCAL_NOTIFY original_count: 0 > >>>>>> [ 3.487431] ACPI Warning: Obj ffff8800b0f40b28, Reference Count is already zero, cannot decrement > >>>>>> [ 3.487433] (20110413/utdelete-431) > >>>>>> [ 3.487434] REF_DECREMENT ACPI_TYPE_PROCESSOR original_count: 2 > >>>>>> > >>>>>> Accroding log, found the reference_count of parent object > >>>>>> (it's processor in this case) is 3, it doesn't match with notify_object's > >>>>>> reference_count, value is 1. It triggered "Reference Count is already zero" > >>>>>> warning, then happen object double free issue later. > >>>>>> > >>>>>> To avoid rece condition, this patch introded ACPI_MTX_NOTIFY_REF_COUNT > >>>>>> mutex to keep the ref_count of notify object sync with its parent > >>>>>> object. And, it also set the reference_count value of new notify object > >>>>>> equals to its parent object's reference_count. > >>>>>> > >>>>>> Signed-off-by: Lee, Chun-Yi <jlee@xxxxxxxx> > >>>>>> --- > >>>>>> drivers/acpi/acpica/aclocal.h | 3 ++- > >>>>>> drivers/acpi/acpica/evxface.c | 7 ++++++- > >>>>>> drivers/acpi/acpica/utdelete.c | 6 ++++++ > >>>>>> 3 files changed, 14 insertions(+), 2 deletions(-) > >>>>>> > >>>>>> diff --git a/drivers/acpi/acpica/aclocal.h b/drivers/acpi/acpica/aclocal.h > >>>>>> index c7f743c..e25a4af 100644 > >>>>>> --- a/drivers/acpi/acpica/aclocal.h > >>>>>> +++ b/drivers/acpi/acpica/aclocal.h > >>>>>> @@ -85,8 +85,9 @@ union acpi_parse_object; > >>>>>> #define ACPI_MTX_MEMORY 5 /* Debug memory tracking lists */ > >>>>>> #define ACPI_MTX_DEBUG_CMD_COMPLETE 6 /* AML debugger */ > >>>>>> #define ACPI_MTX_DEBUG_CMD_READY 7 /* AML debugger */ > >>>>>> +#define ACPI_MTX_NOTIFY_REF_COUNT 8 /* Reference count of notify object */ > >>>>>> > >>>>>> -#define ACPI_MAX_MUTEX 7 > >>>>>> +#define ACPI_MAX_MUTEX 8 > >>>>>> #define ACPI_NUM_MUTEX ACPI_MAX_MUTEX+1 > >>>>>> > >>>>>> /* Lock structure for reader/writer interfaces */ > >>>>>> diff --git a/drivers/acpi/acpica/evxface.c b/drivers/acpi/acpica/evxface.c > >>>>>> index e114140..213fe1a 100644 > >>>>>> --- a/drivers/acpi/acpica/evxface.c > >>>>>> +++ b/drivers/acpi/acpica/evxface.c > >>>>>> @@ -495,6 +495,10 @@ acpi_install_notify_handler(acpi_handle device, > >>>>>> handler, context, > >>>>>> NULL); > >>>>>> > >>>>>> + acpi_ut_acquire_mutex(ACPI_MTX_NOTIFY_REF_COUNT); > >>>>>> + > >>>>>> + notify_obj->common.reference_count = obj_desc->common.reference_count; > >>>>>> + > >>>>> Should be converted to spin_lock here. > >>>>> > >>>> Did you mean using acpi_gbl_reference_count_lock ? > >>> Yes. > >>> > >>>>>> if (handler_type & ACPI_SYSTEM_NOTIFY) { > >>>>>> obj_desc->common_notify.system_notify = notify_obj; > >>>>>> } > >>>>>> @@ -503,8 +507,9 @@ acpi_install_notify_handler(acpi_handle device, > >>>>>> obj_desc->common_notify.device_notify = notify_obj; > >>>>>> } > >>>>>> > >>>>>> - if (handler_type == ACPI_ALL_NOTIFY) { > >>>>>> + acpi_ut_release_mutex(ACPI_MTX_NOTIFY_REF_COUNT); > >>>>>> > >>>>>> + if (handler_type == ACPI_ALL_NOTIFY) { > >>>>>> /* Extra ref if installed in both */ > >>>>>> > >>>>>> acpi_ut_add_reference(notify_obj); > >>>>>> diff --git a/drivers/acpi/acpica/utdelete.c b/drivers/acpi/acpica/utdelete.c > >>>>>> index 31f5a78..7559813 100644 > >>>>>> --- a/drivers/acpi/acpica/utdelete.c > >>>>>> +++ b/drivers/acpi/acpica/utdelete.c > >>>>>> @@ -504,6 +504,7 @@ acpi_ut_update_object_reference(union acpi_operand_object *object, u16 action) > >>>>>> > >>>>>> /* Update the notify objects for these types (if present) */ > >>>>>> > >>>>>> + acpi_ut_acquire_mutex(ACPI_MTX_NOTIFY_REF_COUNT); > >>>> Can I acquire acpi_gbl_reference_count_lock here? Does is not cause recursive lock? > >>>> Or direct acquire remove mutex here? > >>> You can. > >>> If you just want to hold the lock longer, you can introduce a lockless API to replace the one invoked below. > >>> > >> I just tried to change my mutex code to acquire acpi_gbl_reference_count_lock in acpi_ut_update_object_reference(). > >> But when system boot, kernel pending on updating ref_count of parent object, in acpi_ut_update_ref_count() when > >> acquire acpi_gbl_reference_count_lock. > >> > >> On the other hand, > >> I didn't find lockless spinlock API in acpi subsystem, and even in acpica project git. Could you please help teach > >> me where is the lockless function? > >> > > I agree this is not a suitable case for acpi_gbl_reference_count_lock usage. > > > > I just want to say, this is an open source project, if you see any case need to extend the locking period, you can do any modification > you want. > > For example, if you find such a case: copying the reference from its original owner to a copy of the owner and try to increase it. > > You can introduce an API to ensure atomicity by holding the acpi_gbl_reference_count_lock for the 2 operations. > > > >>>>>> acpi_ut_update_ref_count(object->common_notify. > >>>>>> system_notify, action); > >>>>> If you take a look at acpi_ut_update_ref_count, there is already a spin_lock held around the code to update the reference > count. > >>>>> > >>>>> Thanks > >>>>> -Lv > >>>> hm.... I am not sure can using acpi_gbl_reference_count_lock. > >>>> > >>>> When updating parent object's ref_count, it should atomic with its notify objects's ref_count updating. > >>>> The situation I want to avoid is: > >>>> > >>>> a. acpi_ut_update_object_reference() update ref_count of system_notify/device_notify. (Thermal driver) > >>>> Assume the notify object didn't install to parent object yet. So, the ref_count doesn't increase. > >>>> > >>>> b. acpi_install_notify_handler() install system_notify/device_notify. (Processor driver) > >>>> Processor driver install notify handler of parent object in this window. The ref_count of notify object is 1. > >>>> > >>>> c. acpi_ut_update_object_reference() call acpi_ut_update_ref_count() to update ref_count of parent object. (Thermal driver) > >>>> The ref_count of parent object update to 2. > >>>> <=== this ref_count doesn't match with ref_count of notify object, it's 1. > >>>> > >>>> Due to a. and c. steps should atomic, so I add ACPI_MTX_NOTIFY_REF_COUNT to bind them together, when > >>>> the object type is processor, device, power or thermal. > >>> I can see that the only problem is the sub-object's reference count updates. > >>> It shouldn't be increased each time the parent object's reference count is increased, but should be increased only once when it is > >> linked to the parent object (referenced by the parent). > >>> That seems to be the root cause. > >> Yes, this is the main problem, either increase or decrease ref_count of parent object, the sub-objects(here is notify object) will > >> also updating either increase or decrease. > >> > >> In the case of my issue machine, when ref_count of parent object is 2 but sub-object's ref_count is 1, later decrease parent > object's > >> ref_count to 0 will causes sub-object's ref_count is -1. <=== it triggers the double free issue then slab will emit oops. > > Can I say that: > > The issue is caused by the current ACPICA locking implementation. > > For this case, we need 2 locks: > > 1. The first lock need to be used each time we are going to update the members of an object. > > 2. The second lock need to be used to mutual exclusively executing the code that can affect the object's state, ensure it is not > messed up from different call context. This should be achieved by some artificial locking facility. In many cases, the call path is locked > without holding any locks that can lead to an atomic context. > > > > Currently we may just doing wrong by utilizing the 2nd lock to meet the 1st requirement. > > > > For ACPICA, shall we first ask: > > 1. do we have a per-object lock to meet the first requirement? > > 2. do we have a global lock to protect the consistencies of the namespace? > > > > I think you should fix this by tuning the locking granularity. > > The patch can do so unless there is no other mean can be used to fix this issue. > > > > Thanks and best regards > > -Lv > > > >>> The current code might be a hackish result to work around some design defects. > >>> So what you do in this patch might be dangerous, they can break ACPICA interpreter and you need to do it in the ACPICA source > >> base. > >>> There is a recursive unit test facility - ASLTS to help determine if your change doesn't introduce regressions. > >>> I think you need to file a bug in ACPICA bugzilla and start a discussion there. > >>> > >>> Thanks and best regards > >>> -Lv > >>> > >> Thanks for your information, I will try ASLTS and file bug about the atomic between main object's ref_count > >> with sub-object's ref_count. > >> > >>>>>> acpi_ut_update_ref_count(object->common_notify. > >>>>>> @@ -592,6 +593,11 @@ acpi_ut_update_object_reference(union acpi_operand_object *object, u16 action) > >>>>>> * main object to be deleted. > >>>>>> */ > >>>>>> acpi_ut_update_ref_count(object, action); > >>>>>> + if (object->common.type == ACPI_TYPE_PROCESSOR || > >>>>>> + object->common.type == ACPI_TYPE_DEVICE || > >>>>>> + object->common.type == ACPI_TYPE_POWER || > >>>>>> + object->common.type == ACPI_TYPE_THERMAL) > >>>>>> + acpi_ut_release_mutex(ACPI_MTX_NOTIFY_REF_COUNT); > >>>> here also. I think need remove this release? > >>>> > >>>>>> object = NULL; > >>>>>> > >>>>>> /* Move on to the next object to be updated */ > >>>>>> -- > >>>>>> 1.8.4.5 > >>>> Please let me know if I miss understood. > >>>> > >>>> > >>>> Thanks a lot! > >>>> Joey Lee > >> Regards > >> Joey Lee > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-acpi" in > > the body of a message to majordomo@xxxxxxxxxxxxxxx > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > > . > > > -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html